In this post “Find and Delete all duplicate rows but keep one”, we are going to discuss that how we can find and delete all the duplicate rows of a table except one row. Assume that we have a table named tbl_sample which has four columns – EmpId, EmpName, Age, and City. This table has some duplicate data (in all the four columns) which needs to be deleted except the original one row. To demonstrate this, let’s create the dummy table with some sample data.
Here is the code to create the dummy table with sample data:
IF OBJECT_ID('dbo.tbl_Sample') IS NOT NULL DROP TABLE dbo.tbl_Sample GO CREATE TABLE dbo.tbl_Sample ( EmpId INT, EmpName VARCHAR(256), Age TINYINT, City VARCHAR(50) ) GO INSERT INTO dbo.tbl_Sample (EmpId, EmpName, Age, City) VALUES('1', 'Smith', 30, 'New York'), ('1', 'Smith', 30, 'New York'), ('1', 'Smith', 30, 'New York'), ('2', 'Adam', 35, 'California'), ('2', 'Adam', 35, 'California'), ('3', 'John', 42, 'Chicago'), ('3', 'John', 42, 'Chicago'), ('3', 'John', 42, 'Chicago'), ('3', 'John', 42, 'Chicago'), ('4', 'Jose', 36, 'Boston'), ('4', 'Jose', 36, 'Boston'), ('4', 'Jose', 36, 'Boston'), ('4', 'Jose', 36, 'Boston'), ('5', 'Elgar', 28, 'Seattle') GO
Let’s have a look at the table data now.
SELECT * FROM dbo.tbl_Sample GO
In above table, we can see that we have multiple duplicate rows for EmpId 1, 2, 3 and 4. To find the duplicate records from the above table, we can use this query. This query will output all the duplicate rows from the table with the duplicate rows count.
SELECT * FROM dbo.tbl_Sample GROUP BY EmpId, EmpName, Age, City HAVING COUNT(1) > 1 GO
Now, we need to write a query to find and delete all the duplicate records from the table excluding the original one record. Here is the query.
WITH CTE AS ( SELECT *, ROW_NUMBER() OVER(PARTITION BY EmpId, EmpName, Age, City ORDER BY (SELECT NULL)) Seq FROM dbo.tbl_Sample ) DELETE FROM CTE WHERE Seq > 1 GO
In the above code, we are using a CTE (Common Table Expression) to find the duplicates and then we are deleting all the records using DELETE command but keeping the only one record for each employee.
After executing this query, we have only one row for each employee as this.
Thanks for the reading. Please share your inputs in the comments.