

Finding duplicate values in a SQL table


Question

It's easy to find duplicates with one field:

SELECT name, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake and allowed duplicate name and email values to be inserted. Now I need to remove or change the duplicates, so I need to find them first.

2019/09/13

Accepted Answer

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.
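
A minimal sketch of the same grouping, run against an in-memory SQLite database from Python so it can be tried without a server. The table layout follows the question; the email addresses and the helper name find_duplicate_pairs are made up for illustration.

```python
import sqlite3


def find_duplicate_pairs(conn):
    """Return (name, email, count) for each name/email pair that occurs more than once."""
    return conn.execute(
        """
        SELECT name, email, COUNT(*) AS cnt
        FROM users
        GROUP BY name, email
        HAVING COUNT(*) > 1
        ORDER BY name, email
        """
    ).fetchall()


# Hypothetical sample data shaped like the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
    [
        (1, "John", "shared@example.com"),
        (2, "Sam", "shared@example.com"),
        (3, "Tom", "shared@example.com"),
        (4, "Bob", "bob@example.com"),
        (5, "Tom", "shared@example.com"),
    ],
)

duplicates = find_duplicate_pairs(conn)
print(duplicates)
```

Only the Tom pair is reported, because John and Sam share the email but not the name.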

Note: older versions of the ANSI standard required every non-aggregated column to appear in the GROUP BY, but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent across database engines.

2018/06/01

Try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

If you want the IDs of the duplicates, use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)
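
For readers without SQL Server at hand, the join above can be checked with a rough SQLite sketch driven from Python. The table variable is replaced by an ordinary table named your_table, which is an assumption of this sketch; the rows are the ones inserted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no table variables, so a regular table stands in for @YourTable.
conn.execute("CREATE TABLE your_table (id INTEGER, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)

# Join every row to the name/email pairs that occur more than once.
duplicate_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT y.id
        FROM your_table AS y
        INNER JOIN (
            SELECT name, email
            FROM your_table
            GROUP BY name, email
            HAVING COUNT(*) > 1
        ) AS dt ON y.name = dt.name AND y.email = dt.email
        ORDER BY y.id
        """
    )
]
print(duplicate_ids)
```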

To delete the duplicates, try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1

SELECT * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)
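
On engines without DELETE ... FROM ... JOIN, the same result can be reached another way: keep the lowest id of each name/email pair and delete every other row. This is a different technique from the ROW_NUMBER() statement above, sketched here on SQLite from Python; the table name your_table is an assumption, and the sketch relies on id being unique and never NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?)",
    [
        (1, "John", "John-email"),
        (2, "John", "John-email"),
        (3, "fred", "John-email"),
        (4, "fred", "fred-email"),
        (5, "sam", "sam-email"),
        (6, "sam", "sam-email"),
    ],
)

# Delete every row whose id is not the smallest id of its name/email group.
with conn:  # commits on success, rolls back on error
    deleted = conn.execute(
        """
        DELETE FROM your_table
        WHERE id NOT IN (
            SELECT MIN(id) FROM your_table GROUP BY name, email
        )
        """
    ).rowcount

remaining = conn.execute(
    "SELECT id, name, email FROM your_table ORDER BY id"
).fetchall()
print(deleted, remaining)
```

The surviving rows are the same four ids as in the output above: 1, 3, 4 and 5.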
2018/04/10

Try this:

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )
2010/04/07

If you want to delete the duplicates, here's a much simpler way to do it than nesting three sub-selects to pick out the extra rows:

SELECT u.id, u.name, u.email
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id

And so to delete:

DELETE FROM users
WHERE id IN (
    SELECT u.id/*, u.name, u.email*/
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)

Much easier to read and understand, IMHO.

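Note: the condition u.id > u2.id matches every row that has a duplicate with a lower id, so one execution removes all copies except the lowest-id row in each group. Some engines (MySQL, for example) refuse a DELETE whose sub-select reads the table being deleted from; there the sub-select has to be wrapped in a derived table.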
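
To see how many passes the self-join delete needs, here is a small experiment on SQLite from Python. The rows are hypothetical and include a name/email pair stored three times; other engines may behave differently, so treat this as a sketch rather than a guarantee.

```python
import sqlite3

DELETE_SQL = """
    DELETE FROM users
    WHERE id IN (
        SELECT u.id
        FROM users u, users u2
        WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
    )
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [
        (1, "Tom", "tom@example.com"),
        (2, "Tom", "tom@example.com"),
        (3, "Tom", "tom@example.com"),  # third copy of the same pair
        (4, "Bob", "bob@example.com"),
    ],
)

# Run the statement twice and record how many rows each pass removes.
first_pass = conn.execute(DELETE_SQL).rowcount
second_pass = conn.execute(DELETE_SQL).rowcount
conn.commit()

remaining_ids = [row[0] for row in conn.execute("SELECT id FROM users ORDER BY id")]
print(first_pass, second_pass, remaining_ids)
```

In this run the first pass removes both extra Tom rows and the second pass finds nothing left to delete.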

2016/12/30

Try the following:

SELECT * FROM
(
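    -- Order by Id so the lowest Id in each group gets RowNum 1;
    -- RowNum avoids the alias Rank, which is a reserved word on some engines.
    SELECT Id, Name, Age, Comments,
           ROW_NUMBER() OVER (PARTITION BY Name, Age ORDER BY Id) AS RowNum
    FROM Customers
) AS B WHERE RowNum > 1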
2016/01/12

SELECT name, email
FROM users
WHERE email IN (
    SELECT email
    FROM users
    GROUP BY email
    HAVING COUNT(*) > 1
)
2015/07/22

Source: https://stackoverflow.com/questions/2594829
Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow