How do I get the T-SQL code to find duplicates?

前端未结


MS Access has a button that generates SQL code for finding duplicate rows. I don't know whether SQL Server 2005/2008 Management Studio has this.

If it does, please


5 Answers

青春惊慌失措

2020-12-12 20:56
Well, if entire rows in your table are duplicates, you at least don't have a primary key set up for that table; otherwise the primary key values, at a minimum, would differ.

However, here's how to write a SQL query that finds duplicates over a set of columns:
```
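SELECT col1, col2, col3, col4, COUNT(*) AS DuplicateCount
FROM dbo.YourTable  -- "table" is a reserved word in T-SQL; substitute your real table name
GROUP BY col1, col2, col3, col4
HAVING COUNT(*) > 1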
```
This finds every combination of values in columns col1-col4 that occurs more than once.

For instance, in the following table, rows 2 and 3 would be duplicates:
```
PK    col1    col2    col3    col4    col5
1       1       2       3       4      6
2       1       3       4       7      7
3       1       3       4       7      10
4       2       3       1       4      5
```
The two rows share the same values in columns col1-col4 and are therefore considered duplicates by that SQL. Expand the list of columns to include every column you want to check.
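To see the GROUP BY / HAVING query run against the example table, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table name sample and the column names pk, col1..col5 are hypothetical stand-ins for the table shown above; the query itself is written the same way in T-SQL.

```python
import sqlite3

def build_sample():
    """Create an in-memory copy of the example table (hypothetical name: sample)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (pk INTEGER PRIMARY KEY, "
                 "col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 3, 4, 6),
        (2, 1, 3, 4, 7, 7),
        (3, 1, 3, 4, 7, 10),
        (4, 2, 3, 1, 4, 5),
    ])
    return conn

def find_duplicate_groups(conn):
    """Return each col1..col4 combination that occurs more than once, plus its count."""
    return conn.execute("""
        SELECT col1, col2, col3, col4, COUNT(*) AS dup_count
        FROM sample
        GROUP BY col1, col2, col3, col4
        HAVING COUNT(*) > 1
    """).fetchall()

if __name__ == "__main__":
    print(find_duplicate_groups(build_sample()))  # [(1, 3, 4, 7, 2)]
```

Only one group comes back: the shared values of rows 2 and 3, with a count of 2. col5 and pk are ignored because they are not in the GROUP BY list.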
一向

2020-12-12 21:01
Another way to do this is to join the table to itself.
```
SELECT *
FROM dbo.TableA aBase
JOIN dbo.TableA aDupes ON aDupes.ColA = aBase.ColA AND
                          aDupes.ColB = aBase.ColB
WHERE aBase.Pkey < aDupes.Pkey
```
Note: the aBase.Pkey < aDupes.Pkey condition is there because joining a table to itself returns every matching pair twice: the join condition is true in both directions.

In other words: if table aBase has a row equal to a row in aDupes (based on ColA and ColB), the mirror of that match is also true: aDupes has a row equal to a row in aBase based on ColA and ColB. Both of those matches are therefore returned in the result set.

Eliminate this mirrored match by keeping only the results where one side has the lower key; which side you pick is arbitrary.

Whether you use < or > doesn't matter, as long as the keys are forced to be different.

This also filters out each row's match with itself, because aBase.Pkey < aDupes.Pkey forces the primary keys to be different.
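The mirrored-match behavior is easy to observe by running the self-join with and without the key filter. This is a minimal sketch in Python's sqlite3 (not SQL Server); the table sample, its pk column, and the choice of col1/col2 as the ColA/ColB equivalents are hypothetical, based on the example table from the first answer.

```python
import sqlite3

def build_sample():
    """In-memory copy of the four-row example table (hypothetical name: sample)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (pk INTEGER PRIMARY KEY, "
                 "col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 3, 4, 6), (2, 1, 3, 4, 7, 7),
        (3, 1, 3, 4, 7, 10), (4, 2, 3, 1, 4, 5),
    ])
    return conn

def duplicate_pairs(conn, dedupe_mirror=True):
    """Self-join on col1/col2; optionally keep only pairs whose left key is lower."""
    sql = """
        SELECT aBase.pk, aDupes.pk
        FROM sample aBase
        JOIN sample aDupes ON aDupes.col1 = aBase.col1 AND
                              aDupes.col2 = aBase.col2
    """
    if dedupe_mirror:
        sql += " WHERE aBase.pk < aDupes.pk"
    sql += " ORDER BY aBase.pk, aDupes.pk"
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    conn = build_sample()
    print(duplicate_pairs(conn))                       # [(2, 3)]
    print(duplicate_pairs(conn, dedupe_mirror=False))  # self-matches and both (2, 3) and (3, 2)
```

Without the filter, the join returns six rows: every row matched with itself, plus the pair (2, 3) in both directions. With the filter, only (2, 3) remains.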
野的像风

2020-12-12 21:03
I found this solution when I needed to dump entire rows that share a value in one or more duplicated fields, without typing every field name in the table:
```
SELECT * FROM db WHERE col IN
    (SELECT col FROM db GROUP BY col HAVING COUNT(*) > 1)
    ORDER BY col
```
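Here is a minimal sketch of the IN-subquery approach in Python's sqlite3 (not SQL Server). It assumes a hypothetical table sample shaped like the first answer's example and checks a single column, col1. Because the check is per column, it returns every full row whose col1 value appears more than once, not only rows that match on several columns.

```python
import sqlite3

def build_sample():
    """In-memory copy of the four-row example table (hypothetical name: sample)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (pk INTEGER PRIMARY KEY, "
                 "col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 3, 4, 6), (2, 1, 3, 4, 7, 7),
        (3, 1, 3, 4, 7, 10), (4, 2, 3, 1, 4, 5),
    ])
    return conn

def rows_with_duplicate_col1(conn):
    """Return whole rows (all columns) whose col1 value occurs more than once."""
    return conn.execute("""
        SELECT * FROM sample WHERE col1 IN
            (SELECT col1 FROM sample GROUP BY col1 HAVING COUNT(*) > 1)
        ORDER BY col1, pk
    """).fetchall()

if __name__ == "__main__":
    for row in rows_with_duplicate_col1(build_sample()):
        print(row)  # rows with pk 1, 2 and 3 (they all have col1 = 1)
```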
长发绾君心

2020-12-12 21:06
If you're using SQL Server 2005 or later, you can use the following code to see every row, with all of its columns, numbered within its group of duplicates:
```
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
FROM dbo.YourTable  -- "table" is a reserved word in T-SQL; substitute your real table name
```
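A minimal sketch of the ROW_NUMBER numbering, run in Python's sqlite3 instead of SQL Server (window functions need SQLite 3.25 or later). The table sample is a hypothetical copy of the first answer's example table. The sketch swaps ORDER BY (SELECT 0) for ORDER BY pk so the numbering is deterministic; with (SELECT 0), SQL Server numbers rows within each group in arbitrary order.

```python
import sqlite3

def build_sample():
    """In-memory copy of the four-row example table (hypothetical name: sample)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (pk INTEGER PRIMARY KEY, "
                 "col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 3, 4, 6), (2, 1, 3, 4, 7, 7),
        (3, 1, 3, 4, 7, 10), (4, 2, 3, 1, 4, 5),
    ])
    return conn

def number_duplicates(conn):
    """Number rows within each col1..col4 group; any number above 1 marks a duplicate."""
    return conn.execute("""
        SELECT pk, ROW_NUMBER() OVER (
                   PARTITION BY col1, col2, col3, col4 ORDER BY pk
               ) AS DuplicateRowNumber
        FROM sample
        ORDER BY pk
    """).fetchall()

if __name__ == "__main__":
    print(number_duplicates(build_sample()))  # [(1, 1), (2, 1), (3, 2), (4, 1)]
```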
You can also delete (or otherwise work with) duplicates using this technique:
```
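;WITH cte AS  -- the leading semicolon terminates any previous statement, which WITH requires
(SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4 ORDER BY (SELECT 0)) AS DuplicateRowNumber
    FROM dbo.YourTable
)
DELETE FROM cte WHERE DuplicateRowNumber > 1  -- keeps one arbitrary row from each group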
```
ROW_NUMBER is extremely powerful, and there is much more you can do with it. See the Books Online (BOL) article on it at http://msdn.microsoft.com/en-us/library/ms186734.aspx
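SQLite cannot DELETE through a CTE the way SQL Server can. The following minimal sketch is therefore a swapped-in equivalent rather than the T-SQL above: it numbers the rows in a CTE and then deletes by primary key every row numbered above 1. The table sample and its pk column are hypothetical, and ORDER BY pk replaces ORDER BY (SELECT 0) so that the lowest key in each group is the row kept.

```python
import sqlite3

def build_sample():
    """In-memory copy of the four-row example table (hypothetical name: sample)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (pk INTEGER PRIMARY KEY, "
                 "col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 3, 4, 6), (2, 1, 3, 4, 7, 7),
        (3, 1, 3, 4, 7, 10), (4, 2, 3, 1, 4, 5),
    ])
    return conn

def delete_duplicates(conn):
    """Delete every row except the lowest-pk one in each col1..col4 group."""
    conn.execute("""
        WITH numbered AS (
            SELECT pk, ROW_NUMBER() OVER (
                       PARTITION BY col1, col2, col3, col4 ORDER BY pk
                   ) AS DuplicateRowNumber
            FROM sample
        )
        DELETE FROM sample
        WHERE pk IN (SELECT pk FROM numbered WHERE DuplicateRowNumber > 1)
    """)
    conn.commit()
    # Report which primary keys survived the delete.
    return [row[0] for row in conn.execute("SELECT pk FROM sample ORDER BY pk")]

if __name__ == "__main__":
    print(delete_duplicates(build_sample()))  # [1, 2, 4]
```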
梦谈多话

2020-12-12 21:18

AFAIK, it doesn't. Just write a SELECT statement that groups by all the fields of the table and filters with a HAVING clause where the count is greater than 1.

If your rows are duplicates except for the key, leave the key out of the selected (and grouped) fields.
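Since the question was about having a tool generate this query, here is a minimal sketch that builds the "group by every non-key column" query from the table's own metadata. It uses Python's sqlite3 and PRAGMA table_info, not SQL Server; on SQL Server the column list would presumably come from INFORMATION_SCHEMA.COLUMNS or sys.columns instead. The function name duplicates_ignoring_key, the table sample, and the extra row 5 (a copy of row 3 except for its key) are hypothetical.

```python
import sqlite3

def build_sample():
    """Example table plus a hypothetical row 5 that duplicates row 3 except for its key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (pk INTEGER PRIMARY KEY, "
                 "col1 INT, col2 INT, col3 INT, col4 INT, col5 INT)")
    conn.executemany("INSERT INTO sample VALUES (?, ?, ?, ?, ?, ?)", [
        (1, 1, 2, 3, 4, 6), (2, 1, 3, 4, 7, 7),
        (3, 1, 3, 4, 7, 10), (4, 2, 3, 1, 4, 5),
        (5, 1, 3, 4, 7, 10),
    ])
    return conn

def duplicates_ignoring_key(conn, table):
    """Generate and run a GROUP BY over every non-key column of `table`."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk); pk > 0 marks key columns.
    # The table name is interpolated into SQL, so it must come from trusted code, not user input.
    info = conn.execute(f"PRAGMA table_info('{table}')").fetchall()
    cols = ", ".join(f'"{row[1]}"' for row in info if row[5] == 0)
    sql = (f'SELECT {cols}, COUNT(*) AS dup_count FROM "{table}" '
           f"GROUP BY {cols} HAVING COUNT(*) > 1")
    return sql, conn.execute(sql).fetchall()

if __name__ == "__main__":
    sql, rows = duplicates_ignoring_key(build_sample(), "sample")
    print(sql)
    print(rows)  # [(1, 3, 4, 7, 10, 2)]
```

Rows 2 and 3 no longer count as duplicates here, because col5 is now part of the comparison. Only rows 3 and 5, which differ in nothing but the key, are reported.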
