Delete duplicates from a large dataset (>100 million rows)

悲哀的现实 2021-01-01 02:21

I know this topic has come up here many times before, but none of the suggested solutions worked for my dataset because my laptop stopped calculating due to memory issues or

2 Answers

情话喂你 (original poster)

2021-01-01 02:40
If you're using SQL Server, you can delete directly from a common table expression (CTE):
```
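-- Number the rows within each (SICComb, NameComb) group, ordered by Col1.
with cte as (
    select row_number() over(partition by SICComb, NameComb order by Col1) as row_num
    from Table1
)
-- Row 1 of each group is kept; every later row is a duplicate.
delete
from cte
where row_num > 1;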
```
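Here every row gets a number, and the numbering restarts at 1 for each unique combination of SICComb + NameComb. Deleting the rows with row_num > 1 therefore keeps exactly one row per combination. You can choose which row of each group survives by changing the order by inside the over clause.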
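To see the numbering in action without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the query above: SQLite cannot delete through a CTE, so the numbered rows are matched by rowid instead. The sample rows are made up for illustration, and it assumes a SQLite build with window function support (3.25 or later).

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.execute("create table Table1 (Col1 integer, SICComb text, NameComb text)")

# Hypothetical sample data: three groups, two of them containing duplicates.
conn.executemany(
    "insert into Table1 (Col1, SICComb, NameComb) values (?, ?, ?)",
    [
        (1, "1000", "acme"),
        (2, "1000", "acme"),  # duplicate of Col1 = 1
        (3, "1000", "acme"),  # duplicate of Col1 = 1
        (4, "2000", "acme"),
        (5, "2000", "bolt"),
        (6, "2000", "bolt"),  # duplicate of Col1 = 5
    ],
)

# Number the rows per (SICComb, NameComb) group, then delete every row whose
# number is greater than 1. Rows are identified by SQLite's implicit rowid.
conn.execute(
    """
    delete from Table1
    where rowid in (
        select rid
        from (
            select rowid as rid,
                   row_number() over (
                       partition by SICComb, NameComb
                       order by Col1
                   ) as row_num
            from Table1
        )
        where row_num > 1
    )
    """
)
conn.commit()

# One row per group remains: the one with the smallest Col1.
remaining = conn.execute(
    "select Col1, SICComb, NameComb from Table1 order by Col1"
).fetchall()
print(remaining)
```

Changing `order by Col1` to `order by Col1 desc` would keep the row with the largest Col1 in each group instead.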