Delete duplicates from large dataset (>100Mio rows)

悲哀的现实 2021-01-01 02:21

I know this topic has come up here many times before, but none of the suggested solutions worked for my dataset because my laptop stopped calculating due to memory issues or

2 Answers
  •  情话喂你
    2021-01-01 02:40

    If you're using SQL Server, you can delete directly from a common table expression (CTE):

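    -- Number the rows within each (SICComb, NameComb) group, ordered by Col1
    with cte as (
        select row_number() over(partition by SICComb, NameComb order by Col1) as row_num
        from Table1
    )
    -- Keep row 1 of each group; deleting from the CTE deletes from Table1
    delete
    from cte
    where row_num > 1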
    

    This numbers every row, restarting the sequence at 1 for each unique combination of SICComb and NameComb. The order by inside the over clause determines which row in each group gets row_num = 1 and is kept; every other row in the group is deleted.
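
To see the numbering in action on a handful of rows, here is a minimal sketch that runs the same row_number() logic from Python. It assumes SQLite (via the standard sqlite3 module, with a SQLite build new enough to support window functions) as a stand-in for SQL Server, and the sample rows are made up for illustration. SQLite cannot delete through a CTE, so this version matches the numbered rows back to the table by rowid instead; that substitution is not part of the answer above.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (SICComb TEXT, NameComb TEXT, Col1 INTEGER)")

# Made-up sample rows: three groups, two of which contain duplicates.
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("100", "acme", 3),
        ("100", "acme", 1),
        ("100", "acme", 2),
        ("200", "acme", 5),
        ("200", "bolt", 4),
        ("200", "bolt", 6),
    ],
)

# Number rows within each (SICComb, NameComb) group ordered by Col1,
# then delete every row whose number is greater than 1.
# The rows are matched by rowid because SQLite has no "delete from cte".
conn.execute(
    """
    DELETE FROM Table1
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   row_number() OVER (
                       PARTITION BY SICComb, NameComb ORDER BY Col1
                   ) AS row_num
            FROM Table1
        )
        WHERE row_num > 1
    )
    """
)
conn.commit()

remaining = conn.execute(
    "SELECT SICComb, NameComb, Col1 FROM Table1 ORDER BY SICComb, NameComb"
).fetchall()
print(remaining)
```

With this sample, one row per group survives, and it is the one with the lowest Col1, because Col1 is the column in the order by. Ordering by Col1 desc instead would keep the highest value in each group.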
