How to delete duplicate rows in Sybase when you have no unique key?

Question

Yes, similar questions have been asked many times, but the most elegant solutions posted here work for SQL Server and not for Sybase (in my case Sybase SQL Anywhere 11). I have even found some Sybase-related questions marked as duplicates of SQL Server questions, which doesn't help.

One example of a solution I liked, but which didn't work, is the WITH ... DELETE ... construct.

I have found working solutions using cursors or WHILE loops, but I hope it is possible without loops.

I am hoping for a nice, simple and fast query that deletes all but one row from each set of exact duplicates.

Here is a little framework for testing:

IF OBJECT_ID( 'tempdb..#TestTable' ) IS NOT NULL
  DROP TABLE #TestTable;

CREATE TABLE #TestTable (Column1 varchar(1), Column2 int);

INSERT INTO #TestTable VALUES ('A', 1);
INSERT INTO #TestTable VALUES ('A', 1); -- duplicate
INSERT INTO #TestTable VALUES ('A', 1); -- duplicate
INSERT INTO #TestTable VALUES ('A', 2);
INSERT INTO #TestTable VALUES ('B', 1);
INSERT INTO #TestTable VALUES ('B', 2);
INSERT INTO #TestTable VALUES ('B', 2); -- duplicate
INSERT INTO #TestTable VALUES ('C', 1);
INSERT INTO #TestTable VALUES ('C', 2);

SELECT * FROM #TestTable ORDER BY Column1,Column2;

DELETE <your solution here>

SELECT * FROM #TestTable ORDER BY Column1,Column2;
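
To see which rows are still duplicated before and after a candidate DELETE, a grouping query along these lines can help. It is only a sketch against the #TestTable framework above; the alias Copies is an illustrative name and not part of the original question.

```sql
-- List every (Column1, Column2) combination that occurs more than once.
SELECT Column1, Column2, COUNT(*) AS Copies
FROM #TestTable
GROUP BY Column1, Column2
HAVING COUNT(*) > 1
ORDER BY Column1, Column2;
```

With the sample data this should report ('A', 1) with 3 copies and ('B', 2) with 2 copies before the DELETE, and return no rows once the duplicates are gone.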

Answer 1:

If all fields are identical, you can just do this:

select distinct * 
into #temp_table
from table_with_duplicates 

delete table_with_duplicates 

insert into table_with_duplicates select * from #temp_table

If not all fields are identical, for example because an id column differs, you need to list the fields explicitly in the select statement and hard-code a value for the id so that the rows become identical (assuming the id is a field you don't care about). For example:

insert into #temp_table (field1, field2, id)
select distinct field1, field2, 999
from table_with_duplicates
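
Applied to the #TestTable framework from the question, the first variant might look like the sketch below. The name #Deduped is an assumption made for illustration, and the statements have not been verified on SQL Anywhere 11.

```sql
-- Copy one instance of each distinct row into a scratch table.
SELECT DISTINCT Column1, Column2
INTO #Deduped
FROM #TestTable;

-- Empty the original table, then refill it from the scratch copy.
DELETE FROM #TestTable;

INSERT INTO #TestTable (Column1, Column2)
SELECT Column1, Column2 FROM #Deduped;

DROP TABLE #Deduped;
```

On a real (non-temporary) table the data is briefly missing between the DELETE and the INSERT, so this is probably best run inside a transaction or while nobody else is using the table.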

Answer 2:

This works well and is fast:

DELETE FROM #TestTable
WHERE ROWID(#TestTable) IN (
  SELECT rowid FROM (
    SELECT ROWID(#TestTable) rowid, 
      ROW_NUMBER() OVER(PARTITION BY Column1,Column2 ORDER BY Column1,Column2) rownum
    FROM #TestTable
  ) sub
  WHERE rownum > 1
);

If you are not familiar with OVER (PARTITION BY ...), just execute the inner SELECT statement on its own to see what it does.
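
For reference, the inner SELECT can be run on its own roughly as follows. This is a sketch against the question's #TestTable; the aliases row_id and row_num and the extra columns in the select list are added here for readability and are not part of the answer's query.

```sql
-- Number the rows within each (Column1, Column2) group.
-- Rows numbered 2 or higher are the ones the outer DELETE removes.
SELECT ROWID(#TestTable) AS row_id,
       Column1,
       Column2,
       ROW_NUMBER() OVER (PARTITION BY Column1, Column2
                          ORDER BY Column1, Column2) AS row_num
FROM #TestTable
ORDER BY Column1, Column2, row_num;
```

With the sample data, the three ('A', 1) rows are numbered 1, 2 and 3, the two ('B', 2) rows are numbered 1 and 2, and every other row gets 1.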

Answer 3:

Here is another interesting one I found and adapted:

DELETE FROM #TestTable dupes
FROM #TestTable dupes, #TestTable fullTable
WHERE dupes.Column1 = fullTable.Column1
  AND dupes.Column2 = fullTable.Column2
  AND ROWID(dupes) > ROWID(fullTable);

or, if you prefer explicit joins (I do):

DELETE FROM #TestTable dupes
FROM #TestTable dupes
INNER JOIN #TestTable fullTable
  ON dupes.Column1 = fullTable.Column1
  AND dupes.Column2 = fullTable.Column2
  AND ROWID(dupes) > ROWID(fullTable);

or the short form (a natural join automatically matches on all identically named columns):

DELETE FROM #TestTable dupes
FROM #TestTable dupes
NATURAL JOIN #TestTable fullTable
  ON ROWID(dupes) > ROWID(fullTable);

...if someone finds a solution that doesn't require ROWID(), I would be interested to see it.

Answer 4:

Please try this (the general form first, then a complete example):

create clustered index i1 on table_name(column_name) with ignore_dup_row

create table #test(id int,name char(9))
insert into #test values(1,"A")
insert into #test values(1,"A")
create clustered index i1 on #test(id) with ignore_dup_row
select * from #test

Answer 5:

OK, now that I know the ROWID() function, solutions for tables with a primary key (PK) can easily be adapted. This one first selects all rows to keep and then deletes the remaining ones:

DELETE FROM #TestTable
FROM #TestTable
LEFT OUTER JOIN (
  SELECT MIN(ROWID(#TestTable)) rowid
  FROM #TestTable
  GROUP BY Column1, Column2
) AS KeepRows ON ROWID(#TestTable) = KeepRows.rowid
WHERE KeepRows.rowid IS NULL;

...or how about this shorter variant? I like it!

DELETE FROM #TestTable
WHERE ROWID(#TestTable) NOT IN (
  SELECT MIN(ROWID(#TestTable))
  FROM #TestTable
  GROUP BY Column1, Column2
);

The post that inspired me most has a comment saying that NOT IN might be slower. But that applies to SQL Server, and sometimes elegance is more important :) - I also think it all depends on good indexes.

Anyway, it is usually bad design to have tables without a PK. You should at least add an "autoinc" ID, and if you do, you can use that ID instead of the ROWID() function, which is a non-standard Sybase extension (some other databases have something similar).
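
As a rough illustration of the ID-based variant, the sketch below assumes a hypothetical permanent table TestTableWithId whose Id column uses SQL Anywhere's DEFAULT AUTOINCREMENT. The table and column names are made up for this example.

```sql
-- Hypothetical table with a surrogate key; names are illustrative only.
CREATE TABLE TestTableWithId (
  Id      INTEGER NOT NULL DEFAULT AUTOINCREMENT PRIMARY KEY,
  Column1 VARCHAR(1),
  Column2 INT
);

INSERT INTO TestTableWithId (Column1, Column2) VALUES ('A', 1);
INSERT INTO TestTableWithId (Column1, Column2) VALUES ('A', 1); -- duplicate
INSERT INTO TestTableWithId (Column1, Column2) VALUES ('B', 2);

-- Keep the lowest Id in each (Column1, Column2) group and delete the rest.
DELETE FROM TestTableWithId
WHERE Id NOT IN (
  SELECT MIN(Id)
  FROM TestTableWithId
  GROUP BY Column1, Column2
);
```

This has the same shape as the ROWID() version above, with Id taking the place of ROWID(#TestTable), so it relies only on standard SQL once the key column exists.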

Source: https://stackoverflow.com/questions/19544489/how-to-delete-duplicate-rows-in-sybase-when-you-have-no-unique-key

Tags: sql, sybase, duplicate-removal, sqlanywhere