A query that is used to loop through 17 millions records to remove duplicates has been running now for about 16 hours and I wanted to know if the query is sto
I inherited a system which had logic something like yours implemented in SQL. In our case, we were trying to link together rows using fuzzy matching that had similar names/addresses, etc, and that logic was done purely in SQL. At the time I inherited it we had about 300,000 rows in the table and according to the timings, we calculated it would take A YEAR to match them all.
As an experiment to see how much faster I could do it outside of SQL, I wrote a program to dump the db table into flat files, read the flat files into a C++ program, build my own indexes, and do the fuzzy logic there, then reimport the flat files into the database. What took A YEAR in SQL took about 30 seconds in the C++ app.
So, my advice is, don't even try what you are doing in SQL. Export, process, re-import.