Basically, we have one table (the original table) that is backed up into another table (the backup table), so the two tables have exactly the same schema.
1: First get the row count of each table, C1 and C2. C1 and C2 should be equal. Run the following query once per table to obtain them:
select count(*) from table1
If C1 and C2 are not equal, the tables are not identical.
2: Find the distinct row count of each table, DC1 and DC2. DC1 and DC2 should be equal. The number of distinct records can be found with the following query (most databases require an alias on the subquery):
select count(*) from (select distinct * from table1) t
If DC1 and DC2 are not equal, the tables are not identical.
3: Now get the number of records returned by a union of the two tables. Call it U. Use the following query to count the records in the union:
SELECT COUNT(*)
FROM
(SELECT *
FROM table1
UNION
SELECT *
FROM table2) u
The data in the two tables is identical if the distinct count of each table equals the number of records returned by the union of the two tables, i.e. DC1 = U and DC2 = U.
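The three checks above can be sketched as one function. This is a minimal sketch, assuming an in-memory SQLite database and Python's sqlite3 module; the function name tables_identical and the sample columns (id, name) are made up for illustration and do not come from the question.

```python
import sqlite3


def tables_identical(conn, t1, t2):
    """Run the count, distinct-count and union-count checks on t1 and t2.

    Table names are interpolated into the SQL, so they must be trusted.
    """
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    # Step 1: total row counts C1 and C2.
    c1 = scalar(f"SELECT COUNT(*) FROM {t1}")
    c2 = scalar(f"SELECT COUNT(*) FROM {t2}")
    if c1 != c2:
        return False

    # Step 2: distinct row counts DC1 and DC2.
    dc1 = scalar(f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {t1}) d")
    dc2 = scalar(f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {t2}) d")
    if dc1 != dc2:
        return False

    # Step 3: U, the row count of the union (UNION removes duplicates).
    u = scalar(
        f"SELECT COUNT(*) FROM (SELECT * FROM {t1} UNION SELECT * FROM {t2}) u"
    )
    return dc1 == u and dc2 == u


# Hypothetical sample data: the backup starts as an exact copy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE original_table (id INTEGER, name TEXT);
    CREATE TABLE backup_table (id INTEGER, name TEXT);
    INSERT INTO original_table VALUES (1, 'a'), (2, 'b'), (2, 'b');
    INSERT INTO backup_table VALUES (1, 'a'), (2, 'b'), (2, 'b');
""")
same_before = tables_identical(conn, "original_table", "backup_table")

# Change one row in the backup and compare again.
conn.execute("UPDATE backup_table SET name = 'z' WHERE id = 1")
same_after = tables_identical(conn, "original_table", "backup_table")
print(same_before, same_after)
```

One caveat worth keeping in mind: the three counts cannot see how often each duplicate occurs, so tables holding the rows (a, a, b) and (a, b, b) would pass all three checks even though they differ.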
SELECT * FROM Table1
UNION
SELECT * FROM Table2
If the union returns more rows than either of the two tables contains, they don't have the same data.
In MySQL, you can just use CHECKSUM TABLE and compare the results. You can even alter the table to enable live checksums so that they are continuously available.
CHECKSUM TABLE original_table, backup_table;
It doesn't require the tables to have a primary key.
For the lazier or more SQL-averse developer working with MS SQL Server, I would recommend SQL Delta (www.sqldelta.com) for this and any other database-diff type work. It has a great GUI, is quick and accurate, and can diff all database objects, generate and run the necessary change scripts, and synchronise entire databases. It's the next best thing to a DBA ;-)
I think there is a similar tool available from RedGate called SQL Compare. I believe some editions of the latest version of Visual Studio (2010) also include a very similar tool.
Please try the following method for determining whether two tables are exactly the same when there is no primary key of any kind and there are no duplicate rows within either table:
Step 1 - Test for Duplicate Rows on TABLEA
If SELECT DISTINCT * FROM TABLEA
has the same row count as
SELECT * FROM TABLEA
then go to the next step, otherwise you can't use this method...
Step 2 - Test for Duplicate Rows on TABLEB
If SELECT DISTINCT * FROM TABLEB
has the same row count as
SELECT * FROM TABLEB
then go to the next step, else you can't use this method...
Step 3 - INNER JOIN TABLEA to TABLEB on every column
If the query below returns the same number of rows as the tables had in Steps 1 and 2, then the tables are the same:
SELECT
*
FROM
TABLEA
INNER JOIN TABLEB ON
TABLEA.column1 = TABLEB.column1
AND TABLEA.column2 = TABLEB.column2
AND TABLEA.column3 = TABLEB.column3
--etc...for every column
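Note that this method doesn't necessarily test for different data types, and probably won't work on non-joinable data types (like VARBINARY). Rows with a NULL in any compared column will also never match, because NULL = NULL does not evaluate to true.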
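The three steps can be put together roughly as follows. This is a sketch only, assuming SQLite via Python's sqlite3 module; the helper name same_rows_by_join and the two sample columns are hypothetical, and the caller has to list every column explicitly.

```python
import sqlite3


def same_rows_by_join(conn, a, b, columns):
    """Compare tables a and b by joining on every column.

    Raises ValueError if either table has duplicate rows, since the
    method does not apply then. Names are interpolated and must be trusted.
    """
    def scalar(sql):
        return conn.execute(sql).fetchone()[0]

    # Steps 1 and 2: each table must be free of duplicate rows.
    counts = []
    for table in (a, b):
        total = scalar(f"SELECT COUNT(*) FROM {table}")
        distinct = scalar(
            f"SELECT COUNT(*) FROM (SELECT DISTINCT * FROM {table}) d"
        )
        if total != distinct:
            raise ValueError(f"{table} has duplicate rows; method not applicable")
        counts.append(total)

    # Step 3: inner join on every column and compare the row counts.
    on_clause = " AND ".join(f"{a}.{col} = {b}.{col}" for col in columns)
    joined = scalar(f"SELECT COUNT(*) FROM {a} INNER JOIN {b} ON {on_clause}")
    return joined == counts[0] == counts[1]


# Hypothetical sample data with two columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLEA (column1 INTEGER, column2 TEXT);
    CREATE TABLE TABLEB (column1 INTEGER, column2 TEXT);
    INSERT INTO TABLEA VALUES (1, 'x'), (2, 'y');
    INSERT INTO TABLEB VALUES (1, 'x'), (2, 'y');
""")
cols = ["column1", "column2"]
equal_before = same_rows_by_join(conn, "TABLEA", "TABLEB", cols)

# Alter one value in TABLEB so a row no longer matches.
conn.execute("UPDATE TABLEB SET column2 = 'q' WHERE column1 = 2")
equal_after = same_rows_by_join(conn, "TABLEA", "TABLEB", cols)
print(equal_before, equal_after)
```

Because neither table has duplicates, each row can match at most one row on the other side, which is why equal row counts imply equal contents here.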
Feedback welcome!
select count(*)
from lemmas as original_table
full join backup_table using (lemma_id)
where backup_table.lemma_id is null
or original_table.lemma_id is null
or original_table.lemma != backup_table.lemma
The full join / check for null should cover additions or deletions as well as changes.
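To try the idea without a server, here is a rough sketch using Python's sqlite3 module. SQLite versions before 3.39 do not support FULL JOIN, so the sketch emulates it with two LEFT JOINs combined by UNION ALL; that substitution, the sample rows, and the assumption that lemma_id is unique in each table are mine, not part of the query above.

```python
import sqlite3

# Emulated full join: count rows that are missing, extra, or changed.
DIFF_SQL = """
SELECT COUNT(*) FROM (
    -- rows missing from the backup, or present with a different lemma
    SELECT o.lemma_id
    FROM lemmas AS o
    LEFT JOIN backup_table AS b ON o.lemma_id = b.lemma_id
    WHERE b.lemma_id IS NULL OR o.lemma != b.lemma
    UNION ALL
    -- rows that exist only in the backup
    SELECT b.lemma_id
    FROM backup_table AS b
    LEFT JOIN lemmas AS o ON o.lemma_id = b.lemma_id
    WHERE o.lemma_id IS NULL
) diff
"""

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lemmas (lemma_id INTEGER PRIMARY KEY, lemma TEXT);
    CREATE TABLE backup_table (lemma_id INTEGER PRIMARY KEY, lemma TEXT);
    INSERT INTO lemmas VALUES (1, 'run'), (2, 'walk'), (3, 'jump');
    -- id 2 is changed, id 3 is missing, id 4 is extra
    INSERT INTO backup_table VALUES (1, 'run'), (2, 'wlk'), (4, 'swim');
""")
diff_count = conn.execute(DIFF_SQL).fetchone()[0]

# Rebuild the backup as an exact copy and count again.
conn.executescript("""
    DELETE FROM backup_table;
    INSERT INTO backup_table SELECT * FROM lemmas;
""")
diff_after = conn.execute(DIFF_SQL).fetchone()[0]
print(diff_count, diff_after)
```

A result of zero means no additions, deletions, or changes were found. As with the original query, a lemma that is NULL on either side is not reported as changed, because != against NULL does not evaluate to true.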