SQL way to get the MD5 or SHA1 of an entire row

Submitted by 假装没事ソ on 2019-12-14 03:39:56

Question


Is there a "semi-portable" way to get the md5() or the sha1() of an entire row? (Or better, of an entire group of rows ordered by all their fields, i.e. order by 1,2,3,...,n)? Unfortunately, not all DBs are PostgreSQL: I have to deal with at least Microsoft SQL Server, Sybase, and Oracle.

Ideally, I'd like to have a server-side aggregate and use it to detect changes in groups of rows. For example, in tables that have a timestamp column, I'd like to store a unique signature for, say, each month. Then I could quickly detect the months that have changed since my last visit (I am mirroring certain tables to a server running Greenplum) and reload those.

I've looked at a few options, e.g. checksum(*) in T-SQL (horror: it's very collision-prone, since it's based on a bunch of XORs and 32-bit values) and hashbytes('MD5', field), but the latter can't be applied to an entire row. And that would give me a solution for just one of the SQL flavors I have to deal with.

Any ideas? Even a solution for just one of the SQL dialects mentioned above would be great.


Answer 1:


You could calculate the hashbytes value for the entire row in an update trigger. I used this as part of an ETL process that had previously compared every column in the tables, and the speed increase was huge.

Hashbytes works on varchar, nvarchar, or varbinary data types. I wanted to compare integer keys and text fields, and casting everything would have been a nightmare, so I used the FOR XML clause in SQL Server as follows:

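CREATE TRIGGER get_hash_value ON staging_table
FOR UPDATE, INSERT AS
-- Hash only the data columns; sha1_hash itself must stay out of the input.
-- Without a WHERE clause, this recomputes the hash for every row in the table.
UPDATE staging_table
SET sha1_hash = (SELECT hashbytes('sha1', (SELECT col1, col2, col3 FOR XML RAW)))
GO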

Alternatively, if you plan to do many updates on all the rows, you could calculate the values in a similar way outside of a trigger, using a subquery with the FOR XML clause. If you go this route, you can even change it to a SELECT *. That does not work in the trigger, because there the sha1_hash column is part of the row, so the hashed input (and therefore the result) would be different each time you run it.
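As a rough sketch of the non-trigger variant, the hash can be computed on the fly and compared against hashes saved from a previous run. The names here are assumptions made for illustration, not part of the original answer: staging_table is taken to have a key column id, and mirror_table(id, sha1_hash) is a hypothetical table holding a non-NULL hash per row from the last load.

```sql
-- Sketch only. Hypothetical names: staging_table(id, col1, col2, col3) is the
-- freshly loaded data; mirror_table(id, sha1_hash) holds the hashes from the
-- previous run, with sha1_hash assumed to be NOT NULL.
SELECT s.id
FROM (SELECT t.id,
             HASHBYTES('SHA1',
                       (SELECT t.col1, t.col2, t.col3 FOR XML RAW)) AS sha1_hash
      FROM staging_table AS t) AS s
LEFT JOIN mirror_table AS m
       ON m.id = s.id
WHERE m.id IS NULL                  -- row is new
   OR m.sha1_hash <> s.sha1_hash;   -- row content changed
```

Listing the columns explicitly keeps the hash stable if a hash column is later added to the table itself.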

You could also modify the SELECT statement so that it hashes more than one row.
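One way this might look for the per-month signature described in the question is sketched below. It assumes a datetime column named updated_at on staging_table, which is a made-up name for illustration. The inner ORDER BY matters, since the same rows serialized in a different order would hash differently. Note also that before SQL Server 2016, HASHBYTES accepts at most 8000 bytes of input, so on older versions this only works for small groups.

```sql
-- Sketch only. Hypothetical column: staging_table.updated_at (datetime).
-- Produces one SHA-1 per calendar month, computed over all rows of that month
-- serialized in a fixed order.
SELECT m.month_start,
       HASHBYTES('SHA1',
                 (SELECT s.col1, s.col2, s.col3
                  FROM staging_table AS s
                  WHERE s.updated_at >= m.month_start
                    AND s.updated_at <  DATEADD(month, 1, m.month_start)
                  ORDER BY s.col1, s.col2, s.col3   -- deterministic row order
                  FOR XML RAW)) AS month_hash
FROM (SELECT DISTINCT
             -- first day of the month that updated_at falls in
             DATEADD(month, DATEDIFF(month, 0, updated_at), 0) AS month_start
      FROM staging_table) AS m
ORDER BY m.month_start;
```

Storing month_start and month_hash after each load and comparing them on the next visit would point to the months that need reloading.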




Answer 2:


In MSSQL you can use HashBytes across the entire row by using XML:

SELECT MBT.id,
   hashbytes('MD5',
               (SELECT MBT.*
                FROM (
                      VALUES(NULL))foo(bar)
                FOR xml auto)) AS [Hash]
FROM <Table> AS MBT;

The FROM (VALUES(NULL)) foo(bar) clause is needed only so that FOR XML AUTO can be used; it serves no other purpose.



Source: https://stackoverflow.com/questions/16452658/sql-way-to-get-the-md5-or-sha1-of-an-entire-row
