How to generate a hash of the result set in Postgres?

Submitted by 主宰稳场 on 2021-01-28 04:43:55

Question


I have two databases for logging stuff, which I want to check for synchronization.

The approved solution is to periodically (let's say hourly) send a SELECT to both, generate a hash of each result set, and compare them. If they match, great; otherwise, raise an alarm.

Currently I'm doing it by (bash script):

log_table="SELECT column1, column2, column3 FROM log_table WHERE to_char(timestamp, '$ts_format') = '$tx_moment'"
# Quote the query so psql receives it as a single -c argument.
psql -t -q -h "$_gp_host" -U "$_gp_user" -d log_schema -c "$tx_fix$log_table" | sort | cksum

I would like to do the cksum/hash on the Postgres side, because currently the script downloads the whole result set (which can be 25 MB or more) and generates the checksum on the client side.

Google didn't help.

Any suggestions?

Thanks.


Answer 1:


You could use md5:

log_table="
SELECT
  md5(column1 || column2 || column3) AS hash,
  column1, column2, column3
FROM log_table WHERE to_char(timestamp, '$ts_format') = '$tx_moment'"
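
One caveat with concatenating the columns directly: in PostgreSQL, || returns NULL if any operand is NULL, and without a separator the values ('ab', 'c') and ('a', 'bc') produce the same string. A minimal sketch of a helper that builds a NULL-safe row-hash expression follows. The function name row_hash_expr and the '|' separator are assumptions made for illustration; they are not part of the original answer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an md5() expression over the given column names.
# Each column is cast to text, NULL becomes '' and '|' separates the values.
# Note: with this scheme a NULL and an empty string hash identically.
row_hash_expr() {
    local expr="" col
    for col in "$@"; do
        # Join the per-column pieces with the SQL concatenation operator.
        if [ -n "$expr" ]; then
            expr="$expr || '|' || "
        fi
        expr="${expr}coalesce(${col}::text, '')"
    done
    printf 'md5(%s)' "$expr"
}

# Usage, with the variables from the question's script:
# log_table="SELECT $(row_hash_expr column1 column2 column3) AS hash FROM log_table WHERE ..."
```

This still yields one hash per row, so the result set sent to the client shrinks but does not collapse to a single value.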



Answer 2:


If you want to hash all of it at once, that is going to use a lot of memory server-side as well. And once you hit 1 GB it won't work anymore, since a single string can't be longer than that.

Perhaps something like this will work: it hashes each row, and then hashes those hashes. It will still break when the combined length of the hashes goes above 1 GB; to get around that you'll need to write a custom md5 aggregate.

SELECT md5(concat(md5(column1 || column2 || column3))) FROM log_table WHERE ...

This requires that you have created the custom aggregate concat like this:

CREATE AGGREGATE concat (
    BASETYPE = text,
    SFUNC = textcat,
    STYPE = text,
    INITCOND = ''
);
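
On PostgreSQL 9.0 and later, the built-in string_agg aggregate can probably stand in for the custom concat aggregate, and its ORDER BY clause makes the result independent of the order in which each server returns rows. The bash sketch below combines this with the two-database comparison from the question. The function names, the two host names in the usage comment, and the choice to order by the row hash are assumptions for illustration, and the 1 GB limit on the aggregated string still applies.

```bash
#!/usr/bin/env bash
# Build a query that returns one md5 over all row hashes (a hash of hashes).
# $1 = WHERE condition. string_agg(... ORDER BY ...) needs PostgreSQL 9.0+.
build_set_hash_query() {
    local where="$1"
    local row_hash="md5(column1 || column2 || column3)"
    # Ordering by the row hash itself gives a deterministic concatenation order.
    printf "SELECT md5(string_agg(%s, '' ORDER BY %s)) FROM log_table WHERE %s" \
        "$row_hash" "$row_hash" "$where"
}

# Run the query on one host and print only the hash.
# -t (tuples only) and -A (unaligned) strip headers and padding.
fetch_set_hash() {
    local host="$1" query="$2"
    psql -t -A -q -h "$host" -U "$_gp_user" -d log_schema -c "$query"
}

# Return 0 when both hashes are non-empty and equal, 1 otherwise.
# An empty hash (no rows, or a failed query) is treated as a mismatch.
hashes_match() {
    [ -n "$1" ] && [ "$1" = "$2" ]
}

# Usage (host_a and host_b are hypothetical):
# q="$(build_set_hash_query "to_char(timestamp, '$ts_format') = '$tx_moment'")"
# hashes_match "$(fetch_set_hash host_a "$q")" "$(fetch_set_hash host_b "$q")" \
#     || echo "ALARM: logs out of sync"
```

Only the 32-character hash crosses the network, so the client-side sort and cksum steps are no longer needed.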


Source: https://stackoverflow.com/questions/3533314/how-to-generate-a-hash-of-the-result-set-in-postgress
