Faster hash with fewer collisions?

Submitted by 半腔热情 on 2019-12-12 09:08:16

Question


Which hashing method returns results fastest, with the least chance of two different inputs producing the same hash, when summing the hashes of all rows (10 million) for one column? The column could be NUMBER, VARCHAR, DATE, or TIMESTAMP, but not CLOB, XML, etc. The resulting value would then be compared with the result of the same operation on another table to check whether all rows of that column are exactly the same.

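SET SERVEROUTPUT ON
DECLARE
  HASH_VAL NUMBER;
BEGIN
  -- OWA_OPT_LOCK.CHECKSUM: returns a numeric checksum of a string
  DBMS_OUTPUT.PUT_LINE(OWA_OPT_LOCK.CHECKSUM('column_here'));
  -- DBMS_UTILITY.GET_HASH_VALUE(name, base, hash_size)
  DBMS_OUTPUT.PUT_LINE(DBMS_UTILITY.GET_HASH_VALUE('column_here', 1, POWER(2,31)-1));
  -- ORA_HASH is a SQL function, so it is called through a query
  SELECT ORA_HASH('column_here') INTO HASH_VAL FROM DUAL;
  DBMS_OUTPUT.PUT_LINE(HASH_VAL);
  -- MD5 via DBMS_OBFUSCATION_TOOLKIT; the result is raw bytes held in a VARCHAR2
  DBMS_OUTPUT.PUT_LINE(DBMS_OBFUSCATION_TOOLKIT.MD5(INPUT_STRING => 'column_here'));
  -- DBMS_CRYPTO.HASH with hash type 3 (DBMS_CRYPTO.HASH_SH1)
  DBMS_OUTPUT.PUT_LINE(DBMS_CRYPTO.HASH(UTL_RAW.CAST_TO_RAW('column_here'), 3));
END;
/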

Answer 1:


Which method is faster?

I haven't benchmarked this, but I would guess that DBMS_SQLHASH is the fastest, since it was built for exactly this type of problem.

It is an official package, but it is not well documented in the Security Guide, and it does not appear in the 5,964(!) page PL/SQL Packages and Types Reference. You also need to run grant execute on dbms_sqlhash to [user]; before it will work. That is probably why almost nobody has heard of it.

For example:

select sys.DBMS_SQLHASH.GETHASH(sqltext=>'select 1 from dual', digest_type=>1)
from dual;

digest_type: 1 = HASH_MD4, 2 = HASH_MD5, 3 = HASH_SH1
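To apply this to the original question, the same call can be run once per table and the two results compared. The following is only a sketch: table_a, table_b, col and id are hypothetical names, and the ORDER BY reflects my understanding that the hash depends on the order in which rows are returned, so both queries need a deterministic ordering.

```sql
-- Hypothetical names: table_a, table_b, col, id.
-- digest_type => 3 selects HASH_SH1, as listed above.
select case
         when sys.dbms_sqlhash.gethash(
                sqltext     => 'select col from table_a order by id',
                digest_type => 3)
            = sys.dbms_sqlhash.gethash(
                sqltext     => 'select col from table_b order by id',
                digest_type => 3)
         then 'MATCH'
         else 'DIFFERENT'
       end as comparison
from dual;
```

GETHASH returns a RAW value, so the two digests can be compared directly with =.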

Chance of a collision

Several existing questions discuss the chance of a collision: "Hash Collision - what are the chances?" and "Can two different strings generate the same MD5 hash code?"

I'm not sure exactly what happens to the chance when you start summing many rows, but the chances of a single collision are so ridiculously low that you're probably ok.
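As a rough guide only, the standard birthday-bound approximation gives the chance that any two of n values share a hash, assuming the hash output is uniformly distributed over b bits. It covers collisions between individual row hashes, not the chance that two sums of hashes happen to match, so treat it as an order-of-magnitude estimate rather than an answer to the summing question.

```latex
P(\text{collision}) \approx \frac{n^2}{2^{\,b+1}}

% n = 10^7 rows, 128-bit digest (MD4 or MD5):
P \approx \frac{10^{14}}{2^{129}} \approx 1.5 \times 10^{-25}

% n = 10^7 rows, 32-bit hash (the default ORA_HASH range):
\frac{n^2}{2^{33}} \approx 1.2 \times 10^{4} \gg 1
\quad\Rightarrow\quad \text{collisions among the rows are practically certain}
```

On this estimate a 128-bit or larger digest leaves a negligible collision chance at 10 million rows, while a 32-bit hash does not.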

I don't know the math, but I am sure that the most likely cause of a collision is from a programming error if you try to write your own function.

I've seen and built scripts just like this, and there are many subtle ways to get them wrong: for example, mishandling null values, or failing to detect values swapped between rows or columns. Even though you're only using one column now, use the Oracle-supplied package whenever possible so that nobody ever has to write one of those ugly scripts.
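One illustration of how a homemade script can go wrong, assuming it concatenates column values before hashing them (an assumption for this sketch, not something stated in the question): different inputs can produce identical strings, so any hash of them is identical too.

```sql
-- Both expressions evaluate to 'same' in Oracle:
--  * 'ab' || 'c' and 'a' || 'bc' both yield 'abc' (the column boundary is lost)
--  * NULL || 'x' and 'x' || NULL both yield 'x' (NULL concatenates as an empty string)
select case when 'ab' || 'c' = 'a' || 'bc' then 'same' else 'different' end as shifted_boundary,
       case when null || 'x' = 'x' || null then 'same' else 'different' end as swapped_null
from dual;
```

A script that hashes such concatenated values would report two tables as equal even though the underlying columns differ.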



Source: https://stackoverflow.com/questions/8017946/faster-hash-with-less-collisions
