How to Implement T-SQL CHECKSUM() in JavaScript for BigQuery?

Submitted by 六眼飞鱼酱① on 2019-12-12 06:39:11

Question


The end result I'm looking for is to implement T-SQL CHECKSUM in BigQuery with a JavaScript UDF. I would settle for having the C/C++ source code to translate but if someone has already done this work then I'd love to use it.

Alternatively, if someone can think of a way to create an equivalent hash code between strings stored in Microsoft SQL Server compared to those in BigQuery then that would help me too.


  • UPDATE: Through HABO's link in the comments I found some source code, written in T-SQL, that performs the same CHECKSUM. I'm having difficulty converting it to JavaScript, which cannot natively handle 64-bit integers. Playing with some small examples, I found that the algorithm works on the low nibble of each byte only.
  • UPDATE 2: I got really curious about replicating this algorithm and I can see some definite patterns, but my brain isn't up to the task of distilling them into a reverse-engineered solution. I did find that BINARY_CHECKSUM() and CHECKSUM() return different results, so the work done on the former didn't help me with the latter.

Answer 1:


I spent the day reverse engineering this by first dumping the CHECKSUM results for every single ASCII character, and then for pairs of characters. This showed that each character has its own distinct "XOR code", and that a letter has the same code regardless of case. The algorithm was remarkably simple to figure out after that: for each character, rotate the running 32-bit checksum left by 4 bits, then XOR it with that character's code from a lookup table.

var xorcodes = [
    0, 1, 2, 3, 4, 5, 6, 7,
    8, 9, 10, 11, 12, 13, 14, 15,
    16, 17, 18, 19, 20, 21, 22, 23,
    24, 25, 26, 27, 28, 29, 30, 31,
    0, 33, 34, 35, 36, 37, 38, 39,  //  !"#$%&'
    40, 41, 42, 43, 44, 45, 46, 47,  // ()*+,-./
    132, 133, 134, 135, 136, 137, 138, 139,  // 01234567
    140, 141, 48, 49, 50, 51, 52, 53, 54,  // 89:;<=>?@
    142, 143, 144, 145, 146, 147, 148, 149,  // ABCDEFGH
    150, 151, 152, 153, 154, 155, 156, 157,  // IJKLMNOP
    158, 159, 160, 161, 162, 163, 164, 165,  // QRSTUVWX
    166, 167, 55, 56, 57, 58, 59, 60,  // YZ[\]^_`
    142, 143, 144, 145, 146, 147, 148, 149,  // abcdefgh
    150, 151, 152, 153, 154, 155, 156, 157,  // ijklmnop
    158, 159, 160, 161, 162, 163, 164, 165,  // qrstuvwx
    166, 167, 61, 62, 63, 64, 65, 66,  // yz{|}~ then DEL (127) and code 128
];

function rol(x, n) {
    // simulate a 32-bit rotate left; >>> is the zero-fill right shift,
    // so the bits that wrap around are not sign-extended
    return (x<<n) | (x>>>(32-n));
}

function checksum(s) {
    var sum = 0;
    for (var i = 0; i < s.length; i++) {
        // rotate the running checksum left by one nibble
        sum = rol(sum, 4);

        // characters outside the table contribute an XOR code of 0
        var c = s.charCodeAt(i);
        var xorcode = 0;
        if (c < xorcodes.length) {
            xorcode = xorcodes[c];
        }
        sum ^= xorcode;
    }
    return sum;
}
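
To sanity-check the loop above by hand, here is a minimal, self-contained sketch that traces the rotate-and-XOR steps for short strings. It is not the full implementation: `letterCode` and `letterChecksum` are hypothetical names used only for this illustration, and `letterCode` covers ASCII letters only, using the pattern read off the table above (A/a = 142 through Z/z = 167). The expected values follow from that table, not from a run against SQL Server.

```javascript
// Rotate a 32-bit value left by n bits (same idea as rol() above).
function rol(x, n) {
    return (x << n) | (x >>> (32 - n));
}

// Hypothetical helper for this sketch only: XOR codes for ASCII letters,
// as listed in the table above. Every other character gets 0 here,
// which is NOT what the full table does for digits and punctuation.
function letterCode(c) {
    var upper = c & ~0x20; // fold a-z onto A-Z
    return (upper >= 65 && upper <= 90) ? 142 + (upper - 65) : 0;
}

function letterChecksum(s) {
    var sum = 0;
    for (var i = 0; i < s.length; i++) {
        sum = rol(sum, 4);                   // step 1: rotate left by 4 bits
        sum ^= letterCode(s.charCodeAt(i));  // step 2: XOR with the code
    }
    return sum;
}

// "a":  rol(0, 4) = 0,       0 ^ 142    = 142
// "ab": rol(142, 4) = 2272,  2272 ^ 143 = 2159
console.log(letterChecksum("a"));   // 142
console.log(letterChecksum("ab"));  // 2159
console.log(letterChecksum("AB") === letterChecksum("ab"));  // true
```

Because the bitwise operators in JavaScript work on signed 32-bit integers, longer strings can produce negative results, which is consistent with T-SQL CHECKSUM returning an int.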

See https://github.com/neilodonuts/tsql-checksum-javascript for more info.

DISCLAIMER: I've only worked on compatibility with VARCHAR strings in SQL Server with collation set to SQL_Latin1_General_CP1_CI_AS. This won't work with multiple columns or integers but I'm sure the underlying algorithm uses the same codes so it wouldn't be hard to figure out. It also seems to differ from db<>fiddle possibly due to collation: https://github.com/neilodonuts/tsql-checksum-javascript/blob/master/data/dbfiddle-differences.png ... mileage may vary!
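
Given the limits described in the disclaimer, one cautious option is to flag strings that fall outside the lookup table before trusting their checksum, since `checksum()` silently uses an XOR code of 0 for such characters. The sketch below is an assumption layered on top of the answer, not part of it: `isCoveredByTable` is a hypothetical name, and 129 is simply the number of entries in the table as printed above (in practice you would pass `xorcodes.length`).

```javascript
// Hypothetical guard, not part of the original answer: reports whether
// every character of s has an entry in a lookup table of the given length.
function isCoveredByTable(s, tableLength) {
    for (var i = 0; i < s.length; i++) {
        if (s.charCodeAt(i) >= tableLength) {
            return false; // this character would fall back to XOR code 0
        }
    }
    return true;
}

console.log(isCoveredByTable("Hello, World", 129));  // true
console.log(isCoveredByTable("caf\u00e9", 129));     // false (é is code 233)
```

A string that fails this check may still hash to some value, but there is no evidence in the answer that the value would match SQL Server.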



Source: https://stackoverflow.com/questions/58980138/how-to-implement-t-sql-checksum-in-javascript-for-bigquery
