How does BigQuery's FARM_FINGERPRINT represent a 64-bit *unsigned* int?

痴心易碎 submitted on 2021-02-11 12:03:25

Question


BigQuery conveniently includes the FARM_FINGERPRINT function. Here's an excerpt of the documentation for this function:

Description

Computes the fingerprint of the STRING or BYTES input using the Fingerprint64 function from the open-source FarmHash library. The output of this function for a particular input will never change.

Return type

INT64

Note that the return type is an INT64, which in BigQuery is a 64-bit signed int.

However, if we look at the actual implementation of Fingerprint64, we can see right in the header file that it returns an unsigned 64-bit int.

The problem: a 64-bit unsigned int has roughly twice the maximum value of a 64-bit signed int (2^64 - 1 versus 2^63 - 1). So for about half of all inputs, the underlying Fingerprint64 value is outside the representable range of a BigQuery INT64. In such cases, what does BigQuery do? Somehow it must transform the output of Fingerprint64 to fit into the range of a signed int, but the documentation doesn't say how.

One way to do this would be to just let the value overflow, so that it wraps around into the negative range of the signed int. However, as Fingerprint64 is meant to be a portable function, that seems like a poor design, because then its output in BigQuery differs from the standard output in other systems. If this discrepancy exists, it should at least be documented with a big fat warning!
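To make the wrap-around idea concrete, here is a minimal Python sketch of what "letting the value overflow" would mean. The helper name `wrap_to_int64` is hypothetical and only illustrates the arithmetic; it is not BigQuery's or FarmHash's code, and it does not compute a fingerprint.

```python
import struct

UINT64_MAX = 2**64 - 1  # largest value Fingerprint64 could return
INT64_MAX = 2**63 - 1   # largest value a signed 64-bit INT64 can hold


def wrap_to_int64(unsigned_value):
    """Reinterpret the 64 bits of an unsigned value as a signed integer.

    Hypothetical helper: same bit pattern, read as two's complement.
    """
    if not 0 <= unsigned_value <= UINT64_MAX:
        raise ValueError("value does not fit in 64 unsigned bits")
    # Pack as unsigned 64-bit ("Q"), then unpack the same bytes as signed ("q").
    return struct.unpack("<q", struct.pack("<Q", unsigned_value))[0]


print(wrap_to_int64(INT64_MAX))      # 9223372036854775807 (unchanged)
print(wrap_to_int64(INT64_MAX + 1))  # -9223372036854775808 (wrapped)
print(wrap_to_int64(UINT64_MAX))     # -1
```

Values up to 2^63 - 1 pass through unchanged; everything above that lands in the negative range, which is the discrepancy the question is worried about.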


Answer 1:


The documentation says it uses the "Fingerprint64 function from the open-source FarmHash library", but it doesn't say that the returned value is numerically identical to that function's output. Since INT64 in BigQuery is signed, it can't hold the same range of values as a uint64 (unsigned), so the 64 bits are reinterpreted as a two's complement signed integer, with the first (most significant) bit taken as the sign bit. Any fingerprint of 2^63 or more therefore shows up as a negative number, namely the unsigned value minus 2^64. (Just as @ElliottBrossard and Conrad Lee found.)
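If the two's complement reinterpretation described in this answer is what BigQuery does, the original unsigned value can be recovered on the client side by reading the same 64 bits as unsigned. The sketch below assumes exactly that; `int64_to_uint64` is a hypothetical helper name, not part of any BigQuery client library.

```python
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1
MASK_64 = 0xFFFFFFFFFFFFFFFF  # 64 one-bits


def int64_to_uint64(signed_value):
    """Map a signed INT64 back to the unsigned value with the same bits.

    Assumption: the signed value is a two's complement view of a uint64.
    """
    if not INT64_MIN <= signed_value <= INT64_MAX:
        raise ValueError("value does not fit in a signed 64-bit integer")
    # Masking with 64 one-bits adds 2**64 to negative values
    # and leaves non-negative values untouched.
    return signed_value & MASK_64


print(int64_to_uint64(42))         # 42
print(int64_to_uint64(-1))         # 18446744073709551615
print(int64_to_uint64(INT64_MIN))  # 9223372036854775808
```

Comparing the converted number with the output of a FarmHash binding in another system is one way to check whether the assumption holds for a given input.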



Source: https://stackoverflow.com/questions/51892989/how-does-bigquerys-farm-fingerprint-represent-a-64-bit-unsigned-int
