Question
Google BigQuery doesn't support UUID as a data type. So which option is better for storing one:
- STRING: a string in the 8-4-4-4-12 format
- BYTES: an array of 16 bytes (128 bits)
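To make the two options concrete, here is a minimal sketch that shows one UUID in both forms. It uses Python's standard uuid module purely for illustration (BigQuery does not use it), and the sample UUID value is arbitrary.

```python
import uuid

# An arbitrary, fixed example UUID so the result is reproducible.
u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

as_string = str(u)  # canonical 8-4-4-4-12 text form: 36 characters
as_bytes = u.bytes  # raw 128-bit value: 16 bytes

print(as_string, len(as_string))
print(as_bytes.hex(), len(as_bytes))
```

Both variables hold the same identifier; the STRING form simply spends extra characters on hex digits and hyphens.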
Answer 1:
Edit: BigQuery now supports a function called GENERATE_UUID, which returns a STRING of 32 hexadecimal digits in five hyphen-separated groups, in the form 8-4-4-4-12.
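As a rough way to check that stored values follow the 8-4-4-4-12 shape described above, a pattern test is enough. This is a Python sketch, not BigQuery code: str(uuid.uuid4()) stands in for GENERATE_UUID() output, and the helper name looks_like_generated_uuid is hypothetical. The pattern accepts either letter case to avoid assuming how a given system capitalizes hex digits.

```python
import re
import uuid

# Five groups of 8, 4, 4, 4 and 12 hex digits separated by hyphens (32 digits total).
CANONICAL = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)

def looks_like_generated_uuid(value: str) -> bool:
    """Hypothetical helper: True if value has the 8-4-4-4-12 shape."""
    return CANONICAL.fullmatch(value) is not None

# A random UUID standing in for the output of GENERATE_UUID().
sample = str(uuid.uuid4())
print(sample, looks_like_generated_uuid(sample))
```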
Original content:
Some discussion of the tradeoffs:
Using STRING
- UUIDs are compatible with the representation in other systems, for example if you export to CSV and then want to merge with exports from elsewhere.
- UUIDs are compatible with BigQuery's probable UUID implementation. You will be able to generate UUIDs of this same form using a function (when the feature is implemented).
- If you decide to represent the UUIDs as BYTES later, you can potentially convert them using a UDF.
- Downside: Comparisons may not be as fast as with BYTES, depending on the operator, since string comparisons have to take UTF-8 encoding into account. (It sounds like this isn't an issue for you.)
- Downside: Storage costs are higher, since the 36-character string takes more than twice the space of the 16 raw bytes. (It sounds like this isn't an issue for you.)
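The conversion and storage points above can be sketched in Python. This is an illustration under assumptions, not a BigQuery UDF: the helper names are hypothetical, and logical_size assumes BigQuery-style accounting of 2 bytes of overhead plus the UTF-8 length for STRING or the raw length for BYTES. Inside BigQuery SQL, the built-in FROM_HEX and TO_HEX functions may cover the hex part of the conversion without a UDF, though TO_HEX does not add the hyphens back.

```python
import uuid

def uuid_string_to_bytes(s: str) -> bytes:
    """Hypothetical helper: 8-4-4-4-12 STRING form -> 16 raw bytes."""
    return uuid.UUID(s).bytes

def uuid_bytes_to_string(b: bytes) -> str:
    """Hypothetical helper: 16 raw bytes -> canonical lowercase STRING form."""
    return str(uuid.UUID(bytes=b))

def logical_size(value) -> int:
    # Assumption: 2 bytes of overhead plus the UTF-8 encoded length for
    # strings, or the raw length for bytes.
    raw = value.encode("utf-8") if isinstance(value, str) else value
    return 2 + len(raw)

s = "123e4567-e89b-12d3-a456-426614174000"
b = uuid_string_to_bytes(s)
assert uuid_bytes_to_string(b) == s  # the round trip is lossless
print(logical_size(s), logical_size(b))  # 38 18
```

Under that size assumption each STRING UUID costs 38 bytes against 18 for BYTES, which is where the storage difference comes from.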
Using BYTES
- UUIDs are stored more compactly; storage is cheaper and comparisons are faster.
- If you decide to represent the UUIDs as STRINGs later, you can potentially convert them using a UDF.
- Downside: The raw 16-byte values are not compatible with the 8-4-4-4-12 text form other systems expect after export, and will likely not be compatible with BigQuery's implementation either.
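To see why BYTES values travel poorly, the following Python sketch assumes that a BYTES column is written out as base64 text in a CSV or JSON export. The result is a valid encoding of the UUID, but it does not look like the 8-4-4-4-12 string other systems expect, so an explicit decode step is needed to recover that form.

```python
import base64
import uuid

u = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")

# Assumption: the exported BYTES value appears as base64 text.
exported = base64.b64encode(u.bytes).decode("ascii")

print(exported)  # base64 text, not the 8-4-4-4-12 form
print(str(u))    # the form other systems usually expect

# Recovering the canonical form requires decoding the base64 first.
restored = str(uuid.UUID(bytes=base64.b64decode(exported)))
assert restored == str(u)
```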
Source: https://stackoverflow.com/questions/49404747/create-a-column-of-uuids-in-google-bigquery