Cassandra UUID vs TimeUUID benefits and disadvantages

Anonymous (unverified), submitted 2019-12-03 01:10:02

Question:

Given that TimeUUID handily allows you to use now() in CQL, are there any reasons you wouldn't just go ahead and always use TimeUUID instead of plain old UUID?

Answer 1:

UUID and TIMEUUID are stored the same way in Cassandra, and they only really represent two different sorting implementations.

TIMEUUID columns are sorted by their time component first, and then by their raw bytes. UUID columns are sorted by their version first, then, if both values are version 1, by their time component, and finally by their raw bytes. Curiously, the time-component comparison is duplicated between UUIDType and TimeUUIDType in the Cassandra code, apart from formatting differences.
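To make the two orderings concrete, here is a minimal Python sketch of the rules as described above. It is not Cassandra's code (the real comparators are Java classes, and details such as how raw bytes are compared may differ); the function names timeuuid_key and uuid_key and the hand-built sample values are purely illustrative.

```python
import uuid

def timeuuid_key(u):
    """TIMEUUID-style order: embedded timestamp first, then raw bytes."""
    return (u.time, u.bytes)

def uuid_key(u):
    """UUID-style order: version, then timestamp (version 1 only), then raw bytes.

    Assumes RFC 4122 variant UUIDs, so that u.version is an int.
    """
    timestamp = u.time if u.version == 1 else 0
    return (u.version, timestamp, u.bytes)

# Hand-built version 1 values: the low 32 bits of the timestamp come first
# in the byte layout, so byte order and time order can disagree.
early = uuid.UUID("00000005-0000-1000-8000-000000000000")      # timestamp 5
late = uuid.UUID("00000000-0001-1000-8000-000000000000")       # timestamp 2**32
random_id = uuid.UUID("00000000-0000-4000-8000-000000000000")  # version 4 layout

by_bytes = sorted([early, late], key=lambda u: u.bytes)
by_time = sorted([late, early], key=timeuuid_key)
by_uuid_rule = sorted([random_id, late, early], key=uuid_key)
```

Sorting by raw bytes alone puts late before early, while the time-first key puts them in chronological order, which is why a time-aware comparison exists at all.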

I think of the UUID vs. TIMEUUID question primarily as documentation: if you choose TIMEUUID you're saying that you're storing things in chronological order, and that these things can occur at the same time, so a simple timestamp isn't enough. Using UUID says that you don't care about order (even if in practice the columns will be ordered by time if you put version 1 UUIDs in them), you just want to make sure that things have unique IDs.

Even if using NOW() to generate UUID values is convenient, it's also very surprising to other people reading your code.

It probably does not matter much in the grand scheme of things, but sorting non-version 1 UUIDs is a bit faster than version 1, so if you have a UUID column and generate the UUIDs yourself, go for another version.



Answer 2:

A TimeUUID is a plain old UUID according to the documentation.

A UUID is simply a 128-bit value. Think of it as an unimaginably large number.

The particular bits may be determined by any of several methods. The original method involved taking the MAC address of the computer's networking hardware, combining the current date and time, plus an arbitrary number and a random number. Squish all that together to get a virtually unique number.

Later, for various reasons (security, privacy), other methods were invented to assemble the bits when generating a UUID value. These other methods omit date-time and/or MAC address as an ingredient. The point being: Not all UUID values have an embedded date-time value.
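As a quick illustration of the point about versions, Python's standard uuid module can generate both a time-based (version 1) and a random (version 4) value. This is only a sketch using Python rather than Cassandra; both results are ordinary 128-bit UUIDs and differ only in how the bits were chosen.

```python
import uuid

v1 = uuid.uuid1()  # version 1: timestamp + clock sequence + node identifier
v4 = uuid.uuid4()  # version 4: random bits, no embedded date-time

for u in (v1, v4):
    # Every UUID is 16 bytes (128 bits), whatever its version.
    print(u, "version", u.version, "bits", len(u.bytes) * 8)
```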

The Cassandra doc incorrectly refers to its TimeUUID being a "Type 1 UUID". The correct term is Version 1 UUID. This version is sometimes called the "time-based version".


A Bit Of Advice

Cassandra seems to identify this specific version of UUID for the purpose of extracting the date and time portion of the 128 bits. Extracting the date-time from a UUID is a bad idea.

For one thing, UUID was never intended to be used for such history tracking. Indeed, the spec for UUID specifically recognizes that (a) computer clocks can be reset and therefore (b) UUIDs generated later may actually record an earlier date-time than previous UUIDs. Another reason not to extract the date-time from a UUID is that you may well have UUIDs that were not generated by the time-based method, in which case you would be building a date-time value from bits that do not in fact represent the date-time of creation. A third reason is that when code is later refactored, the UUID may be generated at a different time than the database record, so using the UUID's date-time would be misleading.
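For readers who still need to see what such an extraction involves, the following Python sketch decodes the timestamp of a version 1 UUID and refuses anything else. The helper name uuid1_to_datetime is hypothetical. It relies on the version 1 timestamp counting 100-nanosecond intervals since 1582-10-15 UTC, and it cannot protect against the clock-reset and refactoring problems described above.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 timestamps count 100-nanosecond intervals from this instant.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_to_datetime(u):
    """Return the date-time embedded in a version 1 UUID (hypothetical helper)."""
    if u.version != 1:
        # Other versions do not carry a creation time in these bits.
        raise ValueError(f"not a time-based UUID: version {u.version}")
    # u.time is in 100 ns units; timedelta resolution is microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

print(uuid1_to_datetime(uuid.uuid1()))
```

Calling the helper with a random UUID such as uuid.uuid4() raises ValueError instead of returning a meaningless date.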

If you need to track date-time history, do so explicitly. Create a date-time field in your data. By the way, track that date-time in UTC, but that’s another topic.
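One way to follow this advice, sketched in Python with a hypothetical Event record (the class and field names are illustrative, not from any Cassandra driver): keep the identifier and the creation time as separate fields, and record the time explicitly in UTC.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Event:
    payload: str
    # Identity only: a random UUID carries no date-time.
    id: uuid.UUID = field(default_factory=uuid.uuid4)
    # History tracking: an explicit, timezone-aware UTC timestamp.
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

event = Event("user signed up")
print(event.id, event.created_at.isoformat())
```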


