Store UUID v4 in MySQL

隐身守侯 提交于 2019-11-27 05:44:40

问题


I'm generating UUIDs using PHP, per the function found here

Now I want to store that in a MySQL database. What is the best/most efficient MySQL field format for storing UUID v4?

I currently have varchar(256), but I'm pretty sure that's much larger than necessary. I've found lots of almost-answers, but they're generally ambiguous about what form of UUID they're referring to, so I'm asking for the specific format.


回答1:


Store it as VARCHAR(36) if you're looking to have an exact fit, or VARCHAR(255) which is going to work out with the same storage cost anyway. There's no reason to fuss over bytes here.

Remember VARCHAR fields are variable length, so the storage cost is proportional to how much data is actually in them, not how much data could be in them.

Storing it as BINARY is extremely annoying, the values are unprintable and can show up as garbage when running queries. There's rarely a reason to use the literal binary representation. Human-readable values can be copy-pasted, and worked with easily.

Some other platforms, like Postgres, have a proper UUID column which stores it internally in a more compact format, but displays it as human-readable, so you get the best of both approaches.




回答2:


If you always have a UUID for each row, you could store it as CHAR(36) and save 1 byte per row over VARCHAR(36).

uuid CHAR(36) CHARACTER SET ascii

In contrast to CHAR, VARCHAR values are stored as a 1-byte or 2-byte length prefix plus data. The length prefix indicates the number of bytes in the value. A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes. https://dev.mysql.com/doc/refman/5.7/en/char.html

Though be careful with CHAR, it will always consume the full length defined even if the field is left empty. Also, make sure to use ASCII for character set, as CHAR would otherwise plan for worst case scenario (i.e. 3 bytes per character in utf8, 4 in utf8mb4)

[...] MySQL must reserve four bytes for each character in a CHAR CHARACTER SET utf8mb4 column because that is the maximum possible length. For example, MySQL must reserve 40 bytes for a CHAR(10) CHARACTER SET utf8mb4 column. https://dev.mysql.com/doc/refman/5.5/en/charset-unicode-utf8mb4.html




回答3:


Question is about storing an UUID in MySQL.

Since version 8.0 of mySQL you can use binary(16) with automatic conversion via UUID_TO_BIN/BIN_TO_UUID functions: https://mysqlserverteam.com/mysql-8-0-uuid-support/

Be aware that mySQL has also a fast way to generate UUIDs as primary key:

INSERT INTO t VALUES(UUID_TO_BIN(UUID(), true))




回答4:


Most efficient is definitely BINARY(16), storing the human-readable characters uses over double the storage space, and means bigger indices and slower lookup. If your data is small enough that storing as them as text doesn't hurt performance, you probably don't need UUIDs over boring integer keys. Storing raw is really not as painful as others suggest because any decent db admin tool will display/dump the octets as hexadecimal, rather than literal bytes of "text". You shouldn't need to be looking up UUIDs manually in the db; if you have to, HEX() and x'deadbeef01' literals are your friends. It is trivial to write a function in your app – like the one you referenced – to deal with this for you. You could probably even do it in the database as virtual columns and stored procedures so the app never bothers with the raw data.

I would separate the UUID generation logic from the display logic to ensure that existing data are never changed and errors are detectable:

function guidv4($prettify = false)
{
    static $native = function_exists('random_bytes');
    $data = $native ? random_bytes(16) : openssl_random_pseudo_bytes(16);

    $data[6] = chr(ord($data[6]) & 0x0f | 0x40); // set version to 0100
    $data[8] = chr(ord($data[8]) & 0x3f | 0x80); // set bits 6-7 to 10

    if ($prettify) {
        return guidv4_pretty($data);
    }
    return $data;
}

function guidv4_pretty($data)
{
    return strlen($data) == 16 ?
        vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($data), 4)) :
        false;
}

function guidv4_ugly($data)
{
    $data = preg_replace('/[^\\dA-F]+/i', '', $data);
    return strlen($data) == 32 ? hex2bin($data) : false;
}

Edit: If you only need the column pretty when reading the database, a statement like the following is sufficient:

ALTER TABLE test ADD uuid_pretty CHAR(36) GENERATED ALWAYS AS (CONCAT_WS('-', LEFT(HEX(uuid_ugly), 8), SUBSTR(HEX(uuid_ugly), 9, 4), SUBSTR(HEX(uuid_ugly), 13, 4), SUBSTR(HEX(uuid_ugly), 17, 4), RIGHT(HEX(uuid_ugly), 12))) VIRTUAL;



回答5:


The most space-efficient would be BINARY(16) or two BIGINT UNSIGNED.

The former might give you headaches because manual queries do not (in a straightforward way) give you readable/copyable values. The latter might give you headaches because of having to map between one value and two columns.

If this is a primary key, I would definitely not waste any space on it, as it becomes part of every secondary index as well. In other words, I would choose one of these types.

For performance, the randomness of random UUIDs (i.e. UUID v4, which is randomized) will hurt severely. This applies when the UUID is your primary key or if you do a lot of range queries on it. Your insertions into the primary index will be all over the place rather than all at (or near) the end. Your data loses temporal locality, which was a helpful property in various cases.

My main improvement would be to use something similar to a UUID v1, which uses a timestamp as part of its data, and ensure that the timestamp is in the highest bits. For example, the UUID might be composed something like this:

Timestamp | Machine Identifier | Counter

This way, we get a locality similar to auto-increment values.




回答6:


I just found a nice article going in more depth on these topics: https://www.xaprb.com/blog/2009/02/12/5-ways-to-make-hexadecimal-identifiers-perform-better-on-mysql/

It covers the storage of values, with the same options already expressed in the different answers on this page:

  • One: watch out for character set
  • Two: use fixed-length, non-nullable values
  • Three: Make it BINARY

But also adds some interesting insight about indexes:

  • Four: use prefix indexes

In many but not all cases, you don’t need to index the full length of the value. I usually find that the first 8 to 10 characters are unique. If it’s a secondary index, this is generally good enough. The beauty of this approach is that you can apply it to existing applications without any need to modify the column to BINARY or anything else—it’s an indexing-only change and doesn’t require the application or the queries to change.

Note that the article doesn't tell you how to create such a "prefix" index. Looking at MySQL documentation for Column Indexes we find:

[...] you can create an index that uses only the first N characters of the column. Indexing only a prefix of column values in this way can make the index file much smaller. When you index a BLOB or TEXT column, you must specify a prefix length for the index. For example:

CREATE TABLE test (blob_col BLOB, INDEX(blob_col(10)));

[...] the prefix length in CREATE TABLE, ALTER TABLE, and CREATE INDEX statements is interpreted as number of characters for nonbinary string types (CHAR, VARCHAR, TEXT) and number of bytes for binary string types (BINARY, VARBINARY, BLOB).

  • Five: build hash indexes

What you can do is generate a checksum of the values and index that. That’s right, a hash-of-a-hash. For most cases, CRC32() works pretty well (if not, you can use a 64-bit hash function). Create another column. [...] The CRC column isn’t guaranteed to be unique, so you need both criteria in the WHERE clause or this technique won’t work. Hash collisions happen quickly; you will probably get a collision with about 100k values, which is much sooner than you might think—don’t assume that a 32-bit hash means you can put 4 billion rows in your table before you get a collision.



来源:https://stackoverflow.com/questions/43056220/store-uuid-v4-in-mysql

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!