Collision probability of ObjectId vs UUID in a large distributed system

旧时难觅i 2020-12-01 08:05

Considering that an RFC 4122 UUID (16 bytes) is much larger than a MongoDB ObjectId (12 bytes), I am trying to find out how their collision probabilities compare.

I kn

2 answers
  •  死守一世寂寞
    2020-12-01 08:34

    Let's look at the spec for "ObjectId" from the documentation:

    Overview

    ObjectId is a 12-byte BSON type, constructed using:

    • a 4-byte value representing the seconds since the Unix epoch,
    • a 3-byte machine identifier,
    • a 2-byte process id, and
    • a 3-byte counter, starting with a random value.
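
To make the layout above concrete, here is a minimal Python sketch that assembles a 12-byte value from the same four fields. It is not a MongoDB driver API: `make_object_id` is a hypothetical helper, and deriving the machine identifier from a hash of the hostname is an assumption made only for illustration.

```python
import hashlib
import itertools
import os
import random
import socket
import struct
import time

# Assumption: the 3-byte machine identifier is taken from a hash of the hostname.
_MACHINE = hashlib.sha256(socket.gethostname().encode()).digest()[:3]

# 3-byte counter, starting at a random value as in the quoted description.
_counter = itertools.count(random.randrange(1 << 24))


def make_object_id(now=None):
    """Hypothetical helper: build a 12-byte, ObjectId-like value."""
    seconds = int(time.time() if now is None else now)
    timestamp = struct.pack(">I", seconds & 0xFFFFFFFF)          # 4 bytes
    process_id = struct.pack(">H", os.getpid() & 0xFFFF)         # 2 bytes
    counter = (next(_counter) & 0xFFFFFF).to_bytes(3, "big")     # 3 bytes
    return timestamp + _MACHINE + process_id + counter
```

Two calls within the same second, on the same machine and in the same process, differ only in the last three bytes, which is why the counter matters for ids generated in quick succession.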

    So let us consider this in the context of a "mobile client".

    Note: The context here does not mean using a "direct" connection of the "mobile client" to the database. That should not be done. But the "_id" generation can be done quite simply.

    So the points:

    1. The "seconds since epoch" value. This changes from request to request, so two ids can only collide on this component when they are generated within the same second.

    2. The "machine identifier". Each client generating the _id value contributes its own identifier, which further reduces the possibility of a "collision".

    3. The "process id". Where that is accessible to seed the value (and it should be), the generated _id has an even better chance of avoiding a collision.

    4. The counter, which starts from a "random value". For a collision, another "client" would have to produce all of the same values as above and still arrive at the same counter value.
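
To put rough numbers on these points, the sketch below uses the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / 2N), for n ids drawn uniformly from N = 2^bits possible values. The inputs are assumptions, not measurements: a version 4 UUID is treated as 122 random bits, and an ObjectId generated within a single second is treated as if its remaining 64 bits (machine identifier, process id and counter) were uniformly random, which real deployments will only approximate. `collision_probability` is a hypothetical helper name.

```python
import math


def collision_probability(n, bits):
    """Birthday approximation for n uniform draws from 2**bits values."""
    exponent = n * (n - 1) / (2 * 2 ** bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


if __name__ == "__main__":
    n = 1_000_000  # ids assumed to be generated within the same second
    print("ObjectId-like, 64 bits:", collision_probability(n, 64))
    print("UUID v4, 122 bits:     ", collision_probability(n, 122))
```

Under these assumptions both probabilities are very small, with the UUID figure smaller by many orders of magnitude. The ObjectId figure only applies to ids created in the same second, since the timestamp separates all others.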

    Bottom line is, if that is not a convincing enough argument to digest, then simply provide your own "uuid" entries as the "primary key" values.

    But IMHO, that should be a fairly convincing argument that the chance of a collision here is very small, to say the least.

    The full topic is probably just a little "too-broad". But I hope this moves consideration a bit more away from "Quite unlikely" and on to something a little more concrete.
