Allocating datastore id using PRNG

Question

Google Cloud Datastore documents that if an entity id needs to be pre-allocated, then one should use the allocateIds method: https://cloud.google.com/datastore/docs/best-practices#keys

That method seems to make a REST or RPC call, which adds latency. I'd like to avoid that latency by using a PRNG in my Kubernetes Engine application. Here's the Scala code:

import java.security.SecureRandom

class RandomFactory {

  protected val r = new SecureRandom

  def randomLong: Long = r.nextLong

  def randomLong(min: Long, max: Long): Long =
    // Unfortunately, Java didn't make Random.internalNextLong public,
    // so we have to get to it in an indirect way.
    r.longs(1, min, max).toArray.head

  // id may be any value in the range [1, MAX_SAFE_INTEGER): the upper bound
  // passed to longs is exclusive. Ids in this range can be represented in JavaScript.
  // TODO: randomId is used in production, and might be susceptible to
  // TODO: blocking if /dev/random does not contain entropy.
  // TODO: Keep an eye on this concern.
  def randomId: Long =
    randomLong(1, RandomFactory.MAX_SAFE_INTEGER)
}

object RandomFactory extends RandomFactory {

  // MAX_SAFE_INTEGER is es6 Number.MAX_SAFE_INTEGER
  val MAX_SAFE_INTEGER = 9007199254740991L
}

I also plan to install haveged in the pod to help with entropy.

I understand that allocateIds ensures that an ID is not already in use. But in my particular use case, two mitigating factors make it acceptable to overlook that concern:

1. Based on entity count, the chance of a conflict is 1 in 100 million.
2. This particular entity type is non-essential, and can afford a "once in a blue moon" conflict.
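
To make a figure like "1 in 100 million" checkable, here is a minimal sketch of two standard estimates, assuming ids are drawn uniformly from [1, MAX_SAFE_INTEGER) as randomId does. The object and method names are hypothetical, and the entity counts in the note below are illustrative, not taken from the question.

```scala
object CollisionEstimate {
  // Number of distinct ids in [1, MAX_SAFE_INTEGER), i.e. 2^53 - 2.
  val idSpace: Double = 9007199254740990.0

  // Chance that ONE new random id equals one of `existing` ids already stored.
  def perInsert(existing: Long): Double = existing.toDouble / idSpace

  // Birthday-bound approximation of the chance that `n` random ids
  // contain at least one duplicate: 1 - exp(-n(n-1) / (2N)).
  def anyCollision(n: Long): Double =
    -math.expm1(-(n.toDouble * (n - 1).toDouble) / (2.0 * idSpace))
}
```

Under these assumptions, a single insert against roughly 90 million existing ids has about a 1 in 100 million chance of conflict, while the chance of any duplicate among all ids reaches that level at roughly 13,400 ids. Which of the two readings applies depends on how the estimate was made.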

I am more concerned about even distribution in the keyspace, because that is a concern in normal use.

Will this approach work, particularly with even distribution in keyspace? Is the allocateIds method essential, or does it just help developers avoid simple mistakes?

Answer 1:

To get rid of collisions, use more bits: for all practical purposes, 128-bit random IDs (see the statistics behind UUID v4) will never generate a collision.
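
A value that wide does not fit in a 64-bit numeric id, so it would have to be stored as a string key name instead. A minimal sketch of two ways to produce such a name, using only the JDK; the object and method names are hypothetical, and note that a version 4 UUID carries 122 random bits rather than the full 128.

```scala
import java.security.SecureRandom
import java.util.UUID

object WideIds {
  private val rng = new SecureRandom

  // Version 4 UUID: 122 random bits, rendered as a 36-character string.
  def uuidName(): String = UUID.randomUUID().toString

  // 128 random bits rendered as 32 lowercase hex characters.
  def hex128(): String = {
    val bytes = new Array[Byte](16)
    rng.nextBytes(bytes)
    bytes.map(b => f"${b & 0xff}%02x").mkString
  }
}
```

Either string could serve as the name part of a key. The trade-off is that the id is no longer a number that JavaScript can hold in a Number.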

Another technique is to insert new entities with a shorter random number and handle the error Cloud Datastore returns if they already exist by trying again with a new ID (until you happen upon one that isn't currently in use).
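
The retry idea can be sketched as follows. This is an illustration only: Store, InMemoryStore and insertIfAbsent are hypothetical stand-ins for an insert call that fails when the id already exists, not the Cloud Datastore client API, and the cap of 5 attempts is an arbitrary assumption.

```scala
import java.security.SecureRandom
import scala.annotation.tailrec
import scala.collection.mutable

object RetryInsert {
  private val rng = new SecureRandom
  val MaxSafeInteger = 9007199254740991L

  // Hypothetical stand-in for the datastore: returns false when the id is taken.
  trait Store { def insertIfAbsent(id: Long, value: String): Boolean }

  class InMemoryStore extends Store {
    private val data = mutable.Map.empty[Long, String]
    def insertIfAbsent(id: Long, value: String): Boolean =
      if (data.contains(id)) false else { data.put(id, value); true }
    def size: Int = data.size
  }

  // Random id in [1, MaxSafeInteger), same range as randomId in the question.
  def randomId(): Long = rng.longs(1L, 1L, MaxSafeInteger).findFirst().getAsLong()

  // Tries fresh ids until one is free or the attempts run out.
  @tailrec
  def insertWithRetry(store: Store, value: String, nextId: () => Long,
                      attemptsLeft: Int = 5): Option[Long] =
    if (attemptsLeft <= 0) None
    else {
      val id = nextId()
      if (store.insertIfAbsent(id, value)) Some(id)
      else insertWithRetry(store, value, nextId, attemptsLeft - 1)
    }
}
```

A call such as RetryInsert.insertWithRetry(store, "payload", () => RetryInsert.randomId()) returns the id that was finally used, or None if every attempt collided. With a real datastore, the insert and the existence check would need to be a single atomic operation for this to be safe.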

As far as the key distribution goes: keys that are randomly distributed within the key space will keep Cloud Datastore happy.
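
For anyone who wants to see the spread rather than take it on trust, here is a rough sanity check, assuming the same [1, MAX_SAFE_INTEGER) range as the question's randomId. The object name, bucket count and sample size are hypothetical choices.

```scala
import java.security.SecureRandom

object SpreadCheck {
  val MaxSafeInteger = 9007199254740991L

  // Counts how many of `samples` random ids fall into each of `buckets`
  // near-equal-width slices of [1, MaxSafeInteger).
  def histogram(samples: Int, buckets: Int): Array[Int] = {
    val rng = new SecureRandom
    val counts = new Array[Int](buckets)
    // Width is rounded up so that every id maps to a valid bucket index.
    val width = (MaxSafeInteger - 1) / buckets + 1
    rng.longs(samples.toLong, 1L, MaxSafeInteger).forEach { id =>
      counts(((id - 1) / width).toInt) += 1
    }
    counts
  }
}
```

Calling SpreadCheck.histogram(160000, 16) should give counts close to 10,000 per bucket. This only checks the generator's output; it says nothing about how Cloud Datastore itself splits key ranges.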

Answer 2:

Given that you don't want the entity identifier to be based on an external value, you should allow Cloud Datastore to allocate IDs for you. This way you won't have any conflicts. The IDs allocated by Cloud Datastore will be appropriately scattered through the key space.

Source: https://stackoverflow.com/questions/56989788/allocating-datastore-id-using-prng

Tags

google-cloud-platform

google-cloud-firestore

google-cloud-datastore