Is there a DynamoDB max partition size of 10GB for a single partition key value?

折月煮酒 submitted on 2019-11-30 06:51:23

TL;DR - items can be split even if they have the same partition key value, by including the range (sort) key value in the partitioning function.


The long version:

This is a very good question, and it is addressed in two pages of the DynamoDB documentation. As the documentation states, items in a DynamoDB table are partitioned into one or more partitions based on their partition key value (which used to be called the hash key), using a hashing function. The number of partitions is derived from the maximum desired total throughput, as well as from the distribution of items in the key space. In other words, if the partition key is chosen so that it distributes items uniformly across the partition key space, the partitions end up with approximately the same number of items each: roughly the total number of items in the table divided by the number of partitions.
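The following minimal Python sketch makes the "total items divided by number of partitions" point concrete. It assumes the hash space is cut into equal, contiguous ranges, one per partition. It uses MD5 purely as a hypothetical stand-in for DynamoDB's internal hash function, which is not public. The names `key_hash`, `partition_index` and `distribute` are illustrative and are not part of any AWS API.

```python
import hashlib
from collections import Counter

# Hypothetical stand-in: DynamoDB's real hash function is internal, so MD5 is
# used here only to get a roughly uniform 128-bit value per key.
HASH_SPACE = 2**128

def key_hash(partition_key: str) -> int:
    """Map a partition key value to an integer in [0, HASH_SPACE)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

def partition_index(partition_key: str, num_partitions: int) -> int:
    """Assume the hash space is cut into num_partitions equal, contiguous ranges."""
    return key_hash(partition_key) * num_partitions // HASH_SPACE

def distribute(keys, num_partitions):
    """Count how many items land in each partition."""
    counts = Counter(partition_index(k, num_partitions) for k in keys)
    return [counts.get(i, 0) for i in range(num_partitions)]

keys = [f"user-{i}" for i in range(10_000)]
print(distribute(keys, 4))  # each count should be close to 10_000 / 4 = 2_500
```

With uniformly distributed keys, each partition gets a count close to 2,500. A skewed key choice would instead show one partition with a much larger count.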

The documentation also states that each partition is limited to about 10GB of space. Once the total size of the items stored in a partition grows beyond 10GB, DynamoDB starts a background process that automatically and transparently splits that partition in half, resulting in two new partitions. Again, if the items are distributed uniformly, this works well, because each new partition ends up holding roughly half the items of the original one.

An important aspect of splitting is that each of the two new partitions gets half of the throughput that was available to the original partition.
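The split-and-halve behavior described above can be modeled as follows. This is a simplified sketch, not DynamoDB's implementation, and it rests on three assumptions:

- A split happens at the midpoint of the hash range.
- Uniformly distributed items divide evenly between the two halves.
- Throughput is divided equally between the two new partitions.

`Partition`, `split_in_half`, `maybe_split` and `SPLIT_THRESHOLD_BYTES` are hypothetical names.

```python
from dataclasses import dataclass

# "About 10GB" per the answer; modeled here as 10 GiB (an assumption).
SPLIT_THRESHOLD_BYTES = 10 * 1024**3

@dataclass
class Partition:
    lo: int            # inclusive start of the hash range
    hi: int            # exclusive end of the hash range
    size_bytes: int    # total size of the items stored here
    throughput: float  # capacity assigned to this partition

def split_in_half(p: Partition) -> tuple[Partition, Partition]:
    """Split the hash range at its midpoint; data and throughput are halved
    under the simplifying assumption that items are spread uniformly."""
    mid = (p.lo + p.hi) // 2
    left = Partition(p.lo, mid, p.size_bytes // 2, p.throughput / 2)
    right = Partition(mid, p.hi, p.size_bytes - left.size_bytes, p.throughput / 2)
    return left, right

def maybe_split(p: Partition) -> list[Partition]:
    """Mimic the background split: only partitions over the threshold split."""
    if p.size_bytes > SPLIT_THRESHOLD_BYTES:
        return list(split_in_half(p))
    return [p]

big = Partition(0, 2**128, 11 * 1024**3, 1000.0)
for part in maybe_split(big):
    print(part.size_bytes, part.throughput)  # two halves, 500.0 throughput each
```

Under these assumptions, an 11GB partition with 1000 units of throughput becomes two partitions of about 5.5GB with 500 units each. A partition under the threshold is returned unchanged.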

So far we've covered the happy case.

On the flip side, it is possible for one, or a few, partition key values to correspond to a very large number of items. This typically happens when the table schema uses a sort key and many items share the same partition key value. In such a case, a single partition key value could be responsible for items that together take up more than 10GB, and this will result in a split. DynamoDB will still create two new partitions. However, instead of using only the partition key to decide which new partition an item should be stored in, it will also use the sort key.
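A small sketch shows why the sort key has to take part. Once again, MD5 is only a hypothetical stand-in for DynamoDB's hash function. Whatever the real function is, all items that share a partition key produce the same hash. Pairing that hash with the sort key gives each item a distinct, ordered value that a split boundary can fall between.

```python
import hashlib

def key_hash(partition_key: str) -> int:
    """Hypothetical stand-in for DynamoDB's internal hash (MD5 as a 128-bit int)."""
    return int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")

# 1000 items that share the partition key "F" but have distinct sort keys.
items = [("F", sort_key) for sort_key in range(1, 1001)]

# Partition key only: every item maps to the same point in hash space,
# so no boundary can put some of them on one side and the rest on the other.
hash_only = {key_hash(pk) for pk, _ in items}

# Partition key hash plus sort key: every item gets a distinct, ordered value,
# so a boundary can fall between two items that share the partition key.
composite = {(key_hash(pk), sk) for pk, sk in items}

print(len(hash_only))   # 1
print(len(composite))   # 1000
```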

Example

Without loss of generality and to make things easier to reason about, imagine that there is a table where partition keys are letters (A-Z), and numbers are used as sort keys.

Imagine that the table has about 9 partitions, so letters A, B, C would be stored in partition 1, letters D, E, F in partition 2, and so on.

In the diagram below, the partition boundaries are marked h(A0), h(D0), etc. This shows, for instance, that the items stored in the first partition are those whose partition key hashes to a value between h(A0) and h(D0). The 0 is intentional and will come in handy later.
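The toy model used in this example can be written down directly. In this sketch, `h` is deliberately order-preserving: it just returns the pair `(partition key, sort key)`. Real hashing is not order-preserving, so this choice only serves to make the boundaries h(A0), h(D0), ... readable. `BOUNDARIES` and `partition_of` are hypothetical names. Each partition is assumed to cover the half-open range from its own boundary up to the next one.

```python
import bisect
import string

def h(partition_key: str, sort_key: int) -> tuple[str, int]:
    """Toy, order-preserving stand-in for the hash used in the example."""
    return (partition_key, sort_key)

# Boundaries h(A0), h(D0), h(G0), ...: three letters per partition, 9 partitions.
BOUNDARIES = [h(letter, 0) for letter in string.ascii_uppercase[::3]]

def partition_of(partition_key: str, sort_key: int) -> int:
    """Index of the partition whose range [boundary, next boundary) holds the item."""
    return bisect.bisect_right(BOUNDARIES, h(partition_key, sort_key)) - 1

print(len(BOUNDARIES))         # 9
print(partition_of("C", 100))  # 0, i.e. the first partition [h(A0), h(D0))
print(partition_of("F", 500))  # 1, i.e. the second partition [h(D0), h(G0))
```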

[ h(A0) ]--------[ h(D0) ]---------[ h(G0) ]-------[ h(J0) ]-------[ h(M0) ]- ..
  |   A    B    C   |       E    F   |   G      I    |   J    K   L  |
  |   1    1    1   |       1    1   |   1      1    |   1    1   1  |
  |   2    2    2   |       2    2   |          2    |        2      |
  |   3         3   |            3   |          3    |               |
  ..                ..               ..              ..              ..
  |            100  |           500  |               |               |
  +-----------------+----------------+---------------+---------------+-- ..

Notice that most partition key values have between 1 and 3 items in the table. However, two partition key values, C and F, are not looking too good: C has 100 items, while F has 500.

If items with a partition key value of F keep getting added, the partition [h(D0)-h(G0)) will eventually split. To make it possible to split items that share the same partition key, the sort key values have to be used, so we end up with the following situation:

..[ h(D0) ]------------/ [ h(F501) ] / ----------[ h(G0) ]- ..
      |       E       F       |           F         |
      |       1       1       |          501        |
      |       2       2       |          502        |
      |               3       |          503        |
      ..                      ..                    ..
      |              500      |         1000        |
.. ---+-----------------------+---------------------+--- ..

The original partition [h(D0)-h(G0)) was split into [h(D0)-h(F501)) and [h(F501)-h(G0)).
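Continuing the toy model, the split of [h(D0)-h(G0)) can be sketched in two steps:

1. Sort the items on the composite (partition key, sort key) value.
2. Cut somewhere inside the run of F items.

The sketch below assumes the cut is made at the median item by count. How DynamoDB actually picks the split point is internal. The boundary this sketch picks, ('F', 500), is therefore not necessarily the one DynamoDB would choose. `split_by_median` is a hypothetical helper.

```python
# Items in the partition [h(D0), h(G0)) from the example: E has 2 items and
# F has grown to 1000 items.
items = [("E", n) for n in (1, 2)] + [("F", n) for n in range(1, 1001)]

def split_by_median(keys):
    """Split a partition's items at the median composite key (an assumption)."""
    ordered = sorted(keys)
    boundary = ordered[len(ordered) // 2]
    left = [k for k in ordered if k < boundary]    # [lower bound, boundary)
    right = [k for k in ordered if k >= boundary]  # [boundary, upper bound)
    return boundary, left, right

boundary, left, right = split_by_median(items)
print(boundary)               # ('F', 500)
print(len(left), len(right))  # 501 501
```

Items with partition key F end up on both sides of the boundary, which is only possible because the sort key is part of the value being compared.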

I hope this helps to visualize how items are mapped to partitions. Generally, the mapping is based on a hash value obtained by applying a hashing function to the item's partition key value. If need be, however, the value being hashed can include the sort key value as well as the partition key value.
