Choosing a partition key for a Cassandra table — how many is too many partitions?

Submitted by 百般思念 on 2019-12-21 12:21:44

Question


I have an application where the 'natural' partition key for a Cassandra table seems like it would be 'customer'. This is the primary way we want to query the data, and it would give us good data distribution.

But if there were well over 1 million customers, would that be too many different partitions?

Should I choose a partition key that results in a smaller number of partition keys?

I've looked at a number of the related questions on this topic but none seem to address this particular point.


Answer 1:


But if there were well over 1 million customers, would that be too many different partitions?

No. The Murmur3Partitioner works with something like 2^64 distinct tokens (-2^63 to +2^63-1), so a million-plus partition keys is nowhere near a problem. Cassandra is designed to be very good at storing large amounts of data and retrieving it by partition key. There is a limit on the number of cells (rows × columns) within a single partition (2 billion), but for the total number of partitions I think you'll be fine with what you have.
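To put the scale in perspective, here is a rough back-of-the-envelope sketch. The customer count of 1.5 million is a made-up figure standing in for "well over 1 million", and it assumes one partition per customer.

```python
# Token space of a signed 64-bit partitioner such as Murmur3Partitioner.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1
TOKEN_SPACE = MAX_TOKEN - MIN_TOKEN + 1  # 2**64 distinct token values

# Hypothetical workload: one partition per customer.
customers = 1_500_000


def fraction_of_token_space(partitions: int) -> float:
    """Share of the token space that `partitions` distinct tokens would occupy."""
    return partitions / TOKEN_SPACE


print(f"token space size: {TOKEN_SPACE:.3e}")
print(f"share used by {customers:,} customers: "
      f"{fraction_of_token_space(customers):.3e}")
```

Even at a billion customers the share of the token space in use stays vanishingly small, which is why the partition count itself is not the thing to worry about.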

Should I choose a partition key that results in a smaller number of partition keys?

Definitely not. That could cause your partitions to grow too big, and/or develop "hot spots" in your cluster.

The main task in picking a good partition key is to find one that both offers good data distribution in the cluster and matches your query patterns. From what I'm reading, it sounds like you have done exactly that.
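A small simulation can make the "hot spot" point concrete. This is only a sketch under loud assumptions: a hypothetical 6-node cluster, MD5 as a stand-in hash (not Murmur3), simple modulo placement instead of a real token ring, and a made-up `region` key as the coarse alternative to `customer`.

```python
import hashlib
from collections import Counter

NODES = 6  # hypothetical cluster size


def node_for(key: str) -> int:
    """Map a partition key to a node (MD5 stand-in hash, modulo placement)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big", signed=True)
    return token % NODES  # Python's % is non-negative for a positive modulus


def rows_per_node(keys):
    """Count how many rows land on each node, one row per key occurrence."""
    counts = Counter(node_for(k) for k in keys)
    return [counts.get(n, 0) for n in range(NODES)]


# 100,000 rows keyed by customer: many small partitions.
by_customer = rows_per_node(f"customer-{i}" for i in range(100_000))

# The same 100,000 rows keyed by one of three regions: few, huge partitions.
regions = ["emea", "apac", "amer"]
by_region = rows_per_node(regions[i % 3] for i in range(100_000))

print("by customer:", by_customer)
print("by region:  ", by_region)
```

With the customer key every node ends up with roughly the same number of rows. With only three distinct keys, at most three nodes hold any data at all, and each of those partitions is enormous.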




Answer 2:


I think you misunderstand how the partition key is used. The recommended partitioner (Murmur3Partitioner) takes your partition key values and hashes them into a 64-bit token. It is that token value that determines where your record is stored. Each Cassandra node has a set of token ranges associated with it. If the token of a record falls within a range owned by a node, the record is stored on that node. So how the data is divided across the cluster is not determined by how many distinct partition key values you have: it is determined by the token ranges in your cluster. Their number is roughly equal to the total number of vnodes you selected when you configured your data store nodes.
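The lookup described here can be sketched in a few lines. This is an illustration, not Cassandra's implementation: the `Ring` class, the node names, the 16 vnodes per node, and the MD5-based `token_for` (standing in for Murmur3) are all assumptions made for the example.

```python
import bisect
import hashlib
import random


def token_for(key: str) -> int:
    """Stand-in partitioner: a signed 64-bit token from the key (MD5, not Murmur3)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class Ring:
    """Toy token ring: each node owns `vnodes_per_node` randomly chosen tokens."""

    def __init__(self, nodes, vnodes_per_node, seed=42):
        rng = random.Random(seed)
        entries = sorted(
            (rng.randint(-2**63, 2**63 - 1), node)
            for node in nodes
            for _ in range(vnodes_per_node)
        )
        self.tokens = [t for t, _ in entries]
        self.owners = [n for _, n in entries]

    def owner(self, key: str) -> str:
        """Return the node owning the first ring token >= the key's token (wrapping)."""
        i = bisect.bisect_left(self.tokens, token_for(key))
        return self.owners[i % len(self.owners)]


nodes = ["node-a", "node-b", "node-c"]
ring = Ring(nodes, vnodes_per_node=16)

print("token ranges in the ring:", len(ring.tokens))
print("customer-42 lives on:", ring.owner("customer-42"))
```

However many customers there are, each key simply hashes into one of the 48 token ranges in this toy ring. Adding customers adds partitions inside the ranges, not ranges.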




Answer 3:


You are good to go with your current partition key. There is no need for a composite partition key just to create more partitions. Are you doing any time series data modelling, where a partition keeps gaining columns every second? If not, your current partition key will hold up for many millions of customers.
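To check whether the time series concern applies, a rough estimate of how fast a single customer partition grows is usually enough. The sketch below uses the 2 billion cell figure mentioned earlier in this thread. The write rates and the `days_until_limit` helper are hypothetical.

```python
CELL_LIMIT = 2_000_000_000  # per-partition cell limit quoted earlier in the thread
SECONDS_PER_DAY = 86_400


def days_until_limit(writes_per_second: float, cells_per_write: int) -> float:
    """Days until one partition reaches CELL_LIMIT at a constant write rate."""
    cells_per_day = writes_per_second * cells_per_write * SECONDS_PER_DAY
    return CELL_LIMIT / cells_per_day


# Hypothetical sensor-style customer: 1 write per second, 5 cells per write.
print(f"time series customer: {days_until_limit(1, 5):,.0f} days")

# Hypothetical ordinary customer: about 100 writes per day, 5 cells per write.
print(f"ordinary customer:    {days_until_limit(100 / SECONDS_PER_DAY, 5):,.0f} days")
```

In practice partitions become slow to read long before they reach the hard limit, so treat the result as an upper bound rather than a target.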



Source: https://stackoverflow.com/questions/30648479/choosing-a-partition-key-for-a-cassandra-table-how-many-is-too-many-partition
