Cassandra bucket splitting for partition sizing

[亡魂溺海] Submitted on 2019-12-01 13:07:42

You should start from your requirements and then go back to your schema model. In your case: how many measures per day can each instrument produce? If each one produces fewer than your 400k measures, you're already done without bucketing. If your instruments can perform up to 10M measures each, then N = 10M / 400k = 25 buckets should be enough to satisfy your requirements.

Assuming N buckets, when you need to query all the measures coming from a particular instrument you have to perform N queries, one for each bucket, unless you can count the measures during your writes, so that you switch buckets when the current one is full. That is, you write the first 400k measures to bucket 0, the second 400k measures to bucket 1, and so on. Then you only need to keep track of how many buckets K you have inserted data into and perform K queries instead of N. That way you have unbalanced buckets (and partitions), but you get your results in the smallest number of queries.

If you prefer a balanced-bucket approach instead, you can perform each write to a uniformly distributed random bucket number, but then you have to perform all N queries to get all the data for a specific instrument.
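A minimal Python sketch of the two strategies, assuming the 400k-per-partition and 10M-per-instrument figures above (the partition key in Cassandra would then be something like `(instrument_id, bucket)`; the constant names here are illustrative, not from any library):

```python
import random

BUCKET_CAPACITY = 400_000     # assumed max measures per partition
MAX_MEASURES = 10_000_000     # assumed worst-case measures per instrument
# ceiling division: N = ceil(10M / 400k) = 25 buckets
N_BUCKETS = -(-MAX_MEASURES // BUCKET_CAPACITY)

def counted_bucket(measure_index: int) -> int:
    """Sequential bucketing: fill bucket 0 completely, then bucket 1, ..."""
    return measure_index // BUCKET_CAPACITY

def buckets_to_query(measures_written: int) -> range:
    """With sequential bucketing, only the K non-empty buckets need queries."""
    k = -(-measures_written // BUCKET_CAPACITY)  # ceil(written / capacity)
    return range(k)

def random_bucket() -> int:
    """Balanced bucketing: uniform random bucket; all N must be queried."""
    return random.randrange(N_BUCKETS)
```

With 1M measures written sequentially, `buckets_to_query(1_000_000)` spans only 3 buckets, so 3 queries suffice; the random scheme always requires all 25.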
