Cassandra has a limit of 2 billion cells per partition, but what's a partition?

独厮守ぢ 2020-12-02 10:45

In the Cassandra wiki, it is said that there is a limit of 2 billion cells (rows x columns) per partition. But it is unclear to me what a partition is.

Do w

2 answers
  •  心在旅途
    2020-12-02 10:59

    With the advent of CQL3, the terminology has changed slightly from the old Thrift terms.

    Basically

    CREATE TABLE foo (a int, b int, c int, d int, PRIMARY KEY ((a, b), c));
    

    Will make a CQL3 table. The values of a and b together form the partition key, which determines which node the data will reside on. This is the 'partition' referred to in the 2 billion cell limit.

    Within that partition the information is organized by c, known as the clustering key. Together a, b and c identify a unique value of d. The number of cells in a partition is the number of rows (distinct values of c) times the number of non-key columns (here only d). So in this example, for any given pair of a and b there can be at most 2 billion combinations of c and d.
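As a rough back-of-the-envelope check, the cell count for one partition can be estimated as rows times non-key columns. The sketch below is plain Python arithmetic, not CQL; the names `cells_in_partition` and `fits` are made up for illustration and are not part of any Cassandra API.

```python
# Illustrative arithmetic only; nothing here talks to Cassandra.
MAX_CELLS_PER_PARTITION = 2_000_000_000  # the 2 billion limit quoted in the question


def cells_in_partition(rows: int, non_key_columns: int) -> int:
    """Estimate cells in one partition: one cell per non-key column per row."""
    return rows * non_key_columns


def fits(rows: int, non_key_columns: int) -> bool:
    """True if the estimated cell count stays within the quoted limit."""
    return cells_in_partition(rows, non_key_columns) <= MAX_CELLS_PER_PARTITION


# Table foo with PRIMARY KEY ((a, b), c) has a single non-key column, d.
print(cells_in_partition(1_000_000, 1))  # 1000000
print(fits(3_000_000_000, 1))            # False
```

Under this estimate, adding more non-key columns lowers the number of rows a single (a, b) pair can hold before reaching the limit.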

    So as you model your data, make sure the partition key varies enough that your data is spread evenly across the Cassandra cluster. Then use clustering keys to make the data within each partition available in the order you want.
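To make the distribution point concrete, here is a toy Python sketch of how a partition key might be mapped to a node. It assumes a hypothetical three-node cluster and uses MD5 modulo the node count as a stand-in; Cassandra's real partitioner and token ring work differently in detail, so treat this only as an illustration of the idea.

```python
import hashlib
from collections import Counter

NODES = ["node-0", "node-1", "node-2"]  # hypothetical three-node cluster


def node_for(a: int, b: int) -> str:
    """Pick a node from the partition key (a, b) only.

    Assumption: MD5 modulo the node count stands in for Cassandra's
    actual partitioner; it is not how Cassandra assigns tokens.
    """
    digest = hashlib.md5(f"{a}:{b}".encode()).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


# Many distinct partition keys spread over the nodes...
spread = Counter(node_for(a, b) for a in range(100) for b in range(100))
print(spread)

# ...while every row sharing one partition key lands on the same node,
# whatever its clustering key c happens to be.
print({node_for(1, 2) for c in range(1000)})
```

The clustering key never enters `node_for`, which mirrors the point above: only the partition key decides placement, so a partition key with few distinct values concentrates data on few nodes.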

    Watch this video for more info on data modeling in Cassandra: "The Datamodel is Dead, Long Live the Datamodel".

    Edit: One more example from the comments

    CREATE TABLE foo (a int, b int, c int, d int, e int, f int, PRIMARY KEY ((a, b), c, d));
    

    Partitions will be uniquely identified by a combination of a and b.

    Within a partition c and d will be used to order cells within the partition so the layout will look a little like:

    (a1,b1) --> [c1,d1 : e1], [c1,d1 : f1], [c1,d2 : e2] ...
    

    So in this example you can have 2 billion cells, with each cell containing:

    • A value of c
    • A value of d
    • A value of either e or f

    So the 2 billion limit refers to the sum of unique tuples of (c,d,e) and (c,d,f).
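The layout above can be mimicked with nested dictionaries to see how cells add up. This is only an in-memory Python model under the assumption that each stored e or f value counts as one cell; the helper names `insert` and `cell_count` are hypothetical and unrelated to any Cassandra driver.

```python
from collections import defaultdict

# partition key (a, b) -> clustering key (c, d) -> {column name: value}
table = defaultdict(dict)


def insert(a, b, c, d, **columns):
    """Write the given non-key columns (e and/or f) for one row."""
    row = table[(a, b)].setdefault((c, d), {})
    row.update(columns)


def cell_count(a, b):
    """Count stored e/f values in the partition identified by (a, b)."""
    return sum(len(row) for row in table[(a, b)].values())


insert(1, 1, c=1, d=1, e=10, f=20)
insert(1, 1, c=1, d=2, e=30)
insert(2, 1, c=1, d=1, e=40)  # a different partition

# Within one partition, rows are visited in (c, d) order.
for key in sorted(table[(1, 1)]):
    print(key, table[(1, 1)][key])

print(cell_count(1, 1))  # 3
```

Partition (1, 1) holds three cells here: e and f for (c=1, d=1), plus e for (c=1, d=2). The row in partition (2, 1) does not count toward it.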
