Too many columns in Cassandra

问题

I have 20 columns in a table in Cassandra. Will there be a performance impact in performing

select * from table where partitionKey = 'test';

I am not able to understand from this link,

https://wiki.apache.org/cassandra/CassandraLimitations

1) What will be the consequence of having too many columns (say 20) in the Cassandra tables?

回答1:

Unless you have a lot of rows on the partition, I don't see an impact with having 20 columns. As stated in the documentation that you linked:

The maximum number of cells (rows x columns) in a single partition is 2 billion.

So, unless you are expecting to have more than 100 million rows in a single partition, I don't see why 20 columns would be an issue. Keep in mind that Cassandra is a column family store. This designation means that Cassandra can store a large number of columns per partition.

Having said that, I would personally recommend not to go over 100 MB per partition. It might bring you problems in the future with streaming during repairs.

===============================

To answer to your comment. Keep in mind that partitions and rows are 2 different things in Cassandra. A partition is only equal to a row if there's no clustering columns. For instance, take a look at this table creation and the values we insert, and then look at the sstabledump:

create TABLE tt2 ( foo int , bar int , mar int , PRIMARY KEY (foo , bar )) ;
insert INTO tt2 (foo , bar , mar ) VALUES ( 1, 2, 3) ;
insert INTO tt2 (foo , bar , mar ) VALUES ( 1, 3, 4) ;

sstabledump:

./cassandra/tools/bin/sstabledump ~/cassandra/data/data/tk/tt2-1386f69005bd11e89c0bbfb5c1157523/mc-1-big-Data.db 
[
  {
    "partition" : {
      "key" : [ "1" ],
      "position" : 0
    },
    "rows" : [
      {
        "type" : "row",
        "position" : 32,
        "clustering" : [ "2" ],
        "liveness_info" : { "tstamp" : "2018-01-30T12:57:36.362483Z" },
        "cells" : [
          { "name" : "mar", "value" : 3 }
        ]
      },
      {
        "type" : "row",
        "position" : 32,
        "clustering" : [ "3" ],
        "liveness_info" : { "tstamp" : "2018-01-30T12:58:03.538482Z" },
        "cells" : [
          { "name" : "mar", "value" : 4 }
        ]
      }
    ]
  }
]

Also, if you use the -d option, it might make it easier for you to see the internal representation. As you can see, for the same partition, we have 2 distinct rows:

./cassandra/tools/bin/sstabledump -d ~/cassandra/data/data/tk/tt2-1386f69005bd11e89c0bbfb5c1157523/mc-1-big-Data.db 
[1]@0 Row[info=[ts=1517317056362483] ]: 2 | [mar=3 ts=1517317056362483]
[1]@32 Row[info=[ts=1517317083538482] ]: 3 | [mar=4 ts=1517317083538482]

来源：https://stackoverflow.com/questions/48519373/too-many-columns-in-cassandra

标签

cassandra

cassandra-3.0