Cassandra - What is the reasonable maximum number of tables?

我的梦境 提交于 2019-12-30 18:43:07

问题


I am new to Cassandra. As I understand the maximum number of tables that can be stored per keyspace is Integer.Max_Value. However, what are the implications from the performance perspective (speed, storage, etc) of such a big number of tables? Is there any recommendation regarding that?


回答1:


While there are legitimate use cases for having lots of tables in Cassandra, they are rare. Your use case might be one of them, but make sure that it is. Without knowning more about the problem you're trying to solve, it's obviously hard to give guidance. Many tables will require more resources, obviously. How much? That depends on the settings, and the usage.

For example, if you have a thousand tables and write to all of them at the same time there will be contention for RAM since there will be memtables for each of them, and there is a certain overhead for each memtable (how much depends on which version of Cassandra, your settings, etc.).

However, if you have a thousand tables but don't write to all of them at the same time, there will be less contention. There's still a per table overhead, but there will be more RAM to keep the active table's memtables around.

The same goes for disk IO. If you read and write to a lot of different tables at the same time the disk is going to do much more random IO.

Just having lots of tables isn't a big problem, even though there is a limit to how many you can have – you can have as many as you want provided you have enough RAM to keep the structures that keep track of them. Having lots of tables and reading and writing to them all at the same time will be a problem, though. It will require more resources than doing the same number of reads and writes to fewer tables.




回答2:


In my opinion if you can split the data into multiple tables, even thousands, is beneficial.

Pros:

  1. Suppose you want to scale in future to 10+ nodes and with a RF of 2 will result in having the data evenly distributed across nodes, thus not salable.
  2. Another point is random IO which will be big if you will read from many tables at the same time but I don't see why there is a difference when having just one table. Also you will seek for another partition key, so no difference in IO.
  3. When the compactation takes place it will have to do less work if there is only one table. The values from SSTables must be loaded into memory, merged and saved back.

Cons:

  1. Having multiple tables will result in having multiple memtables. I think the difference added by this to the RAM is insignificant.

Also, check out the links, they helped me A LOT
http://manuel.kiessling.net/2016/07/11/how-cassandras-inner-workings-relate-to-performance/
https://www.infoq.com/presentations/Apache-Cassandra-Anti-Patterns

Please fell free to edit my post, I am kinda new to Big Data



来源:https://stackoverflow.com/questions/33212414/cassandra-what-is-the-reasonable-maximum-number-of-tables

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!