Cassandra Wide Vs Skinny Rows for large columns

北荒 2020-12-29 06:32

I need to insert 60 GB of data into Cassandra per day.

This breaks down into:
100 sets of keys
150,000 keys per set
4KB of data per key
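
As a quick sanity check on these figures, here is a minimal sketch of the arithmetic. It assumes decimal units (4 KB = 4,000 bytes) and ignores any per-column or replication overhead, which the figures above do not specify.

```python
# Figures taken from the breakdown above; decimal units are an assumption.
SETS = 100
KEYS_PER_SET = 150_000
BYTES_PER_KEY = 4_000  # 4 KB per key

bytes_per_set = KEYS_PER_SET * BYTES_PER_KEY
bytes_per_day = SETS * bytes_per_set

print(f"per set: {bytes_per_set / 1e6:.0f} MB")  # per set: 600 MB
print(f"per day: {bytes_per_day / 1e9:.0f} GB")  # per day: 60 GB
```

So each set carries roughly 600 MB, and 100 sets add up to the stated 60 GB per day.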

In terms

2 Answers
  • 2020-12-29 07:12

    The answer depends on what your data retrieval pattern is, and how your data is logically grouped. Broadly, here is what I think:

    • Wide row (1 row per set): This could be the best solution, since a request for one set is served from a single row instead of hitting several nodes at once, and with secondary indexes or composite column names you can quickly filter the data down to what you need. It works best when each request accesses one set of data. However, issuing too many multigets against wide rows can increase memory pressure on the nodes and degrade performance.
    • Skinny row (1000 rows per set): On the other hand, a wide row can create read hotspots in the cluster, especially if you make a high volume of requests for a subset of data that lives entirely in one wide row. In that case, skinny rows distribute your requests more uniformly across the cluster and avoid hotspots. Also, in my experience, "skinnier" rows tend to behave better with multigets.

    I would suggest analyzing your data access pattern first and finalizing your data model based on that, rather than the other way around.
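
To make the trade-off between the two layouts concrete, here is a rough sketch using the question's figures (150,000 keys of about 4 KB per set). The `bucket_for` and `row_key` helpers, the CRC32-based bucketing, and the `set_id:bucket` row key format are assumptions for illustration only; nothing in the question prescribes them.

```python
import zlib

KEYS_PER_SET = 150_000
BYTES_PER_KEY = 4_000  # assumes 4 KB = 4,000 bytes, overhead ignored


def row_size_bytes(rows_per_set: int) -> float:
    """Approximate payload per row if a set's keys are spread evenly."""
    return KEYS_PER_SET * BYTES_PER_KEY / rows_per_set


def bucket_for(key: str, rows_per_set: int) -> int:
    """Hypothetical helper: deterministically map a key to one row of its set."""
    return zlib.crc32(key.encode("utf-8")) % rows_per_set


def row_key(set_id: int, key: str, rows_per_set: int) -> str:
    """Hypothetical row key: the set id plus the bucket the key falls into."""
    return f"{set_id}:{bucket_for(key, rows_per_set)}"


print(row_size_bytes(1))     # wide layout: about 600 MB in one row
print(row_size_bytes(1000))  # skinny layout: about 600 KB per row
print(row_key(7, "some-key", 1000))
```

With a scheme like this, a lookup for a single key can recompute its bucket and read one small row, while reading a whole set means querying all 1000 rows of that set. Which of the two costs matters more is exactly the access-pattern question above.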

  • 2020-12-29 07:12

    You'd be better off using 1 row per set with 150,000 columns per row. Using a TTL is a good idea, since it gives you an automatic cleanup process.
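
A minimal sketch of what this could look like in CQL, built here as plain Python strings so it runs without a cluster. The table name `data_by_set`, the column names, and the one-day TTL are assumptions, not details from the question. In CQL terms, "1 row per set" corresponds to one partition per `set_id`, with each key stored as a clustering row inside it.

```python
TTL_SECONDS = 24 * 60 * 60  # assumption: data expires after one day

# Hypothetical schema: one partition per set, one clustering row per key.
CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS data_by_set (
    set_id int,
    item_key text,
    payload blob,
    PRIMARY KEY (set_id, item_key)
)
"""


def insert_statement(ttl_seconds: int = TTL_SECONDS) -> str:
    """Build an INSERT with a TTL; the ? marks are prepared-statement placeholders."""
    return (
        "INSERT INTO data_by_set (set_id, item_key, payload) "
        f"VALUES (?, ?, ?) USING TTL {int(ttl_seconds)}"
    )


print(insert_statement())
```

Cells written with `USING TTL` expire on their own once the TTL elapses, so no separate delete job is needed for the daily data.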
