Cassandra denormalization datamodel

Submitted by 南笙酒味 on 2019-12-03 03:34:11

"Yes" for the most part; taking a query-based approach to data modeling really is the best way to do it.
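To make the query-based idea concrete, here is a minimal sketch in plain Python. It is not Cassandra driver code: each dictionary merely stands in for one table keyed by its partition key, and every name in it (`Video`, `videos_by_user`, `videos_by_tag`, `add_video`, `remove_video`) is hypothetical. It assumes two read queries, "videos for a user" and "videos for a tag", and gives each its own denormalized table.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    # Hypothetical record; the fields are assumptions for illustration.
    video_id: str
    user_id: str
    tag: str
    title: str


# Each dict stands in for one query table, keyed by that table's partition key.
videos_by_user = defaultdict(dict)  # user_id -> {video_id: Video}
videos_by_tag = defaultdict(dict)   # tag     -> {video_id: Video}


def add_video(video: Video) -> None:
    """Write the same row to every query table (the cost of denormalizing)."""
    videos_by_user[video.user_id][video.video_id] = video
    videos_by_tag[video.tag][video.video_id] = video


def remove_video(video: Video) -> None:
    """Clean-up has to touch every table the row was copied into."""
    videos_by_user[video.user_id].pop(video.video_id, None)
    videos_by_tag[video.tag].pop(video.video_id, None)


def find_by_user(user_id: str) -> list:
    """Single-partition lookup; no JOIN needed."""
    return list(videos_by_user.get(user_id, {}).values())


def find_by_tag(tag: str) -> list:
    """Single-partition lookup against the second query table."""
    return list(videos_by_tag.get(tag, {}).values())
```

Each read is a lookup on one key, which is the point of the approach; the price is that every write and every delete fans out to all of the tables that hold a copy.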

  1. That is still a good idea, because the faster query times make it worth it. Yes, there is a little more housekeeping to do. I haven't had to execute hundreds of deletes against other column families, but occasionally there is some complicated clean-up. Still, you shouldn't be doing a whole lot of deleting in Cassandra anyway (it is an anti-pattern, since every delete writes a tombstone).

  2. No. Client-side JOINs are just as bad as distributed JOINs. The whole idea is to create a table that returns the data for each specific query, denormalized and/or replicated, which removes the need for a JOIN at all. The exception is OLAP queries for analysis, where you can use a tool like Apache Spark to execute an ad-hoc, distributed JOIN. But that is definitely not something you'd want to do on a production system.

  3. A few articles I can recommend:
