When to use Cassandra vs. Solr in DSE?

丶灬走出姿态 posted on 2019-12-03 03:10:10

Cassandra secondary indexes have limited use cases:

  1. No more than a couple of columns indexed.
  2. Only a single indexed column in a query.
  3. Too much inter-node traffic for high cardinality data (relatively unique column values).
  4. Too much inter-node traffic for low cardinality data (a high percentage of rows will match).
  5. Queries need to be known in advance so data model can be optimized around them.

Because of these limitations, it is common for apps to create "index tables", each keyed by whatever column needs to be queried. This requires either that data be duplicated from the main table into each index table, or that an extra query be made: first read the main key from the index table, then read the actual row from the main table. Queries on multiple columns have to be manually indexed in advance, which makes ad hoc queries problematic. And any duplicated data has to be manually updated by the app in each index table.
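The index-table pattern described above can be sketched without a cluster. The following is a minimal in-memory illustration, not driver code: the two dictionaries stand in for a main table and one index table, and all names (users_by_id, users_by_email, insert_user, update_email, find_by_email) are hypothetical.

```python
# In-memory stand-ins for two Cassandra tables (hypothetical names).
users_by_id = {}     # "main table": user_id -> full row
users_by_email = {}  # "index table": email -> user_id


def insert_user(user_id, name, email):
    """Write the row to the main table and its key to the index table."""
    users_by_id[user_id] = {"user_id": user_id, "name": name, "email": email}
    users_by_email[email] = user_id


def update_email(user_id, new_email):
    """The app itself must keep the index table in sync on every change."""
    row = users_by_id[user_id]
    del users_by_email[row["email"]]   # remove the stale index entry
    row["email"] = new_email
    users_by_email[new_email] = user_id


def find_by_email(email):
    """Two reads: the index table first, then the main table."""
    user_id = users_by_email.get(email)
    if user_id is None:
        return None
    return users_by_id[user_id]
```

Every additional column that must be searchable needs another dictionary (table) and another line in each write path, which is why this approach does not extend well to ad hoc queries.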

Other than that, secondary indexes work fine in cases where a "modest" number of rows will be selected from a modest number of nodes, and the queries are well specified in advance rather than ad hoc.

DSE/Solr is better for:

  1. A moderate number of columns are indexed.
  2. Complex queries with a number of columns/fields referenced - Lucene matches all specified fields in a query in parallel. Lucene indexes the data on each node, so nodes query in parallel.
  3. Ad hoc queries in general, where the precise queries are not known in advance.
  4. Rich text queries such as keyword search, wildcard, fuzzy/like, range, inequality.
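As a rough picture of item 2, here is a toy per-node inverted index. It is only a sketch of the general idea (match each field separately, then intersect the matches; each node searches only its own data), not how Lucene or DSE is actually implemented. The class and function names are hypothetical, and the loop over nodes runs sequentially here, whereas real nodes would work in parallel.

```python
from collections import defaultdict


class ToyNodeIndex:
    """A toy inverted index for one node: (field, value) -> set of row ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, row_id, row):
        """Index every field of the row, so any field can be queried later."""
        for field, value in row.items():
            self.postings[(field, value)].add(row_id)

    def search(self, **criteria):
        """Return ids of local rows that match every field=value pair."""
        sets = [self.postings.get((f, v), set()) for f, v in criteria.items()]
        if not sets:
            return set()
        return set.intersection(*sets)


def search_cluster(nodes, **criteria):
    """Ask each node to search its local index, then merge the results."""
    result = set()
    for node in nodes:
        result |= node.search(**criteria)
    return result
```

Because every field is indexed up front, a query on any combination of fields works without planning a table for it in advance.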

There is a performance and capacity cost to using Solr indexing, so a proof of concept implementation is recommended to evaluate how much additional RAM, storage, and nodes are needed. That depends on how many columns you index, the amount of text indexed, and any text filtering complexity (e.g., n-grams need more). It could range from a 25% increase for a relatively small number of indexed columns to 100% if all columns are indexed. Also, you need to have enough nodes so that the per-node Solr index fits in RAM, or mostly in RAM if using SSD. And vnodes are not currently recommended for Solr data centers.
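For a first back-of-the-envelope estimate before the proof of concept, the arithmetic can be written down as below. This is a sketch under stated assumptions: the 25% to 100% range is applied to a storage figure only, the index is assumed to be spread evenly across nodes, and replication and other memory consumers are ignored. The function names and example numbers are hypothetical.

```python
import math


def solr_overhead_range(base_gb):
    """Apply the 25%..100% rule of thumb to a base storage figure (in GB).

    Returns (low, high): the estimated totals for few indexed columns
    and for all columns indexed, respectively.
    """
    return base_gb * 1.25, base_gb * 2.0


def min_nodes_for_index(total_index_gb, ram_per_node_gb):
    """Smallest node count so each node's share of the index fits in RAM.

    Assumes an even spread of the index and that ram_per_node_gb is the
    RAM actually available for the index on each node.
    """
    return math.ceil(total_index_gb / ram_per_node_gb)


low, high = solr_overhead_range(400)   # 400 GB of data without Solr
nodes = min_nodes_for_index(300, 64)   # 300 GB index, 64 GB usable RAM per node
```

These figures only bound the estimate; measured numbers from a proof of concept should replace them.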
