Cassandra: selecting first entry for each value of an indexed column

安稳与你 submitted on 2019-12-12 01:53:15

Question


I have a table of events and would like to extract the first timestamp (column unixtime) for each user. Is there a way to do this with a single Cassandra query?

The schema is the following:

CREATE TABLE events (
 id VARCHAR,
 unixtime bigint,
 u bigint,
 type VARCHAR,
 payload map<text, text>, 
 PRIMARY KEY(id)
);

CREATE INDEX events_u
  ON events (u);

CREATE INDEX events_unixtime
  ON events (unixtime);

CREATE INDEX events_type
  ON events (type);

Answer 1:


According to your schema, with PRIMARY KEY(id), each id holds exactly one row, so each user will have a single timestamp. If you want one row per event, consider:

PRIMARY KEY (id, unixtime)

Assuming that is your schema, the entries for a user will be stored in ascending unixtime order. Be careful though: if it's an unbounded event stream and users have lots of events, the partition for the id will grow and grow. It's recommended to keep partition sizes to tens or hundreds of megabytes. If you anticipate larger partitions, you'll need to start some form of bucketing.
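To make the bucketing idea concrete, here is a minimal sketch. The table name events_by_id_bucket, the day_bucket column, and the one-day bucket size are all assumptions for illustration, and it further assumes unixtime is in seconds, so the client would compute day_bucket as unixtime divided by 86400.

```cql
-- Hypothetical table: name, day_bucket column and bucket size are not from the original post.
CREATE TABLE events_by_id_bucket (
  id VARCHAR,
  day_bucket bigint,          -- assumed: unixtime / 86400, computed by the client on write
  unixtime bigint,
  u bigint,
  type VARCHAR,
  payload map<text, text>,
  PRIMARY KEY ((id, day_bucket), unixtime)   -- (id, day_bucket) is the partition key
) WITH CLUSTERING ORDER BY (unixtime ASC);

-- Earliest event inside one bucket: a single-partition read.
-- 'some-id' and 18000 are placeholder values.
SELECT unixtime
  FROM events_by_id_bucket
 WHERE id = 'some-id' AND day_bucket = 18000
 LIMIT 1;
```

Each partition now holds at most one day of events for an id, which bounds its size. The trade-off is that finding the overall first event for an id means knowing, or probing for, its earliest bucket. Note also that two events with the same id and the same unixtime would overwrite each other, so a finer clustering column may be needed.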

Now, on to your query. In a word, no. If you don't hit a partition (by specifying the partition key), your query becomes a cluster-wide operation. With little data it'll work, but with lots of data you'll get timeouts. If you do have the data in its current form, then I recommend you use the Spark Cassandra connector and Apache Spark to do your query. An added benefit of the Spark connector is that if you run your Cassandra nodes as Spark worker nodes, then thanks to data locality you can efficiently hit a secondary index without specifying the partition key (which would normally cause a cluster-wide query with timeout issues, etc.). You could even use Spark to get the required data and store it into another Cassandra table for fast querying.
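As a rough sketch of that last suggestion, the precomputed result could live in a small lookup table keyed by user. The table name first_event_by_user and the column first_unixtime are hypothetical, and the job that fills the table (Spark or otherwise) is not shown.

```cql
-- Hypothetical lookup table, populated by an offline job such as Spark.
CREATE TABLE first_event_by_user (
  u bigint,
  first_unixtime bigint,      -- smallest unixtime seen for this user
  PRIMARY KEY (u)
);

-- Reading the first timestamp for one user hits a single partition.
-- 42 is a placeholder user id.
SELECT first_unixtime
  FROM first_event_by_user
 WHERE u = 42;
```

Because u is the partition key here, the lookup no longer depends on the secondary index, at the cost of having to refresh the table as new events arrive.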



Source: https://stackoverflow.com/questions/27840329/cassandra-selecting-first-entry-for-each-value-of-an-indexed-column
