Range query on clustering key

不想你离开。 submitted on 2021-02-10 06:31:12

Question


I have a table where I am logging user activity performed on my website. My table structure looks like:

CREATE TABLE key_space.log (
    id uuid,
    time bigint,
    ip text,
    url text,
    user_id int,
    PRIMARY KEY (id, time)
) WITH CLUSTERING ORDER BY (time DESC)

Now I want to fetch all the records that arrived in the last 5 minutes.

To do this, I am using:

select * from key_space.log where 
  time>current_timestamp - 5 minutes ALLOW FILTERING;

But this query does not return any results, and I get a TimedOutException error. How can I solve this problem? Any help would be really appreciated.


Answer 1:


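Your table has id as the partition key and time as the clustering key. To run a range query on a clustering key, you also need to specify the partition key. So the query should look like the one below, where xyz stands for a concrete uuid value (written unquoted, because CQL reads double-quoted text as an identifier). Once the partition key is restricted, ALLOW FILTERING is no longer needed.

 select * from key_space.log where id=xyz and time>current_timestamp - 5 minutes;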

This answer describes the different Cassandra keys very well.
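
Because time is a bigint column, the cutoff in that query has to be a number computed by the client. Here is a minimal Python sketch, assuming time holds epoch milliseconds (the question does not say so, although the sample values in the second answer look like it). The helper name recent_activity_query is hypothetical, the uuid is borrowed from the sample output further down, and the statement uses the %s placeholders accepted by the DataStax Python driver for non-prepared statements.

```python
import time
import uuid

WINDOW_MS = 5 * 60 * 1000  # five minutes expressed in milliseconds


def recent_activity_query(partition_id, now_ms=None):
    """Return (cql, params) selecting one partition's rows from the last 5 minutes.

    Assumes the `time` column stores epoch milliseconds.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - WINDOW_MS
    # Partition key equality plus a clustering-key range: no ALLOW FILTERING needed.
    cql = "SELECT * FROM key_space.log WHERE id = %s AND time > %s"
    return cql, (partition_id, cutoff_ms)


# Illustrative call with a fixed "now" so the result is reproducible.
cql, params = recent_activity_query(
    uuid.UUID("7fef03da-6c23-4bf2-9e98-fd126ab17944"), now_ms=1563636171941
)
```

The returned pair could then be handed to something like session.execute(cql, params). Note that this still reads only one user's partition; it does not answer "all records in the last 5 minutes" across the whole table.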




Answer 2:


As with all Cassandra models, you'll need to start by building a table specifically designed to support that query. Even if you could make it work with your current table, it would have to scan every node in the cluster, which would probably time out (as you are seeing).

One way to do this is to use a time "bucket" as the partition key. If you only care about records from the last 5 minutes, then "day" should work, as long as you don't get millions of new records per day. If you do, then you'll need a smaller time component for your "bucket".

CREATE TABLE log_by_day (
    id uuid,
    day bigint,
    time bigint,
    ip text,
    url text,
    user_id int,
    PRIMARY KEY (day, time, id)
) WITH CLUSTERING ORDER BY (time DESC, id ASC);

Now a query like this will work:

aaron@cqlsh:stackoverflow> SELECT day,time,id,user_id FROM log_by_day
  WHERE day=201920 AND time > 1563635871941;

 day    | time          | id                                   | user_id
--------+---------------+--------------------------------------+---------
 201920 | 1563635872259 | 7fef03da-6c23-4bf2-9e98-fd126ab17944 |    1234
 201920 | 1563635872259 | 9a0c49ce-5ad2-45c5-b570-cd9de1c060d1 |    4607
 201920 | 1563635872209 | 9227166e-cda2-4909-b8ac-4168922a0128 |    2112

(3 rows)
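
To run that query from an application, the client has to work out both the bucket and the cutoff. Below is a rough Python sketch under stated assumptions: the answer does not spell out how day is encoded, so this uses a hypothetical YYYYMMDD integer in UTC, and time is taken to be epoch milliseconds. A 5-minute window that starts before midnight touches two buckets, so the hypothetical helper returns one query per bucket.

```python
from datetime import datetime, timedelta, timezone


def day_bucket(moment):
    """Encode a UTC datetime as a YYYYMMDD integer (assumed bucket format)."""
    return moment.year * 10000 + moment.month * 100 + moment.day


def last_five_minutes_queries(now):
    """Return (cql, params) pairs covering the 5 minutes before `now`."""
    start = now - timedelta(minutes=5)
    cutoff_ms = int(start.timestamp() * 1000)
    cql = "SELECT day, time, id, user_id FROM log_by_day WHERE day = %s AND time > %s"
    # Keep order, and drop the duplicate when both ends fall on the same day.
    buckets = list(dict.fromkeys([day_bucket(now), day_bucket(start)]))
    return [(cql, (bucket, cutoff_ms)) for bucket in buckets]


# Fixed timestamps for illustration: one mid-day, one just after midnight.
midday = datetime(2019, 7, 20, 15, 22, 51, tzinfo=timezone.utc)
after_midnight = datetime(2019, 7, 21, 0, 2, 0, tzinfo=timezone.utc)
```

Each pair can be executed separately, and the client concatenates the rows. If a smaller bucket such as an hour is used instead, the same two-bucket check applies at the hour boundary.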

Pro-tip: Partitioning on a unique id is great for data distribution, but doesn't give you much in the way of query flexibility.



Source: https://stackoverflow.com/questions/57106586/range-query-on-clustering-key
