cassandra primary key column cannot be restricted

帅比萌擦擦* 提交于 2019-12-03 02:45:19

In Cassandra, you should design your data model to suit your queries. Therefore the proper way to support your second query (queries by doodle_id and schedule_id, but not necessarilly with user_id), is to create a new table to handle that specific query. This table will be pretty much the same, except the PRIMARY KEY will be slightly different:

CREATE TABLE votebydoodleandschedule (
    doodle_id uuid,
    user_id uuid,
    schedule_id uuid,
    vote int,
    PRIMARY KEY ((doodle_id), schedule_id, user_id)
);

Now this query will work:

SELECT * FROM votebydoodleandschedule 
WHERE doodle_id = c4778a27-f2ca-4c96-8669-15dcbd5d34a7 
AND schedule_id = c37df0ad-f61d-463e-bdcc-a97586bea633;

This gets you around having to specify ALLOW FILTERING. Relying on ALLOW FILTERING is never a good idea, and is certainly not something that you should do in a production cluster.

The clustering key is also used to find the columns within a given partition. With your model, you'll be able to query by:

  • doodle_id
  • doodle_id/user_id
  • doodle_id/user_id/schedule_id
  • user_id using ALLOW FILTERING
  • user_id/schedule_id using ALLOW FILTERING

You can see your primary key as a file path doodle_id#123/user_id#456/schedule_id#789 where all data is stored in the deepest folder (ie schedule_id#789). When you're querying you have to indicate the subfolder/subtree from where you start searching.

Your 2nd query doesn't work because of how columns are organized within partition. Cassandra can not get a continuous slice of columns in the partition because they are interleaved.

You should invert the primary key order (doodle_id, schedule_id, user_id) to be able to run your query.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!