Finding distinct values of non Primary Key column in CQL Cassandra

寵の児 submitted on 2019-12-05 05:39:04
Aaron

User Undefined_variable makes two good points:

  • In Cassandra, you need to build your data model to match your query patterns. This sometimes means duplicating your data into additional tables, to attain the desired level of query flexibility.
  • DISTINCT only works on partition keys.

So one way to get this to work would be to build a specific table to support that query:

CREATE TABLE users_by_lname (
    lname text,
    fname text,
    user_id int,
    PRIMARY KEY (lname, fname, user_id)
);

Now, after I run your INSERTs against this new query table, this works:

aploetz@cqlsh:stackoverflow> SELECT DISTINCT lname FROM users_by_lname;

 lname
-------
 smith
   doe

(2 rows)

Notes: In this table, all rows with the same partition key (lname) will be sorted by fname, as fname is a clustering key. I added user_id as an additional clustering key, just to ensure uniqueness.

There is no such functionality in Cassandra. DISTINCT is possible on the partition key only. You should design your data model around the queries you need to run. Otherwise, you have to process the data in application logic (Spark may be useful).
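
If you do go the application-logic route, here is a minimal sketch of what that could look like in Python. It assumes the rows have already been fetched from Cassandra and expose columns as attributes (the DataStax Python driver returns named tuples by default, but the `User` tuple and sample rows below are made up for illustration, as is the `distinct_values` helper). Keep in mind that reading every row means a full table scan, which gets expensive on large tables; that is where something like Spark comes in.

```python
from collections import namedtuple


def distinct_values(rows, column):
    """Return the distinct values of `column`, in first-seen order.

    `rows` is any iterable of objects exposing `column` as an attribute,
    e.g. the named tuples yielded by a driver result set.
    """
    seen = set()
    result = []
    for row in rows:
        value = getattr(row, column)
        if value not in seen:
            seen.add(value)
            result.append(value)
    return result


# Hypothetical stand-in for rows read from a users table; with a real
# cluster these would come from iterating over a query result instead.
User = namedtuple("User", ["user_id", "fname", "lname"])
rows = [
    User(1, "john", "smith"),
    User(2, "jane", "doe"),
    User(3, "bob", "smith"),
]

print(distinct_values(rows, "lname"))  # ['smith', 'doe']
```

The set keeps the membership check cheap, and the list preserves the order in which values were first seen. If order does not matter, `set(row.lname for row in rows)` does the same job in one line.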
