问题
Is it possible to create an index on a UUID/TIMEUUID column in Cassandra? I'm testing out a model design which would have an index on a UUID column, but queries on that column always return 0 rows found.
I have a table like this:
create table some_data (site_id int, user_id int, run_id uuid, value int, primary key((site_id, user_id), run_id));
I create an index with this command:
create index idx on some_data (run_id) ;
No errors are thrown by CQL when I create this index.
I have a small bit of test data in the table:
site_id | user_id | run_id | value
---------+---------+--------------------------------------+-----------------
1 | 1 | 9e118af0-ac92-11e4-81ae-8d1bc921f26d | 3
However, when I run the query:
select * from some_data where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d
CQLSH just returns: (0 rows)
If I use an int
for the run_id
then the index behaves as expected.
回答1:
Yes, you can create a secondary index on a UUID. The real question is "should you?"
In any case, I followed your steps, and got it to work.
Connected to Test Cluster at 192.168.23.129:9042.
[cqlsh 5.0.1 | Cassandra 2.1.2 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
aploetz@cqlsh> use stackoverflow ;
aploetz@cqlsh:stackoverflow> create table some_data (site_id int, user_id int, run_id uuid, value int, primary key((site_id, user_id), run_id));
aploetz@cqlsh:stackoverflow> create index idx on some_data (run_id) ;
aploetz@cqlsh:stackoverflow> INSERT INTO some_data (site_id, user_id, run_id, value) VALUES (1,1,9e118af0-ac92-11e4-81ae-8d1bc921f26d,3);
aploetz@cqlsh:stackoverflow> select * from usr_rec3 where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d;
code=2200 [Invalid query] message="unconfigured columnfamily usr_rec3"
aploetz@cqlsh:stackoverflow> select * from some_data where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d;
site_id | user_id | run_id | value
---------+---------+--------------------------------------+-------
1 | 1 | 9e118af0-ac92-11e4-81ae-8d1bc921f26d | 3
(1 rows)
Notice though, that when I ran this command, it failed:
select * from usr_rec3 where run_id = 9e118af0-ac92-11e4-81ae-8d1bc921f26d
Are you sure that you didn't mean to select from some_data
instead?
Also, creating secondary indexes on high-cardinality columns (like a UUID) is generally not a good idea. If you need to query by run_id
, then you should revisit your data model and come up with an appropriate query table to serve that.
Clarification:
- Using secondary indexes in general is not considered good practice. In the new book Cassandra High Availability, Robbie Strickland identifies their use as an anti-pattern, due to poor performance.
- Just because a column is of the UUID data type doesn't necessarily make it high-cardinality. That's more of a data model question for you. But knowing the nature of UUIDs and their underlying purpose toward being unique, is setting off red flags.
- Put these two points together, and there isn't anything about creating an index on a UUID that sounds appealing to me. If it were my cluster, and (more importantly) I had to support it later, I wouldn't do it.
来源:https://stackoverflow.com/questions/28327945/can-an-index-be-created-on-a-uuid-column