Need to query into a column (or collection) in Cassandra

僤鯓⒐⒋嵵緔 posted on 2020-01-06 20:19:45

Question


Hi everyone :D

I'm working with Cassandra (DataStax version) and I have an issue.

I want to model data whose columns are (always) going to change.

That's very hard, because I can't just create a column family with 1, 2, 3, 4 ... 10 columns: tomorrow the set of columns will probably change.

I thought about collections, but I have to query into them. I mean, I need to query this information every second.

For example, with a map:

<'col1':'val1','col2':'val2'> 

I need to query like this:

SELECT * FROM example WHERE 'col1' = 'val1' AND 'col2' = 'val2';

I don't know how to do this, and it is essential for what I want to do.

I also read that you can create a text column and store the values in a delimited format:

colum1 = 'val1\x01val2\x01'

But this doesn't solve my problem, because I can't query on these fields (or I don't know how).

Could you please help me model something like this?

I can't use a collection because, according to what I read, it is slow.

PS: sorry if my English is bad :( and thank you.


Answer 1:


You can create a table like this:

CREATE TABLE dynamic_columns (
   partitionKey bigint,
   column_name text,
   column_value_text text,
   column_value_boolean boolean,
   column_value_bigint bigint,
   column_value_uuid uuid,
   column_value_timestamp timestamp,
   ....
   PRIMARY KEY((partitionKey), column_name)
);

The partitionKey determines which node(s) in the cluster will store your data.

The clustering column column_name will store the label of your dynamic column. Then we have a list of normal columns, one for each data type (bigint, uuid, timestamp ....)

Let's take an example:

INSERT INTO dynamic_columns(partitionKey, column_name, column_value_text)
VALUES(1, 'firstname', 'John DOE');

INSERT INTO dynamic_columns(partitionKey, column_name, column_value_boolean)
VALUES(1, 'validity_state', true);

INSERT INTO dynamic_columns(partitionKey, column_name, column_value_timestamp)
VALUES(1, 'validity_date', '2016-03-13 12:00:00+0000');

So the idea is to define one column_value column for each existing Cassandra type, but to insert data only into the column that matches the value's type, as in the examples above.
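On the client side, the "insert only into the matching typed column" rule can be sketched roughly as below. This is only an illustration, not part of the schema above: the helper names (value_column_for, build_insert) and the Python-type-to-column mapping are my own assumptions, and the %s placeholders assume a driver that accepts that parameter style for simple statements.

```python
from datetime import datetime
from uuid import UUID

# Hypothetical mapping from Python types to the typed value columns of
# dynamic_columns. bool is listed before int because bool is a subclass
# of int in Python and would otherwise be routed to the bigint column.
TYPE_TO_COLUMN = [
    (bool, "column_value_boolean"),
    (int, "column_value_bigint"),
    (str, "column_value_text"),
    (UUID, "column_value_uuid"),
    (datetime, "column_value_timestamp"),
]


def value_column_for(value):
    """Return the name of the typed column that should hold `value`."""
    for python_type, column in TYPE_TO_COLUMN:
        if isinstance(value, python_type):
            return column
    raise TypeError(f"no typed column for {type(value).__name__}")


def build_insert(partition_key, column_name, value):
    """Build a parameterized INSERT that fills only the matching typed column."""
    column = value_column_for(value)
    statement = (
        "INSERT INTO dynamic_columns(partitionKey, column_name, "
        f"{column}) VALUES (%s, %s, %s)"
    )
    return statement, (partition_key, column_name, value)
```

For instance, build_insert(1, 'validity_state', True) targets column_value_boolean, while build_insert(1, 'firstname', 'John DOE') targets column_value_text.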

For querying, you'll need to create an index on each type column. Example:

CREATE INDEX ON dynamic_columns(column_value_boolean);
CREATE INDEX ON dynamic_columns(column_value_text);
CREATE INDEX ON dynamic_columns(column_value_bigint);
....

If you can switch to Cassandra 3.4, there is a better secondary index implementation called SASI. Here is the syntax for creating the indexes:

// All data types EXCEPT text
CREATE CUSTOM INDEX ON dynamic_columns(column_value_boolean)
USING 'org.apache.cassandra.index.sasi.SASIIndex' 
WITH OPTIONS = {'mode': 'SPARSE'};

// Text data type
CREATE CUSTOM INDEX ON dynamic_columns(column_value_text)
USING 'org.apache.cassandra.index.sasi.SASIIndex' 
WITH OPTIONS = {
    'mode': 'PREFIX', 
    'analyzer_class' : 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
    'case_sensitive': 'false'
};

Then you can query your columns easily:

//Give me col1 where value = 'val1'
SELECT * FROM dynamic_columns 
WHERE partitionKey=1 
AND column_name='col1'
AND column_value_text='val1';

//Give me 'validity_state' = true
SELECT * FROM dynamic_columns 
WHERE partitionKey=1 
AND column_name='validity_state'
AND column_value_boolean=true;
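Note that in this model each dynamic column is its own row, so the original "col1 = val1 AND col2 = val2" condition spans two rows and, as far as I can tell, cannot be expressed as a single restriction on one row. One possible approach, sketched here under assumptions, is to fetch the rows of one partition and check all the conditions on the client. The row shape (plain dicts keyed by column name) and the helper names are hypothetical.

```python
# Typed value columns of the dynamic_columns table shown above.
VALUE_COLUMNS = (
    "column_value_text",
    "column_value_boolean",
    "column_value_bigint",
    "column_value_uuid",
    "column_value_timestamp",
)


def row_value(row):
    """Return the single non-null typed value stored in a row, or None."""
    for column in VALUE_COLUMNS:
        value = row.get(column)
        if value is not None:
            return value
    return None


def partition_matches(rows, conditions):
    """Return True if every name/value pair in `conditions` is present
    among the rows of one partition."""
    stored = {row["column_name"]: row_value(row) for row in rows}
    return all(
        name in stored and stored[name] == expected
        for name, expected in conditions.items()
    )


# Rows as they might come back for partitionKey = 1 (assumed dict shape).
rows = [
    {"partitionKey": 1, "column_name": "col1", "column_value_text": "val1"},
    {"partitionKey": 1, "column_name": "col2", "column_value_text": "val2"},
]
both_match = partition_matches(rows, {"col1": "val1", "col2": "val2"})
```

This keeps every read restricted to a single partition, at the cost of doing the final AND in application code.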

Remark: you should always provide the partitionKey value in your SELECT; otherwise, in the worst case, Cassandra will perform a full cluster scan and kill your performance. With the SASI index available since Cassandra 3.4, this problem is less critical, but it is still strongly recommended to provide the partitionKey when using a secondary index.

For more information on the importance of the partition key, read this: http://www.planetcassandra.org/blog/the-most-important-thing-to-know-in-cassandra-data-modeling-the-primary-key/



Source: https://stackoverflow.com/questions/35965182/need-to-querying-into-a-column-or-collection-in-cassandra
