Question
I am having issues querying large volumes of data for a single day. I am looking for advice on creating an efficient table schema.
Table: eventlog
Columns: recordid (UUID), insertedtimestamp (timestamp), source (Text), event (Text)
If I simply do:
CREATE TABLE eventlog (
recordid uuid PRIMARY KEY,
insertedtimestamp timestamp,
source text,
event text
);
Then the query below gets overwhelmed by the volume of data (assuming today is 2017-01-25):
select * from eventlog where insertedtimestamp > '2017-01-25';
The goal is to select all the records from a single day, with partitioning that stays efficient for tables with possibly millions of records. How would I design an efficient table schema (what partition key setup)? Thank you.
Answer 1:
Since you want to get all the records for a single day, you can use this schema:
CREATE TABLE eventlog (
day int,
month int,
year int,
recordid uuid,
insertedtimestamp timestamp,
source text,
event text,
PRIMARY KEY((day,month,year),recordid)
);
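With (day, month, year) as the partition key, all of the data for a single day lands in a single partition, and therefore on a single node (plus its replicas). You can now get the data for a given date, say 2017-01-25, more efficiently with the query below: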
SELECT * FROM eventlog WHERE day = 25 AND month = 1 AND year = 2017;
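On the application side, the three partition key values have to be derived from each event's timestamp, both when inserting and when querying. A minimal Python sketch of that step follows. The helper names `partition_key` and `select_day_cql` are hypothetical, and bucketing by UTC calendar day is an assumption the original answer does not make. The query string is built only for illustration; real code would more likely bind the three values to a prepared statement.

```python
from datetime import datetime, timedelta, timezone


def partition_key(ts):
    """Return (day, month, year) for a timezone-aware datetime.

    Assumption: days are bucketed in UTC. The tuple order matches
    PRIMARY KEY((day, month, year), recordid) in the schema above.
    """
    if ts.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = ts.astimezone(timezone.utc)
    return (utc.day, utc.month, utc.year)


def select_day_cql(ts):
    """Build the single-partition SELECT for the day containing ts (illustration only)."""
    day, month, year = partition_key(ts)
    return (
        "SELECT * FROM eventlog "
        f"WHERE day = {day} AND month = {month} AND year = {year};"
    )


# An event recorded at 01:00 on 2017-01-26 in a UTC+3 zone falls on 2017-01-25 in UTC.
local_event = datetime(2017, 1, 26, 1, 0, tzinfo=timezone(timedelta(hours=3)))
print(partition_key(local_event))
print(select_day_cql(local_event))
```

Whatever day boundary is chosen, the writer and the reader need to agree on it, otherwise events near midnight end up in a partition the query does not read.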
Source: https://stackoverflow.com/questions/41856542/cassandra-table-design-with-timestamp-and-large-dataset