How to get the last 6 months of data by comparing a timestamp column in a Cassandra query?

Submitted by ☆樱花仙子☆ on 2019-12-18 09:48:34

Question


How can I get the last 6 months of data by comparing a timestamp column in a Cassandra query? I need to get all account statements from the last 3 or 6 months by comparing updatedTime (a timestamp column) with the current time. In SQL we use the DateAdd() function for this, but I don't know how to do it in Cassandra. If anyone knows, please reply. Thanks in advance.


Answer 1:


Cassandra 2.2 and later lets users define functions (UDFs) that are applied to data stored in a table as part of a query result.

If you are on Cassandra 2.2 or later, you can create your own function for adding months:

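CREATE FUNCTION monthadd(date timestamp, month int)
    RETURNS NULL ON NULL INPUT
    RETURNS timestamp
    LANGUAGE java
    AS $$
        java.util.Calendar c = java.util.Calendar.getInstance(); // server's default time zone
        c.setTime(date); // never null: RETURNS NULL ON NULL INPUT skips the call
        c.add(java.util.Calendar.MONTH, month); // negative values subtract months
        return c.getTime();
    $$;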
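To see what the UDF body does, here is a minimal sketch of the same Calendar logic as an ordinary Java method that can be run outside Cassandra. The class name MonthAdd and the utc helper are hypothetical, and the calendar is pinned to UTC to get a repeatable result, whereas the UDF uses the default time zone of the Cassandra node.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Standalone copy of the monthadd UDF body, for experimenting outside Cassandra.
// Assumption: the calendar is pinned to UTC; the UDF itself uses the
// Cassandra node's default time zone through Calendar.getInstance().
class MonthAdd {
    static Date monthAdd(Date date, int month) {
        if (date == null) {
            return null; // avoid the NullPointerException c.setTime(null) would throw
        }
        Calendar c = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        c.setTime(date);
        c.add(Calendar.MONTH, month); // a negative month moves back in time
        return c.getTime();
    }

    // Hypothetical helper: midnight UTC on the given day (month is 1-based here).
    static Date utc(int year, int month, int day) {
        Calendar c = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        c.clear();
        c.set(year, month - 1, day); // Calendar months are 0-based
        return c.getTime();
    }

    public static void main(String[] args) {
        // Calendar.add clamps the day to the end of the target month.
        System.out.println(monthAdd(utc(2017, 8, 31), -6).toInstant()); // 2017-02-28T00:00:00Z
    }
}
```

Because Calendar.add clamps to the last valid day of the target month, six months before 31 August is 28 February rather than an invalid date.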

The function takes two parameters:

  • date timestamp: the date to which you want to add, or from which you want to subtract, a number of months
  • month int: the number of months to add (+) or subtract (-)

It returns the resulting timestamp.

Here is how you can use it:

SELECT * FROM ttest WHERE id = 1 AND updated_time >= monthAdd(toTimestamp(now()), -6);

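Here monthAdd subtracts 6 months from the current timestamp, so this query returns the rows updated in the last 6 months. This assumes id is the partition key of ttest and updated_time a clustering column, which is what makes the range restriction efficient.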

Note: user-defined functions are disabled by default. Set enable_user_defined_functions: true in cassandra.yaml to enable them, but only if you are aware of the security risks.
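Since UDFs are disabled by default, a common alternative (not part of this answer) is to compute the cutoff in the application and bind it to the query. A minimal sketch with java.time, assuming months are counted in UTC; the class name Cutoff is hypothetical and no specific driver API is shown:

```java
import java.time.Instant;
import java.time.ZoneOffset;

// Alternative to the monthadd UDF: compute "now minus N months" in the client
// and bind it as a query parameter. Assumption: months are counted in UTC.
class Cutoff {
    static Instant monthsBefore(Instant now, int months) {
        // minusMonths clamps to the last valid day of the month, like Calendar.add
        return now.atZone(ZoneOffset.UTC).minusMonths(months).toInstant();
    }

    public static void main(String[] args) {
        Instant cutoff = monthsBefore(Instant.now(), 6);
        // Bind 'cutoff' to the second placeholder of a prepared statement such as
        //   SELECT * FROM ttest WHERE id = ? AND updated_time >= ?
        // (the Java type expected for a timestamp depends on the driver version).
        System.out.println(cutoff);
    }
}
```

Computing the boundary on the client keeps the query free of custom functions, at the cost of relying on the client's clock rather than the server's.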




Answer 2:


In Cassandra you have to design your tables up front, around the queries you are going to run.

Also be aware that you will probably have to bucket the data, depending on how many accounts you get within a given period of time.

If your whole database doesn't contain more than, let's say, 100k entries, you are fine with a single generic partition, named for example 'all'. Usually, though, there is a lot of data, and it goes into buckets named after a month, week or hour. Which granularity to use depends on the number of inserts you get.
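As a sketch of the write side of bucketing, the application derives each row's bucket from its timestamp before inserting it. Assumed here, for illustration only: UTC, a hypothetical Buckets class, and bucket names of the form yyyyMM (as in '201701'), an ISO week label, and yyyyMMddHH for hours.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.IsoFields;
import java.util.Locale;

// Derives the bucket (partition key) for a row from its timestamp at write time.
// Assumptions: buckets are computed in UTC; the name formats are illustrative.
class Buckets {
    enum Granularity { MONTH, WEEK, HOUR }

    private static final DateTimeFormatter MONTH_FMT = DateTimeFormatter.ofPattern("yyyyMM");
    private static final DateTimeFormatter HOUR_FMT = DateTimeFormatter.ofPattern("yyyyMMddHH");

    static String bucketFor(Instant ts, Granularity g) {
        ZonedDateTime z = ts.atZone(ZoneOffset.UTC);
        switch (g) {
            case MONTH:
                return MONTH_FMT.format(z);
            case WEEK:
                // ISO week-based year and week number, e.g. "2017-W01"
                return String.format(Locale.ROOT, "%d-W%02d",
                        z.get(IsoFields.WEEK_BASED_YEAR),
                        z.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR));
            default:
                return HOUR_FMT.format(z);
        }
    }

    public static void main(String[] args) {
        Instant ts = Instant.parse("2017-01-03T12:30:00Z");
        for (Granularity g : Granularity.values()) {
            // MONTH -> 201701, WEEK -> 2017-W01, HOUR -> 2017010312
            System.out.println(g + " -> " + bucketFor(ts, g));
        }
    }
}
```

Finer buckets keep each partition small under a high insert rate, but a query over a fixed time range then has to touch more partitions.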

The reason for creating buckets is that every node can find a partition by its partition key, which is the first part of the primary key definition. Within a partition, rows are sorted by the remaining part of the primary key (the clustering columns). Having the data sorted lets you "scan" over it, i.e. you can retrieve rows by giving a timestamp range.
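As an in-memory analogy (not how Cassandra actually stores data), a table can be pictured as a map from partition key to rows kept sorted by the clustering column: the partition is found by one lookup, and a time-range query walks the sorted rows. The names below are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// In-memory analogy only: partition key -> rows sorted by clustering column.
// created_time is modelled as epoch milliseconds; names are hypothetical.
class PartitionModel {
    private final Map<String, NavigableMap<Long, String>> partitions = new HashMap<>();

    void insert(String partitionKey, long createdTime, String account) {
        // Same partition key and created_time replaces the row, like a Cassandra upsert.
        partitions.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(createdTime, account);
    }

    // Roughly: select account from accounts where bucket = ? and created_time > ?
    List<String> accountsAfter(String partitionKey, long after) {
        NavigableMap<Long, String> rows = partitions.get(partitionKey);
        if (rows == null) {
            return new ArrayList<>(); // unknown partition: no rows
        }
        // One lookup finds the partition; tailMap then walks rows already in order.
        return new ArrayList<>(rows.tailMap(after, false).values());
    }

    public static void main(String[] args) {
        PartitionModel table = new PartitionModel();
        table.insert("all", 300L, "carol");
        table.insert("all", 100L, "alice");
        table.insert("all", 200L, "bob");
        System.out.println(table.accountsAfter("all", 100L)); // [bob, carol]
    }
}
```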

Let's say you want to retrieve accounts from the last 6 months and that you are saving all the accounts from one month in the same bucket.
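With one bucket per month, the application first has to work out which buckets the time range touches. A minimal sketch, assuming yyyyMM bucket names and a hypothetical helper bucketsBetween:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Lists the month buckets (yyyyMM) that overlap [from, to], oldest first.
class MonthBuckets {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMM");

    static List<String> bucketsBetween(LocalDate from, LocalDate to) {
        List<String> buckets = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            buckets.add(m.format(FMT));
        }
        return buckets;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2017, 4, 4); // fixed date for a repeatable demo
        LocalDate cutoff = today.minusMonths(6);    // 2016-10-04
        // Seven buckets: the partial months at both ends plus the five in between.
        System.out.println(bucketsBetween(cutoff, today));
        // [201610, 201611, 201612, 201701, 201702, 201703, 201704]
    }
}
```

Counted from the middle of a month, "the last 6 months" touches seven monthly buckets, and the oldest one also holds rows from before the cutoff, which still have to be filtered out by created_time.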

The schema might be something along the lines of:

create table accounts (
    month text,
    created_time timestamp,
    account text,
    PRIMARY KEY (month, created_time)
);

Usually you will do this at the application level: query each month's bucket separately and merge the results. Merging many queries like this is an anti-pattern, but it is OK for a small number of queries:

select account  
from accounts 
where month = '201701';

Then run the same query for '201702', '201703', and so on, once for each month in the range, and merge the results.
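Merging on the client can be sketched as follows. The fetch function is a hypothetical stand-in for executing the per-month query through a driver, Row is a hypothetical result type, and rows older than the cutoff are dropped on the client:

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Merges the results of one query per month bucket into a single list.
class BucketMerge {
    // Hypothetical row type standing in for a result row of the accounts table.
    static final class Row {
        final Instant createdTime;
        final String account;

        Row(Instant createdTime, String account) {
            this.createdTime = createdTime;
            this.account = account;
        }
    }

    // 'fetch' is a placeholder for running: select ... from accounts where month = ?
    // Buckets are visited oldest first and each bucket is already sorted by
    // created_time, so plain concatenation keeps the overall order.
    static List<String> accountsSince(List<String> buckets, Instant cutoff,
                                      Function<String, List<Row>> fetch) {
        List<String> result = new ArrayList<>();
        for (String bucket : buckets) {
            for (Row row : fetch.apply(bucket)) {
                if (!row.createdTime.isBefore(cutoff)) { // drop rows older than the cutoff
                    result.add(row.account);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for the table, keyed by month bucket.
        Map<String, List<Row>> table = new HashMap<>();
        table.put("201610", Arrays.asList(
                new Row(Instant.parse("2016-10-01T00:00:00Z"), "old"),
                new Row(Instant.parse("2016-10-20T00:00:00Z"), "alice")));
        table.put("201611", Arrays.asList(
                new Row(Instant.parse("2016-11-02T00:00:00Z"), "bob")));
        List<String> merged = accountsSince(Arrays.asList("201610", "201611"),
                Instant.parse("2016-10-04T00:00:00Z"),
                b -> table.getOrDefault(b, Collections.emptyList()));
        System.out.println(merged); // [alice, bob]
    }
}
```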

If you have something really simple, with, let's say, 100,000 entries expected in total, you can use a variant of the above schema with a single predefined bucket and just do something like:

create table accounts (
    bucket text,
    created_time timestamp,
    account text,
    PRIMARY KEY (bucket, created_time)
);

select account 
from accounts 
where bucket = 'some_predefined_name' 
  and created_time > '2016-10-04 00:00:00';

To wrap up: with Cassandra you always have to design your tables for the access patterns you are going to use.



Source: https://stackoverflow.com/questions/43198775/how-to-get-last-6-month-data-comparing-with-timestamp-column-using-cassandra-que
