Question
I have a setup with Hadoop, Cassandra, Pig, and MySQL.
My goal is to periodically read one month of data from Cassandra, process it, and write the results to MySQL.
What is the best practice for doing this?
Should I load all the data and filter it down to one month in Pig, or filter while loading from Cassandra using Pig/CQL (with CqlStorage)?
The problem with filtering while loading from Cassandra is that Pig has a bug with WHERE clauses in CQL (https://issues.apache.org/jira/browse/CASSANDRA-6151).
On the other hand, the problem with loading all the data and filtering it in Pig is that the data is very large, nearly 200 million records. Is loading all the data the better solution? If so, what performance and running time should I expect from the Pig script?
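For concreteness, both approaches apply the same half-open time-range predicate; they differ only in where it runs. Below is a minimal Python sketch, assuming the rows carry a timestamp column stored as epoch milliseconds. The column name `event_time` and all helper names are hypothetical, not part of the original setup or of the CqlStorage API. `filter_month` mirrors what a Pig FILTER does after loading everything, while `where_clause` builds the equivalent CQL predicate you would push down at load time.

```python
from datetime import datetime, timezone


def previous_month_window(today):
    """Return (start, end) UTC datetimes covering the calendar month before `today`.

    The window is half-open: start <= t < end.
    """
    end = datetime(today.year, today.month, 1, tzinfo=timezone.utc)
    if today.month == 1:
        # January: the previous month is December of the previous year.
        start = datetime(today.year - 1, 12, 1, tzinfo=timezone.utc)
    else:
        start = datetime(today.year, today.month - 1, 1, tzinfo=timezone.utc)
    return start, end


def to_epoch_millis(dt):
    # Assumption: timestamps are stored as epoch milliseconds (hypothetical schema).
    return int(dt.timestamp()) * 1000


def where_clause(column, start_ms, end_ms):
    # The predicate as a CQL WHERE fragment. Whether Cassandra accepts a range
    # restriction on this column depends on the table's primary key design.
    return f"{column} >= {start_ms} AND {column} < {end_ms}"


def filter_month(records, today, column="event_time"):
    """Keep only records in the previous month (the 'load all, then FILTER' step)."""
    start, end = previous_month_window(today)
    start_ms, end_ms = to_epoch_millis(start), to_epoch_millis(end)
    return [r for r in records if start_ms <= r[column] < end_ms]


if __name__ == "__main__":
    today = datetime(2014, 2, 11, tzinfo=timezone.utc)
    start, end = previous_month_window(today)
    # prints: event_time >= 1388534400000 AND event_time < 1391212800000
    print(where_clause("event_time", to_epoch_millis(start), to_epoch_millis(end)))
```

Pushing the predicate down means Cassandra returns only the matching rows instead of all ~200 million, while filtering in Pig reads every row first and discards most of them.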
Source: https://stackoverflow.com/questions/21698582/cassandra-hadoop-pig-design-for-loading-and-processing-data