Cassandra/Hadoop/Pig design for loading and processing data

Question

I have a setup with Hadoop, Cassandra, Pig, and MySQL.

My goal is to periodically read one month of data from Cassandra, process it, and write the result to MySQL.

What is the best practice for this?
Should I load all the data and filter it down to one month in Pig, or filter while loading from Cassandra using Pig/CQL (via CqlStorage)?

The problem with filtering while loading from Cassandra is that Pig has a bug affecting a where clause on CQL (https://issues.apache.org/jira/browse/CASSANDRA-6151).

The problem with the other approach, loading all the data and filtering in Pig, is that the data set is large: nearly 200 million records. Is loading everything still the better solution? If so, what about the performance and the time the Pig script takes to run?
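
To make the periodic one-month window concrete: in either approach, the bounds could be computed outside Pig and passed in as parameters. This is only a minimal sketch in Python. It assumes the rows carry a timestamp stored as epoch milliseconds, treats "one month" as a trailing 30 days, and uses a hypothetical script name `monthly_report.pig` with hypothetical parameter names `START_MS` and `END_MS`; none of these come from my actual setup.

```python
from datetime import datetime, timedelta, timezone


def to_epoch_ms(moment):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


def month_window(now, days=30):
    """Return (start_ms, end_ms) for the trailing window that ends at `now`.

    Assumption: "one month" means the last `days` days, not a calendar month.
    """
    return to_epoch_ms(now - timedelta(days=days)), to_epoch_ms(now)


def pig_command(script, start_ms, end_ms):
    """Build the argument list for a `pig` run with the window as parameters.

    START_MS and END_MS are hypothetical parameter names.
    """
    return [
        "pig",
        "-param", f"START_MS={start_ms}",
        "-param", f"END_MS={end_ms}",
        script,
    ]


if __name__ == "__main__":
    start_ms, end_ms = month_window(datetime.now(timezone.utc))
    # monthly_report.pig is a hypothetical script name.
    print(" ".join(pig_command("monthly_report.pig", start_ms, end_ms)))
```

Inside the script, `$START_MS` and `$END_MS` would then be substituted into whichever filter ends up being used, either a `FILTER ... BY` statement after loading everything or the load-time where clause if the bug above can be avoided.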

Source: https://stackoverflow.com/questions/21698582/cassandra-hadoop-pig-design-for-loading-and-processing-data

Tags

Hadoop

cassandra

apache-pig
