Cassandra/Hadoop/Pig design for loading and processing data

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-24 06:52:15

Question


I have a setup with Hadoop, Cassandra, Pig, and MySQL.

My goal is to periodically read one month of data from Cassandra, process it, and write the result to MySQL.

What is the best practice for this?
Should I load all the data and filter it down to one month in Pig, or filter while loading from Cassandra using Pig/CQL (with CqlStorage)?

The problem with filtering while loading from Cassandra is that Pig has a bug with WHERE clauses in CQL (https://issues.apache.org/jira/browse/CASSANDRA-6151).

The problem with the other option, loading all the data and filtering in Pig, is that the data set is very large, nearly 200 million records. Is loading everything still the better solution? If so, what about the performance and the time the Pig script would take to run?
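
Whichever approach is used, the periodic job needs the bounds of the one-month window. Below is a minimal sketch in Python of how those bounds might be computed and handed to a Pig script through Pig's `-param` option. It assumes the rows carry a timestamp stored as epoch milliseconds and that "1 month" means the previous calendar month in UTC; neither is stated in the question. The function names and the parameter names `START_MS` and `END_MS` are hypothetical.

```python
from datetime import datetime, timezone


def previous_month_window(now):
    """Return (start, end) of the calendar month before `now`.

    The window is half-open: start <= t < end.
    Assumption: the job processes the previous calendar month.
    """
    # First instant of the current month is the exclusive end of the window.
    end = now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    # Step back one month, wrapping from January to December of the prior year.
    if end.month == 1:
        start = end.replace(year=end.year - 1, month=12)
    else:
        start = end.replace(month=end.month - 1)
    return start, end


def to_epoch_millis(dt):
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(dt.timestamp() * 1000)


def pig_params(now):
    """Build `-param` arguments for a Pig invocation.

    START_MS and END_MS are hypothetical parameter names that the
    Pig script would reference in its own filter.
    """
    start, end = previous_month_window(now)
    return [
        "-param", f"START_MS={to_epoch_millis(start)}",
        "-param", f"END_MS={to_epoch_millis(end)}",
    ]


if __name__ == "__main__":
    print(pig_params(datetime.now(timezone.utc)))
```

The resulting values could then be used in the Pig script's filter, so the same window definition applies whether the filtering happens at load time or afterwards.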

Source: https://stackoverflow.com/questions/21698582/cassandra-hadoop-pig-design-for-loading-and-processing-data
