I have a Cassandra table containing 3 million rows. Now I am trying to fetch all the rows and write them to several CSV files. I know it is impossible to perform a plain select * over the whole table in one request.
As far as I know, one improvement in Cassandra 2.0 on the driver side is automatic paging. You can do something like this:
Statement stmt = new SimpleStatement("SELECT * FROM images LIMIT 3000000");
stmt.setFetchSize(100);
ResultSet rs = session.execute(stmt);
for (Row row : rs) {
    // the driver transparently fetches the next page as you iterate
}
For more, read Improvements on the driver side with Cassandra 2.0.
You can find the driver here.
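Building on that, here is a minimal sketch of your actual task: paging through the table and rotating CSV files every N rows. The contact point, keyspace, column names (id, data), and the chunk size are all assumptions; adapt them to your schema:

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import com.datastax.driver.core.*;

public class CsvDump {
    public static void main(String[] args) throws IOException {
        // "127.0.0.1" and "your_ksp" are placeholders for your contact point and keyspace
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        try {
            Session session = cluster.connect("your_ksp");
            Statement stmt = new SimpleStatement("SELECT id, data FROM images");
            stmt.setFetchSize(1000); // rows per page pulled by the driver
            ResultSet rs = session.execute(stmt);

            int rowsPerFile = 500000; // arbitrary chunk size
            int rowCount = 0;
            int fileIndex = 0;
            PrintWriter out = new PrintWriter(new FileWriter("images-0.csv"));
            for (Row row : rs) {
                if (rowCount > 0 && rowCount % rowsPerFile == 0) {
                    // start a new CSV file every rowsPerFile rows
                    out.close();
                    fileIndex++;
                    out = new PrintWriter(new FileWriter("images-" + fileIndex + ".csv"));
                }
                // "id" and "data" are hypothetical columns; a real export should also quote/escape values
                out.println(row.getString("id") + "," + row.getString("data"));
                rowCount++;
            }
            out.close();
        } finally {
            cluster.close();
        }
    }
}

The key point is that the loop only ever holds one page (the fetch size) of rows in memory at a time, which is what makes a 3-million-row scan feasible from a client.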
You could use Pig to read the data and store it into HDFS, then copy it out as a single file:
In Pig:
data = LOAD 'cql://your_ksp/your_table' USING CqlStorage();
STORE data INTO '/path/to/output' USING PigStorage(',');
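Note that, at least in the Cassandra/Pig integration I have used, CqlStorage picks up the cluster location from environment variables set before launching Pig; the host, port, and partitioner values below are assumptions for a default single-node setup:

export PIG_INITIAL_ADDRESS=localhost
export PIG_RPC_PORT=9160
export PIG_PARTITIONER=org.apache.cassandra.dht.Murmur3Partitioner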
From the OS shell (STORE writes a directory of part files rather than a single file, so use getmerge to concatenate them into one local file):
hadoop fs -getmerge hdfs://hadoop_url/path/to/output /path/to/local/storage/output.csv
By default, cqlsh returns only the first 10,000 rows of a SELECT, so to retrieve more than that you have to specify a LIMIT explicitly:
SELECT * FROM tablename LIMIT 3000000;
(in your case, 3 million rows).
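If you are working in cqlsh anyway, its COPY command writes the rows straight to a CSV file and handles the row fetching for you (the table and file names here are placeholders), though it can be slow for millions of rows:
COPY tablename TO '/path/to/output.csv';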