Question
I know it can be done the traditional way, but if I were to use Cassandra DB, is there an easy/quick and agile way to add a CSV to the DB as a set of key-value pairs?
The ability to add time-series data coming in via CSV file is my prime requirement. I am OK with switching to another database such as MongoDB or Riak if it is conveniently doable there.
Answer 1:
Edit 2 Dec 02, 2017
Please use port 9042. Cassandra access has changed to CQL, with 9042 as the default port; 9160 was the default port for Thrift.
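For example, connecting with the DataStax cassandra-driver package goes through 9042 rather than 9160. A minimal sketch, assuming a single local node with default settings:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'], port=9042)  # 9042 is the native CQL protocol port
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())
cluster.shutdown()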
Edit 1
There is a better way to do this without any coding. Look at this answer https://stackoverflow.com/a/18110080/298455
However, if you want to pre-process the data or do something custom, you may want to do it yourself. Here is a lengthier method:
Create a column family.
cqlsh> create keyspace mykeyspace with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor = 1;
cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table stackoverflow_question (id text primary key, name text, class text);
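Note that the statements above use the old CQL 2 syntax. On a recent Cassandra, the equivalent CQL 3 would look roughly like this (same keyspace and table, only the replication clause differs):

cqlsh> CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE mykeyspace;
cqlsh:mykeyspace> CREATE TABLE stackoverflow_question (id text PRIMARY KEY, name text, class text);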
Assuming your CSV is like this:

$ cat data.csv
id,name,class
1,hello,10
2,world,20

Write a simple Python script to read the file and dump it into your CF. Something like this:
import csv

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])  # pycassa talks to the Thrift port (9160)
cf = ColumnFamily(pool, "stackoverflow_question")

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print str(row)
        key = row['id']      # use the id column as the row key
        del row['id']
        cf.insert(key, row)  # remaining columns become name/value pairs

pool.dispose()

Execute this:
$ python loadcsv.py
{'class': '10', 'id': '1', 'name': 'hello'}
{'class': '20', 'id': '2', 'name': 'world'}

Look at the data:
cqlsh:mykeyspace> select * from stackoverflow_question;

 id | class | name
----+-------+-------
  2 |    20 | world
  1 |    10 | hello

See also:
a. Beware of DictReader
b. Look at Pycassa
c. Google for existing CSV loaders for Cassandra; there are probably several.
d. There may be a simpler way using the CQL driver (see the sketch after this list).
e. Use appropriate data types. I just made them all text, which is not ideal.
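For item d, a minimal sketch of the same CSV load using the DataStax cassandra-driver over CQL. It assumes the mykeyspace/stackoverflow_question schema created above and a node listening on 127.0.0.1:9042; take it as a shape to follow, not a drop-in script:

import csv

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'], port=9042)
session = cluster.connect('mykeyspace')

# Prepared statement: parsed once, reused for every row
insert = session.prepare(
    "INSERT INTO stackoverflow_question (id, name, class) VALUES (?, ?, ?)")

with open('data.csv') as csvfile:
    for row in csv.DictReader(csvfile):
        session.execute(insert, (row['id'], row['name'], row['class']))

cluster.shutdown()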
HTH
I did not see the time-series requirement at first. Here is how you do it for time series.
This is your data
$ cat data.csv
id,1383799600,1383799601,1383799605,1383799621,1383799714
1,sensor-on,sensor-ready,flow-out,flow-interrupt,sensor-killAll

Create a traditional wide row. (CQL suggests not using COMPACT STORAGE, but this is just to get you going quickly.)
cqlsh:mykeyspace> create table timeseries (id text, timestamp text, data text, primary key (id, timestamp)) with compact storage;

This is the altered code:
import csv

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, "timeseries")

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print str(row)
        key = row['id']
        del row['id']
        # each remaining column is a (timestamp, value) pair for this row key
        for (timestamp, data) in row.iteritems():
            cf.insert(key, {timestamp: data})

pool.dispose()

This is your time series:
cqlsh:mykeyspace> select * from timeseries;

 id | timestamp  | data
----+------------+----------------
  1 | 1383799600 | sensor-on
  1 | 1383799601 | sensor-ready
  1 | 1383799605 | flow-out
  1 | 1383799621 | flow-interrupt
  1 | 1383799714 | sensor-killAll
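Because timestamp is the clustering column, you can also slice a time range for a given id directly in CQL, for example (note the timestamps are stored as text here, so the comparison is lexicographic):

cqlsh:mykeyspace> select * from timeseries where id = '1' and timestamp >= '1383799600' and timestamp < '1383799700';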
Answer 2:
Let's say your CSV looks like
'P38-Lightning', 'Lockheed', 1937, '.7'
cqlsh to your DB, and...
CREATE TABLE airplanes (
  name text PRIMARY KEY,
  manufacturer ascii,
  year int,
  mach float
);

then...
COPY airplanes (name, manufacturer, year, mach) FROM '/classpath/temp.csv';
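If your CSV has a header row (like the data.csv in answer 1), COPY can be told to skip it; something like this should work on a reasonably recent cqlsh:

COPY airplanes (name, manufacturer, year, mach) FROM '/classpath/temp.csv' WITH HEADER = true;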
Refer: http://www.datastax.com/docs/1.1/references/cql/COPY
Answer 3:
To back up:
./cqlsh -e"copy <keyspace>.<table> to '../data/table.csv';"
To restore from the backup:
./cqlsh -e"copy <keyspace>.<table> from '../data/table.csv';"
Source: https://stackoverflow.com/questions/19827690/how-can-i-add-csv-to-cassandra-db