Question
I know it can be done the traditional way, but if I were to use Cassandra DB, is there an easy/quick and agile way to add a CSV to the DB as a set of key-value pairs?
The ability to add time-series data coming in via CSV file is my prime requirement. I am OK with switching to another database such as MongoDB or Riak if it is conveniently doable there.
Answer 1:
Edit 2 Dec 02, 2017
Please use port 9042. Cassandra access has changed to CQL, with 9042 as the default port; 9160 was the default port for Thrift.
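For example, connecting with the DataStax cassandra-driver package goes through 9042 rather than 9160. A minimal sketch, assuming a single local node with default settings:

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'], port=9042)  # 9042 is the native CQL protocol port
session = cluster.connect()
print(session.execute("SELECT release_version FROM system.local").one())
cluster.shutdown()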
Edit 1
There is a better way to do this without any coding. Look at this answer https://stackoverflow.com/a/18110080/298455
However, if you want to pre-process the data or do something custom, you may want to do it yourself. Here is a lengthier method:
Create a column family.
cqlsh> create keyspace mykeyspace with strategy_class = 'SimpleStrategy' and strategy_options:replication_factor = 1;
cqlsh> use mykeyspace;
cqlsh:mykeyspace> create table stackoverflow_question (id text primary key, name text, class text);
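Note that the statements above use the old CQL 2 syntax. On a recent Cassandra, the equivalent CQL 3 would look roughly like this (same keyspace and table, only the replication clause differs):

cqlsh> CREATE KEYSPACE mykeyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE mykeyspace;
cqlsh:mykeyspace> CREATE TABLE stackoverflow_question (id text PRIMARY KEY, name text, class text);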
Assuming your CSV is like this:

$ cat data.csv
id,name,class
1,hello,10
2,world,20

Write a simple Python script to read the file and dump it into your CF. Something like this:
import csv

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])  # pycassa talks to the Thrift port (9160)
cf = ColumnFamily(pool, "stackoverflow_question")

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print str(row)
        key = row['id']      # use the id column as the row key
        del row['id']
        cf.insert(key, row)  # remaining columns become name/value pairs

pool.dispose()

Execute this:
$ python loadcsv.py
{'class': '10', 'id': '1', 'name': 'hello'}
{'class': '20', 'id': '2', 'name': 'world'}

Look at the data:
cqlsh:mykeyspace> select * from stackoverflow_question;

 id | class | name
----+-------+-------
  2 |    20 | world
  1 |    10 | hello

See also:
a. Beware of DictReader
b. Look at Pycassa
c. Google for existing CSV loaders for Cassandra; there are probably several.
d. There may be a simpler way using the CQL driver (see the sketch after this list).
e. Use appropriate data types. I just made them all text, which is not ideal.
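For item d, a minimal sketch of the same CSV load using the DataStax cassandra-driver over CQL. It assumes the mykeyspace/stackoverflow_question schema created above and a node listening on 127.0.0.1:9042; take it as a shape to follow, not a drop-in script:

import csv

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'], port=9042)
session = cluster.connect('mykeyspace')

# Prepared statement: parsed once, reused for every row
insert = session.prepare(
    "INSERT INTO stackoverflow_question (id, name, class) VALUES (?, ?, ?)")

with open('data.csv') as csvfile:
    for row in csv.DictReader(csvfile):
        session.execute(insert, (row['id'], row['name'], row['class']))

cluster.shutdown()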
HTH
I did not see the time-series requirement at first. Here is how you do it for time series.
This is your data
$ cat data.csv
id,1383799600,1383799601,1383799605,1383799621,1383799714
1,sensor-on,sensor-ready,flow-out,flow-interrupt,sensor-killAll

Create a traditional wide row. (CQL suggests not using COMPACT STORAGE, but this is just to get you going quickly.)
cqlsh:mykeyspace> create table timeseries (id text, timestamp text, data text, primary key (id, timestamp)) with compact storage;

This is the altered code:
import csv

from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

pool = ConnectionPool('mykeyspace', ['localhost:9160'])
cf = ColumnFamily(pool, "timeseries")

with open('data.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print str(row)
        key = row['id']
        del row['id']
        # each remaining column is a (timestamp, value) pair for this row key
        for (timestamp, data) in row.iteritems():
            cf.insert(key, {timestamp: data})

pool.dispose()

This is your time series:
cqlsh:mykeyspace> select * from timeseries;

 id | timestamp  | data
----+------------+----------------
  1 | 1383799600 | sensor-on
  1 | 1383799601 | sensor-ready
  1 | 1383799605 | flow-out
  1 | 1383799621 | flow-interrupt
  1 | 1383799714 | sensor-killAll
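Because timestamp is the clustering column, you can also slice a time range for a given id directly in CQL, for example (note the timestamps are stored as text here, so the comparison is lexicographic):

cqlsh:mykeyspace> select * from timeseries where id = '1' and timestamp >= '1383799600' and timestamp < '1383799700';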
Answer 2:
Let's say your CSV looks like
'P38-Lightning', 'Lockheed', 1937, '.7'
cqlsh to your DB, and...
CREATE TABLE airplanes (
  name text PRIMARY KEY,
  manufacturer ascii,
  year int,
  mach float
);

then...
COPY airplanes (name, manufacturer, year, mach) FROM '/classpath/temp.csv';
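If your CSV has a header row (like the data.csv in answer 1), COPY can be told to skip it; something like this should work on a reasonably recent cqlsh:

COPY airplanes (name, manufacturer, year, mach) FROM '/classpath/temp.csv' WITH HEADER = true;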
Refer: http://www.datastax.com/docs/1.1/references/cql/COPY
Answer 3:
To back up:
./cqlsh -e"copy <keyspace>.<table> to '../data/table.csv';"
To restore from the backup:
./cqlsh -e"copy <keyspace>.<table> from '../data/table.csv';"
Source: https://stackoverflow.com/questions/19827690/how-can-i-add-csv-to-cassandra-db