How can I add a CSV file to a Cassandra DB?

Submitted by 只谈情不闲聊 on 2020-05-15 08:57:13

Question


I know it can be done the traditional way, but if I were to use Cassandra, is there an easy, quick, and agile way to add a CSV file to the DB as a set of key-value pairs?

My prime requirement is the ability to add time-series data that arrives as a CSV file. I am OK with switching to another database, such as MongoDB or Riak, if it is conveniently doable there.


Answer 1:


Edit 2 (Dec 02, 2017)
Please use port 9042. Cassandra access has moved to CQL, whose default port is 9042; 9160 was the default port for Thrift.

Edit 1
There is a better way to do this without any coding. Look at this answer https://stackoverflow.com/a/18110080/298455

However, if you want to pre-process the data or do something custom, you may want to do it yourself. Here is a lengthy method:


  1. Create a column family.

    cqlsh> create keyspace mykeyspace 
    with replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
    
    cqlsh> use mykeyspace;
    
    cqlsh:mykeyspace> create table stackoverflow_question 
    (id text primary key, name text, class text);
    

    Assuming your CSV is like this:

    $ cat data.csv 
    id,name,class
    1,hello,10
    2,world,20
    
  2. Write some simple Python code to read the file and dump it into your CF (column family). Something like this:

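    # Python 2 with pycassa, a Thrift client (hence port 9160, not 9042)
    import csv 
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    
    pool = ConnectionPool('mykeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, "stackoverflow_question")
    
    with open('data.csv', 'rb') as csvfile:
      reader = csv.DictReader(csvfile)  # the header line supplies the column names
      for row in reader:
        print str(row)
        key = row['id']   # the id column becomes the row key
        del row['id']     # the remaining fields become the columns
        cf.insert(key, row)
    
    pool.dispose()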
    
  3. Execute this:

    $ python loadcsv.py 
    {'class': '10', 'id': '1', 'name': 'hello'}
    {'class': '20', 'id': '2', 'name': 'world'}
    
  4. Look at the data:

    cqlsh:mykeyspace> select * from stackoverflow_question;
     id | class | name
    ----+-------+-------
      2 |    20 | world
      1 |    10 | hello
    
  5. See also:

    a. Beware of DictReader
    b. Look at Pycassa
    c. Google for an existing CSV loader for Cassandra. I guess there are some.
    d. There may be a simpler way using a CQL driver; I do not know.
    e. Use appropriate data type. I just wrapped them all into text. Not good.
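
On point (d), and given the move to port 9042 mentioned in Edit 2: here is a minimal sketch of the same load through a CQL driver. It assumes Python 3, the DataStax driver (`pip install cassandra-driver`), a node on 127.0.0.1:9042, and the `stackoverflow_question` table from step 1. The helper names `read_rows` and `load_csv` are made up for illustration, and I have not run it against a live cluster, so treat it as a starting point rather than a tested loader.

```python
import csv

# Assumes the table created in step 1: (id text primary key, name text, class text).
INSERT_CQL = (
    "INSERT INTO stackoverflow_question (id, name, class) "
    "VALUES (?, ?, ?)"
)


def read_rows(csvfile):
    """Yield (id, name, class) tuples from CSV lines that start with a header row."""
    for row in csv.DictReader(csvfile):
        yield (row["id"], row["name"], row["class"])


def load_csv(path, hosts=("127.0.0.1",), port=9042, keyspace="mykeyspace"):
    """Insert every CSV row through a prepared statement and return the row count."""
    # Imported here so read_rows() can be tried without the driver installed.
    from cassandra.cluster import Cluster

    cluster = Cluster(list(hosts), port=port)
    try:
        session = cluster.connect(keyspace)
        insert = session.prepare(INSERT_CQL)
        count = 0
        with open(path, newline="") as csvfile:
            for values in read_rows(csvfile):
                session.execute(insert, values)
                count += 1
        return count
    finally:
        cluster.shutdown()
```

Calling `load_csv('data.csv')` should insert the two sample rows. Everything still goes in as text, so the caveat in (e) applies here too.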

HTH


I did not see the time-series requirement at first. Here is how to do it for time series.

  1. This is your data

    $ cat data.csv
    id,1383799600,1383799601,1383799605,1383799621,1383799714
    1,sensor-on,sensor-ready,flow-out,flow-interrupt,sensor-killAll
    
  2. Create a traditional wide row. (CQL suggests not using COMPACT STORAGE, but this is just to get you going quickly.)

    cqlsh:mykeyspace> create table timeseries 
    (id text, timestamp text, data text, primary key (id, timestamp)) 
    with compact storage;
    
  3. This is the altered code:

    # Python 2 with pycassa, a Thrift client (hence port 9160, not 9042)
    import csv
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily
    
    pool = ConnectionPool('mykeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, "timeseries")
    
    with open('data.csv', 'rb') as csvfile:
      reader = csv.DictReader(csvfile)  # header line: id, then one column per timestamp
      for row in reader:
        print str(row)
        key = row['id']
        del row['id']
        # one insert per (timestamp, data) pair, all under the same row key
        for (timestamp, data) in row.iteritems():
          cf.insert(key, {timestamp: data})
    
    pool.dispose()
    
  4. This is your timeseries

    cqlsh:mykeyspace> select * from timeseries;
     id | timestamp  | data
    ----+------------+----------------
      1 | 1383799600 |      sensor-on
      1 | 1383799601 |   sensor-ready
      1 | 1383799605 |       flow-out
      1 | 1383799621 | flow-interrupt
      1 | 1383799714 | sensor-killAll
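
The reshaping done in step 3 (one wide CSV row becoming one entry per timestamp) can be tried on its own, without any Cassandra client. A minimal sketch in Python 3; the function names are hypothetical. Note that it converts the timestamps to `int`, which is my assumption and would suit a `bigint` column rather than the `text` column used in step 2.

```python
import csv


def wide_row_to_events(row):
    """Turn {'id': ..., '<timestamp>': '<data>', ...} into sorted (id, timestamp, data) tuples."""
    key = row["id"]
    events = [
        (key, int(timestamp), data)  # int() is an assumption; the table above stores text
        for timestamp, data in row.items()
        if timestamp != "id"
    ]
    return sorted(events)


def read_events(csvfile):
    """Yield one (id, timestamp, data) tuple per timestamp column of every CSV row."""
    for row in csv.DictReader(csvfile):
        yield from wide_row_to_events(row)
```

Each yielded tuple corresponds to one row of the `select * from timeseries` output above, so it can be handed to whichever client does the actual insert.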
    



Answer 2:


Let's say your CSV looks like this:

    'P38-Lightning', 'Lockheed', 1937, '.7'

  1. Connect to your DB with cqlsh.

  2. Create the table:

    CREATE TABLE airplanes (
     name text PRIMARY KEY,
     manufacturer ascii,
     year int,
     mach float
    );
    
  3. Then load the file:

    COPY airplanes (name, manufacturer, year, mach) FROM '/classpath/temp.csv';
    

Refer: http://www.datastax.com/docs/1.1/references/cql/COPY
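
The sample line above uses single quotes and a space after each comma. As far as I know, cqlsh's COPY expects double-quote quoting by default, so a small pre-processing pass may help before loading. A minimal sketch in Python 3, assuming the four-column `airplanes` layout above; `clean_airplane_rows` and `to_copy_csv` are hypothetical helper names.

```python
import csv
import io


def clean_airplane_rows(lines):
    """Parse single-quoted, space-padded CSV lines into (name, manufacturer, year, mach) tuples."""
    reader = csv.reader(lines, quotechar="'", skipinitialspace=True)
    for name, manufacturer, year, mach in reader:
        # Convert to the column types declared in CREATE TABLE airplanes.
        yield (name, manufacturer, int(year), float(mach))


def to_copy_csv(rows):
    """Render rows as plain comma-separated text, quoting with double quotes only when needed."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()
```

Writing the result of `to_copy_csv(...)` to a file gives something that can then be passed to the COPY statement in step 3. A bad `year` or `mach` value raises `ValueError` here instead of failing partway through the import.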




Answer 3:


Back up a table to CSV:

./cqlsh -e"copy <keyspace>.<table> to '../data/table.csv';"

Load the backup back into the table:

./cqlsh -e"copy <keyspace>.<table> from '../data/table.csv';"


Source: https://stackoverflow.com/questions/19827690/how-can-i-add-csv-to-cassandra-db
