Generate UUIDs automatically for all records in Cassandra for an existing dataset

一个人想着一个人 submitted on 2019-12-11 07:14:12

Question


I have an existing dataset of around 700,000 records in CSV format, and I have imported that file into an Apache Cassandra table. The problem is the primary key: how can I automatically generate (upsert) a UUID into the primary key column for all of my records? I am using Cassandra 3.10.


Answer 1:


Unfortunately, if you're using the COPY command, you have no way to generate UUIDs on the fly for your rows. I think you really have two choices, both of which involve some programming:

  1. Do some pre-processing on your CSV file to generate and add a UUID to each row, writing out a new file with that additional field and UUID value for each row. It should be pretty straightforward to process the file, line by line, and generate those values using a small Python script or something similar. Then you can use the COPY command like before to import the data into Cassandra.
  2. Since you're already going to be writing some code, skip using the COPY command altogether and just write the code in Python (or Java or your language of choice) to read the file, parse each CSV line into values, generate a UUID for that row, and then INSERT the data into Cassandra using the appropriate driver for the programming language you're using.
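Option 1 could look something like the sketch below. It assumes the CSV file starts with a header row and that the key column is called `id`; the function name `add_uuid_column` and the file names in the usage comment are placeholders, not anything from the original dataset.

```python
import csv
import uuid


def add_uuid_column(src_path, dst_path, id_column="id"):
    """Copy a CSV file, prepending a random (version 4) UUID to every row.

    Assumes the first line of the source file is a header row.
    Returns the number of data rows written.
    """
    count = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)

        header = next(reader, None)
        if header is None:
            return 0  # empty input file, nothing to write

        # New header: the key column first, then the original columns.
        writer.writerow([id_column] + header)

        # Stream row by row so the whole file never has to fit in memory.
        for row in reader:
            writer.writerow([str(uuid.uuid4())] + row)
            count += 1
    return count


# Usage sketch (hypothetical file names):
#   add_uuid_column("records.csv", "records_with_uuid.csv")
```

The new file can then be loaded from cqlsh as before, for example with `COPY my_keyspace.records (id, name, city) FROM 'records_with_uuid.csv' WITH HEADER = TRUE;`, where the keyspace, table, and column names are again placeholders for your own schema.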

If you decide to go with option 2, you'll find a list of the DataStax drivers for Cassandra towards the bottom of this page, along with documentation for how to use them. Hope that helps!
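For option 2, a minimal sketch using the DataStax Python driver might look like this. It assumes the CSV header names match the table's column names, that the key column is named `id`, and that all other columns are of type `text` (values for other column types would need converting before the insert). The function name `insert_csv_with_uuids` and every name in the usage comment are hypothetical.

```python
import csv
import uuid


def insert_csv_with_uuids(session, csv_path, table, id_column="id"):
    """Insert every CSV row into `table`, adding a fresh UUID as the key.

    `session` is expected to behave like a DataStax Python driver Session,
    i.e. to provide prepare() and execute(). Returns the number of rows sent.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        columns = list(reader.fieldnames or [])
        all_columns = [id_column] + columns

        # Prepare the statement once; only the bound values change per row.
        cql = "INSERT INTO {} ({}) VALUES ({})".format(
            table,
            ", ".join(all_columns),
            ", ".join("?" for _ in all_columns),
        )
        prepared = session.prepare(cql)

        for row in reader:
            values = [uuid.uuid4()] + [row[c] for c in columns]
            session.execute(prepared, values)
            count += 1
    return count


# Usage sketch (needs `pip install cassandra-driver` and a reachable node;
# the contact point, keyspace, table, and file names are placeholders):
#
#   from cassandra.cluster import Cluster
#   cluster = Cluster(["127.0.0.1"])
#   session = cluster.connect("my_keyspace")
#   insert_csv_with_uuids(session, "records.csv", "records")
#   cluster.shutdown()
```

This sends one synchronous request per row, which is simple but not fast; for a file of this size it may be worth looking at the driver's concurrent execution helpers.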



Source: https://stackoverflow.com/questions/43506380/generate-uuid-automatically-for-all-records-in-cassandra-for-an-existing-dataset
