Question
I have an existing dataset of around 700,000 records in CSV format. I have imported that data file into an Apache Cassandra table. The problem is the
primary key. How can I automatically generate (upsert) a UUID into my primary key column for all of my records? I am using Cassandra 3.10.
Answer 1:
Unfortunately, if you're using the COPY command, you don't really have any options for generating UUIDs on the fly for your rows. I think you really have two options, both of which involve doing things programmatically to one extent or another:
- Do some pre-processing on your CSV file to generate and add a UUID to each row, writing out a new file with that additional field and UUID value for each row. It should be pretty straightforward to process the file, line by line, and generate those values using a small Python script or something similar. Then you can use the COPY command like before to import the data into Cassandra.
- Since you're already going to be writing some code, skip using the COPY command altogether and just write the code in Python (or Java or your language of choice) to read the file, parse each CSV line into values, generate a UUID for that row, and then INSERT the data into Cassandra using the appropriate driver for the programming language you're using.
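As a rough illustration of the first option, here is a minimal Python sketch that copies a CSV file and prepends a random (version 4) UUID to every row. The function name `add_uuid_column`, the `id` column name, and the assumption that the file starts with a header row are all hypothetical; adjust them to match your data.

```python
import csv
import uuid


def add_uuid_column(src_path, dst_path, has_header=True, id_column="id"):
    """Copy a CSV file, prepending a random UUID to every data row.

    Returns the number of data rows written. The names used here are
    illustrative assumptions, not anything prescribed by Cassandra.
    """
    count = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        if has_header:
            header = next(reader, None)
            if header is not None:
                # Add a name for the new key column in front of the old header.
                writer.writerow([id_column] + header)
        for row in reader:
            if not row:  # skip blank lines
                continue
            writer.writerow([str(uuid.uuid4())] + row)
            count += 1
    return count
```

Calling something like `add_uuid_column("data.csv", "data_with_uuid.csv")` (hypothetical file names) should produce a file you can then load with COPY as before, listing the new key column first and passing WITH HEADER = true if the header row is kept.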
If you decide to go with option 2, you'll find a list of the DataStax drivers for Cassandra towards the bottom of this page, along with documentation for how to use them. Hope that helps!
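For the second option, a minimal sketch with the DataStax Python driver (installed with `pip install cassandra-driver`) might look like the following. The keyspace, table, and column names in `INSERT_CQL` are hypothetical, as are the helper names `connect` and `load_csv`, and the non-key columns are assumed to be text so that the CSV strings can be bound directly.

```python
import csv
import uuid

# Hypothetical keyspace, table and column names; adjust to your schema.
INSERT_CQL = "INSERT INTO my_keyspace.users (id, name, email) VALUES (?, ?, ?)"


def connect(hosts=("127.0.0.1",)):
    """Open a session using the DataStax Python driver."""
    # Imported here so the rest of the module loads without the driver installed.
    from cassandra.cluster import Cluster
    return Cluster(list(hosts)).connect()


def load_csv(session, csv_path, has_header=True):
    """Insert every CSV row with a freshly generated UUID as its key.

    Returns the number of rows inserted.
    """
    prepared = session.prepare(INSERT_CQL)
    count = 0
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        if has_header:
            next(reader, None)  # skip the header row
        for row in reader:
            if not row:  # skip blank lines
                continue
            # uuid.uuid4() returns a UUID object, which the driver maps to a CQL uuid.
            session.execute(prepared, [uuid.uuid4()] + row)
            count += 1
    return count
```

Usage would be along the lines of `load_csv(connect(), "data.csv")`. This version inserts rows one at a time, which may be slow for several hundred thousand records; the driver's `cassandra.concurrent.execute_concurrent_with_args` helper is one way to issue the same prepared statement with many requests in flight.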
Source: https://stackoverflow.com/questions/43506380/generate-uuid-automatically-for-all-records-in-cassandra-for-an-existing-dataset