Run a bulk update query in Cassandra on one column

天大地大妈咪最大 submitted on 2019-12-24 08:16:08

Question


We have a Cassandra table with over a million records, and we want to run a bulk update on one column (basically, set the column's value to null across the entire table).

Is there a way to do this, given that the query below won't work in CQL?

UPDATE TABLE_NAME SET COL1=NULL WHERE PRIMARY_KEY IN(SELECT PRIMARY_KEY FROM TABLE_NAME );

P.S. The column is not a primary key or clustering key column.


Answer 1:


A similar question came up the other day about deleting a column in Cassandra for a large dataset. I also suggest reading the "Dropping a column" section of the ALTER TABLE documentation.

One solution in this case might be to drop the column and re-add it, since:

If you drop a column then re-add it, Cassandra does not restore the values written before the column was dropped. A subsequent SELECT on this column does not return the dropped data.

I would try this on a test system first and check whether the tombstones have been removed.
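The drop-and-re-add idea can be sketched in Python as below. This is only an illustration under assumptions: `drop_and_readd_column` is a hypothetical helper, `session` is assumed to be any object with an `execute(cql)` method (the session object of the DataStax Python driver has one), and the table name, column name, and column type are placeholders you would supply. Whether a column can be re-added under the same name and type may depend on the Cassandra version, which is one more reason to run it on a test system first.

```python
def drop_and_readd_column(session, table, column, cql_type):
    """Drop `column` from `table`, then add it back (empty) with `cql_type`.

    Assumption: `session` exposes execute(cql). Identifiers are interpolated
    directly into the statements, so pass only trusted names.
    """
    statements = [
        f"ALTER TABLE {table} DROP {column}",
        f"ALTER TABLE {table} ADD {column} {cql_type}",
    ]
    for cql in statements:
        # Each ALTER TABLE is a schema change and is sent on its own.
        session.execute(cql)
    return statements
```

With a real driver session, the call might look like `drop_and_readd_column(session, "my_keyspace.table_name", "col1", "text")`, where the keyspace name and the `text` type are made up for the example.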




Answer 2:


There really isn't a way to do this through CQL, short of iterating through each row and updating the value.

However, there might be a way to do this if you feel adventurous.

You could use COPY in cqlsh to export the table's data to a file. With a tool like sed, you can edit that text file to change the column's values and then import the file back into Cassandra.

This solution is less than optimal and might not work for certain datasets, but it gets the job done.
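As an alternative to sed for the file-editing step, here is a small Python sketch that blanks one column of an exported CSV file. It assumes the file was produced by cqlsh `COPY ... TO` with default options, where an empty field stands for null. `blank_column` and the column index are hypothetical names chosen for illustration. Whether re-importing an empty field actually overwrites an existing value may depend on the cqlsh version, so check the result on a test table.

```python
import csv


def blank_column(src, dst, column_index, null_value=""):
    """Copy CSV rows from `src` to `dst`, replacing one field with `null_value`.

    `src` and `dst` are open text streams. Using the csv module instead of a
    plain text substitution keeps quoted fields that contain commas intact.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    rows = 0
    for row in reader:
        if column_index < len(row):
            row[column_index] = null_value
        writer.writerow(row)
        rows += 1
    return rows
```

For files on disk, open both with `newline=""` as the csv module recommends, for example `open("export.csv", newline="")` for reading and `open("export_blanked.csv", "w", newline="")` for writing (both file names are placeholders).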

Personally, I would still prefer iterating through the rows to doing this.
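The row-by-row iteration could be sketched as follows. The assumptions are loud ones: `null_out_column` is a hypothetical helper, the table is assumed to have a single-column primary key, and `session` is assumed to behave like a DataStax Python driver session, whose `execute` returns rows that can be iterated and indexed and accepts `%s` placeholders for simple statements. Keep in mind that writing null to a column creates a tombstone for every row touched.

```python
def null_out_column(session, table, key_column, target_column):
    """Set `target_column` to null for every row of `table`, one row at a time.

    Assumptions: a single-column primary key named `key_column`, and a
    `session` whose execute(cql, params) returns iterable, indexable rows.
    Identifiers are interpolated directly, so pass only trusted names.
    """
    select_cql = f"SELECT {key_column} FROM {table}"
    update_cql = (
        f"UPDATE {table} SET {target_column} = null "
        f"WHERE {key_column} = %s"
    )
    updated = 0
    # Read every primary key, then issue one UPDATE per key.
    for row in session.execute(select_cql):
        session.execute(update_cql, (row[0],))
        updated += 1
    return updated
```

For a table with a compound primary key, the SELECT would need to fetch all key columns and the WHERE clause would need one condition per key column.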



Source: https://stackoverflow.com/questions/51635049/run-a-bulk-update-query-in-cassandra-on-1-column
