Versioning in Cassandra

Posted by 烈酒焚心 on 2019-12-13 02:50:16

Question


I need to implement versioning using Cassandra.

The following is my column family definition:

create table file_details(id text primary key, fname text, version int, mimetype text);

I have created a secondary index on the fname column.

Whenever I insert a row for the same 'fname', the version should be incremented. When I retrieve a row by fname, it should return the row with the latest version.

Please suggest an approach.


Answer 1:


If it's not possible to relax the requirement of versions increasing by 1, one option is to use counters.

Create a table for the data:

create table file_details(id text primary key, fname text, mimetype text);

and a separate table for the version:

create table file_details_version(id text primary key, version counter);

This needs to be a separate table because, apart from its primary key columns, a table must contain either only counter columns or no counter columns.

Then for an update you can do:

insert into file_details(id, fname, mimetype) values ('id1', 'fname', 'mime');
update file_details_version set version = version + 1 where id = 'id1';

A read from file_details will then always return the latest data for that id, because each insert with the same id overwrites the previous row. The latest version number can be read from file_details_version.

There are several problems with this, though. You can't do atomic batches with counters, so the two updates are not atomic: some failure scenarios could lead to only the insert into file_details being persisted. Also, there is no read isolation, so if you read during an update you may get inconsistent data between the two tables. Finally, counter updates in Cassandra are not tolerant of failures, so if a failure happens during a counter update and the update is retried, you may double count, i.e. increment the version too much.
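The double-counting problem can be illustrated with a small in-memory model. This is only a sketch, not Cassandra client code: `CounterTable`, `increment_with_retry` and the `ack_lost` flag are hypothetical names invented for the illustration, and the "failure" is simulated as an increment that is applied but whose acknowledgement never reaches the client.

```python
class CounterTable:
    """Toy in-memory stand-in for file_details_version (not a Cassandra client)."""

    def __init__(self):
        self.versions = {}  # id -> version counter

    def increment(self, file_id, ack_lost=False):
        # The increment is applied on the "server" side first.
        self.versions[file_id] = self.versions.get(file_id, 0) + 1
        if ack_lost:
            # Simulated failure: the client never sees the acknowledgement.
            raise TimeoutError("no acknowledgement received")


def increment_with_retry(table, file_id, lose_first_ack=False):
    """Perform one logical version bump, retrying once on a timeout."""
    try:
        table.increment(file_id, ack_lost=lose_first_ack)
    except TimeoutError:
        # The client cannot tell whether the increment was applied, so it retries.
        table.increment(file_id)


table = CounterTable()
increment_with_retry(table, "id1", lose_first_ack=True)
print(table.versions["id1"])  # one logical update, but the counter was bumped twice
```

Because an increment is not idempotent, the retry cannot be made safe from the client side: not retrying risks losing the bump, and retrying risks applying it twice.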

I think all solutions involving counters will hit these issues. You could avoid counters by generating a unique ID (e.g. a large random number) for each update and inserting that into a row in a separate table. The version would then be the number of IDs in the row. Now you can do atomic updates, and the counts would be tolerant to failures. However, the read time would be O(number of updates) and reads would still not be isolated.
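The unique-ID alternative can be sketched with the same kind of in-memory model. Again, this is an illustration under assumptions rather than Cassandra code: `UpdateLog`, `record_with_retry` and `ack_lost` are hypothetical names, and a UUID is used as the "large random number". The point it shows is that writing the same ID twice leaves the count unchanged, so retries are safe.

```python
import uuid


class UpdateLog:
    """Toy in-memory stand-in for a table holding one unique ID per update."""

    def __init__(self):
        self.update_ids = {}  # file id -> set of update IDs

    def record(self, file_id, update_id, ack_lost=False):
        # Adding an ID that is already present changes nothing (idempotent write).
        self.update_ids.setdefault(file_id, set()).add(update_id)
        if ack_lost:
            # Simulated failure: the write is applied but the client sees a timeout.
            raise TimeoutError("no acknowledgement received")

    def version(self, file_id):
        # The version is the number of stored IDs. In Cassandra this would mean
        # reading every ID in the row, hence the O(number of updates) read cost.
        return len(self.update_ids.get(file_id, ()))


def record_with_retry(log, file_id, lose_first_ack=False):
    """Record one logical update, retrying once with the SAME update ID."""
    update_id = uuid.uuid4().hex  # generated once per logical update
    try:
        log.record(file_id, update_id, ack_lost=lose_first_ack)
    except TimeoutError:
        log.record(file_id, update_id)


log = UpdateLog()
record_with_retry(log, "id1", lose_first_ack=True)  # retried, still counted once
record_with_retry(log, "id1")
print(log.version("id1"))
```

The design choice that matters is generating the ID before the first attempt and reusing it on retry; generating a fresh ID per attempt would reintroduce double counting.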



Source: https://stackoverflow.com/questions/18575560/versioning-in-cassandra
