We know that it is easy to create auto increment IDs in SQL databases, is there a good solution for it in Cassandra? The IDs should be for key or column name.
问题:
回答1:
Creating a global sequential sequence of number does not really make any sense in a distributed system. Use UUIDs.
(Because you would have to make all participants agree and accept the evolution of the sequence -- under a naive implementation)
回答2:
How about the following, using Cassandra's Lightweight transactions
1 - Create IDs table:
CREATE TABLE ids ( id_name varchar, next_id int, PRIMARY KEY (id_name) )
2 - Insert every id you'd like to use a global sequence with
For example:
INSERT INTO ids (id_name, next_id) VALUES ('person_id', 1)
3 - Then, when inserting to a table where you'd like to use an auto-incremented key, do the following:
3.1 - Get the next_id from the ids table:
SELECT next_id FROM ids WHERE id_name = 'person_id'
Let's say the result is next_id = 1
3.2 - Increment next_id, the following way:
UPDATE ids SET next_id = 2 WHERE id_name = 'person_id' IF next_id = 1
The result should look like this:
[{[applied]: True}]
If it was updated successfully, OR
[{[applied]: False, next_id: 2}]
If someone else has already updated it.
So, if you got True, use id '1' - it is yours. Otherwise, increment next_id (or just use the returned next_id) and repeat the process.
回答3:
There is no good solution.
- Create a column with a number, increase the number and save it to all replicas together with a temporary id, read all replicas and check if the temporary id is "yours", if not do it again.. not a great solution and will not scale.
or
- Build your own id service where you fetch your next id. This service will only be run in a single instance and will be a non scaling scary factor.
As soon as anything goes beyond a single instance the sequencing of id's gets complicated, at least if you want it to scale. That includes relational databases.
回答4:
there is a counter datatype which can be used. Consider the below example.
CREATE KEYSPACE counterks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 };
Create a table for the counter column.
CREATE TABLE counterks.page_view_counts (counter_value counter, url_name varchar, page_name varchar, PRIMARY KEY (url_name, page_name) );
Load data into the counter column.
UPDATE counterks.page_view_counts SET counter_value = counter_value + 1 WHERE url_name='www.datastax.com' AND page_name='home';
Take a look at the counter value.
SELECT * FROM counterks.page_view_counts;
Output is:
url_name | page_name | counter_value ------------------+-----------+--------------- www.datastax.com | home | 1
Increase the value of the counter.
UPDATE counterks.page_view_counts SET counter_value = counter_value + 2 WHERE url_name='www.datastax.com' AND page_name='home';
Take a look at the counter value.
url_name | page_name | counter_value ------------------+-----------+--------------- www.datastax.com | home | 3
Refer this for more details: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_counter_t.html
回答5:
This question is pretty old but I'd like to complete it with an other solution.
Any solution that relies on nodes synchronization is unreasonable. It's pretty sure to break either by blocking IDs generation or by creating duplicate IDs.
MySQL way
You can reproduce the way it's done with the mysql master-master replication with the auto_increment_increment
and auto_increment_offset
parameters.
To reproduce it, you need to know the number of nodes or the max number of expected nodes and you need to create a (non-cassandra) counter (a file per example) on each node.
Each time you want to generate a new number, you find the current value, add the increment and save it. If it doesn't exist yet, it's the offset.
So for 10 nodes, you would have an increment of 10 and an offset of 1 for the first node, 2 for the second node, etc. Node 1 would create the IDs 1, 11, 21. Node 2 would create the IDs 2, 21, 22.
If you want your IDs to be (approximatively) ordered between nodes, you need to maintain a shared counter and make sure each generated ID is higher than the shared counter. That way, unless your nodes/datacenters are out of sync for a long time, you shouldn't notice much difference.
Prefixing
You can do basically the same thing by prefixing the ID (if it's an acceptable solution) with the node number (or name). And you don't have to known the number of nodes. Node 1 would create 1_1, 1_2, 1_3. Node 2 would create 2_1, 2_2, 2_3.
回答6:
Edit: This solution is not proper. See the first comment.
My solution:
1 - Create IDs table:
CREATE TABLE ids ( id_name varchar, next_id counter, PRIMARY KEY (id_name) )
2 - When inserting to a table where you'd like to use an auto-incremented key, do the following:
2.1 - Increment counter (it will be created if not exists), using the highest consistency level
UPDATE ids SET next_id = next_id + 1 WHERE id_name = $AUTO_INCREMENTED_ID USING CONSISTENCY ALL
2.2 - Get the new id value:
SELECT next_id FROM ids WHERE id_name = $AUTO_INCREMENTED_ID
2.3 - Insert the value with the auto-incremented id
INSERT INTO some_table ($AUTO_INCREMENTED_ID, ...) VALUES ($RESULT_FROM_PREVIOUS_QUERY, ...)
Words starting with '$' in my answer are self-explanatory (I hope) placeholders...
Of course this is not a recommended method. Use it only if you have to.
回答7:
Do they really need to be sequential, or do you just need counting numbers that are much smaller than a UUID that are easily entered by a person?
If you really need sequential numbers, then you will need to do one of the following.
Have a table in cassandra where the key/id is a field for the generator, and the value is a number... do conditional updates in a loop until you successfully increment the count. (bad idea)
Have a generator service that will give you the next number. This can only run on a single system and be a single point of failure, but depending on your needs this might be best.
Alternatively... Similar to the first, but get batches of 100 or so numbers at a time, and deal those out inside your process/thread... This will have less contention, but no guarantee of sequential order, only uniqueness... If you only want shorter numbers that are unique for display, this may be your best bet.
回答8:
I think IMHO expecting Cassandra to provide a auto-increment itself is wrong
Cassandra is a decentralised database, hence for it to provide a auto-incrmenting field is taxing and defeats the purpose, because this value has to be then maintained in a central place
Hence, don't make any solution which DB based to get a auto-incrementing number
Instance create a ID code or service in your app, which can keep generating random unique IDs and use that to save in your Database, this way the objective & benefit of Cassandra will not be defeated