Solved: I was testing updates on 3 nodes, and the clock on one of those nodes was 1 second behind. When a row was updated, the write time was always behind the timestamp already stored for the row, so Cassandra would not update it. I synced the time on all nodes, and the issue was fixed.
Edit: I double-checked the results: all insertions succeeded, but some of the updates failed. There were no error or exception messages.
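To illustrate why a lagging clock can make an update silently disappear, here is a toy model of the last-write-wins rule. It is not driver code and it simplifies what Cassandra really does (the `Cell` class, `coordinator_timestamp` helper, and all numbers are made up for illustration). It assumes the write timestamp comes from the clock of the node that coordinates the write.

```python
# Toy model of last-write-wins conflict resolution; not Cassandra or driver code.
# Cell, coordinator_timestamp and all numbers below are hypothetical.


class Cell(object):
    """One column value plus the write timestamp (microseconds) kept with it."""

    def __init__(self):
        self.value = None
        self.timestamp = None

    def write(self, value, timestamp):
        # The value with the newer timestamp wins; an older write is ignored.
        if self.timestamp is None or timestamp > self.timestamp:
            self.value = value
            self.timestamp = timestamp
            return True
        return False


def coordinator_timestamp(true_time_us, clock_offset_us):
    """Timestamp a coordinator would assign, given how far off its clock is."""
    return true_time_us + clock_offset_us


# Case 1: the insert goes through a node with a correct clock, and the update
# arrives 200 ms later through a node whose clock is 1 second behind.
skewed = Cell()
skewed.write("default", coordinator_timestamp(10000000, 0))
applied_with_skew = skewed.write(
    "column x value", coordinator_timestamp(10200000, -1000000))

# Case 2: same sequence of writes after the clocks are synchronized.
synced = Cell()
synced.write("default", coordinator_timestamp(10000000, 0))
applied_when_synced = synced.write(
    "column x value", coordinator_timestamp(10200000, 0))

print(applied_with_skew, skewed.value)
print(applied_when_synced, synced.value)
```

In the skewed case the update carries an older timestamp than the insert, so the default value stays visible even though no error is reported, which matches the symptom described here.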
I have a Cassandra cluster (Cassandra 2.0.13) with 5 nodes. I am using the Python (2.6.6) Cassandra driver (2.6.0c2) to insert data into the database. My servers run CentOS 6.x.
The following code is how I connect to Cassandra and get a session. I provide at most 2 node IP addresses and select the keyspace.
def get_cassandra_session():
    """Creates the cluster and gets a session based on the keyspace."""
    # be aware that session cannot be shared between threads/processes
    # or it will raise OperationTimedOut Exception
    if CLUSTER_HOST2:
        cluster = cassandra.cluster.Cluster([CLUSTER_HOST1, CLUSTER_HOST2])
    else:
        # if only one address is available, we have to use older protocol version
        cluster = cassandra.cluster.Cluster([CLUSTER_HOST1], protocol_version=1)
    session = cluster.connect(KEY_SPACE)
    return session
Each row has 17 columns. If the key does not exist in the database, I use the session to insert the key with default values for the remaining columns, and then update a specific column's value.
def insert_initial_row(session, key):
    session.execute(INITIAL_INSERTION_STATEMENT, tuple(INITIAL_COLUMNS_VALUES))

def update_columnX(session, key, column):
    session.execute("INSERT INTO " + TABLE + "(" + KEY + "," + COLUMN_X + ") VALUES(%s, %s)", (key, column))

def has_found(session, key):
    """Checks whether the key is in the database."""
    query = "SELECT " + "*" + " FROM " + KEY_SPACE + "." + TABLE \
        + " WHERE " + KEY + " = " + "'" + key + "'"
    # returns a list
    row = session.execute(query)
    return True if row else False
The following is how I invoke them:
# keys_set contains 100 keys with no duplicates
for a_key in keys_set:
    if has_found(session, a_key):
        update_columnX(session, a_key, "column x value")
    else:
        # the key is not in db, initialize it with all default values, then update column x
        insert_initial_row(session, a_key)
        if has_found(session, a_key):
            update_columnX(session, a_key, "column x value")
        else:
            logger.error("not initialized correctly...")
I was trying to insert 100 rows and update column X in each row, but only some of those 100 rows were updated; in the rest, column X still holds the default value. insert_initial_row has been invoked and has initialized the default values for all 100 rows, but update_columnX has not taken effect for all of them. Even changing the consistency level to QUORUM does not help at all. "not initialized correctly..." is never printed, and I added a print line in update_columnX that is printed 100 times, so the function is invoked 100 times, but not all of the updates are applied.
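One way to pin down which rows are affected is to read column X back after the loop and collect the keys whose value did not change. This is a minimal sketch, not part of the original code: `find_unapplied_updates` and the `select_cql` argument are hypothetical, and it assumes `select_cql` selects only column X for a single key and that rows can be indexed by position.

```python
def find_unapplied_updates(session, keys, select_cql, expected_value):
    """Return the keys whose column X does not hold expected_value.

    select_cql is assumed to select only column X for one key, for example
    "SELECT col_x FROM my_keyspace.my_table WHERE id = %s" (made-up names).
    """
    unapplied = []
    for key in keys:
        rows = list(session.execute(select_cql, (key,)))
        # A missing row or a different value both count as "not applied".
        if not rows or rows[0][0] != expected_value:
            unapplied.append(key)
    return unapplied
```

Calling it with the same session and keys_set right after the update loop would give the list of keys that still hold the default value, which can then be compared against the node that coordinated each write.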
Any idea? Please help.
Thanks
If your session.execute writes were not successful (they did not meet the required consistency level), then the driver will raise one of the following exceptions:
- Unavailable - There were not enough live replicas to satisfy the requested consistency level, so the coordinator node immediately failed the request without forwarding it to any replicas.
- Timeout - Replicas did not respond to the coordinator before the Cassandra timeout.
- Write timeout - Replicas did not respond to the coordinator before the write timeout, which is configured in cassandra.yaml. There is a similar timeout for reads; read and write timeouts are configured separately in the yaml.
- Operation timeout - The operation took longer than the specified client-side timeout, which is configured in your application code.
You can try tracing your queries and find out what exactly happened for each write. This will show you the coordinators and the replica nodes involved in the operation and how much time the request spent in each.
Source: https://stackoverflow.com/questions/31256238/cassandra-update-fails