How to handle AllServersUnavailable Exception

Question

I wanted to run a simple write test against a single-node Cassandra instance (v1.1.10), to see how it handles constant writes and whether it can keep up with the write rate.

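import random
import string
import sys
import uuid

from pycassa.columnfamily import ColumnFamily
from pycassa.pool import ConnectionPool

pool = ConnectionPool('testdb')  # keyspace 'testdb' on the default server list (localhost:9160)
test_cf = ColumnFamily(pool, 'test')
test2_cf = ColumnFamily(pool, 'test2')
test3_cf = ColumnFamily(pool, 'test3')
# Each batch queues inserts and sends them to Cassandra once 1000 are queued
test_batch = test_cf.batch(queue_size=1000)
test2_batch = test2_cf.batch(queue_size=1000)
test3_batch = test3_cf.batch(queue_size=1000)

chars = string.ascii_uppercase
counter = 0
while True:
    counter += 1
    uid = uuid.uuid1()
    # 50 random uppercase letters as the column value
    junk = ''.join(random.choice(chars) for x in range(50))
    test_batch.insert(uid, {'junk': junk})
    test2_batch.insert(uid, {'junk': junk})
    test3_batch.insert(uid, {'junk': junk})
    sys.stdout.write(str(counter) + '\n')

pool.dispose()  # never reached: the loop above has no exit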

The code keeps crashing after a long run of writes (when the counter is around 10M+) with the following message:

pycassa.pool.AllServersUnavailable: An attempt was made to connect to each of the servers twice, but none of the attempts succeeded. The last failure was timeout: timed out

I tried setting queue_size=100, which didn't help. I also fired up the cqlsh -3 console to truncate the table after the script crashed and got the following error:

Unable to complete request: one or more nodes were unavailable.

Tailing /var/log/cassandra/system.log shows no errors, only INFO messages from Compaction, FlushWriter and so on. What am I doing wrong?
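On why lowering queue_size makes no difference: as far as pycassa's batch mutators go, queue_size only sets how many queued mutations trigger one batch_mutate request, so the node still has to absorb every row, just in smaller or larger requests. The toy model below is a simplified sketch under that assumption; ToyBatch and simulate are hypothetical names, not pycassa API, and nothing here talks to Cassandra.

```python
class ToyBatch(object):
    """Hypothetical in-memory stand-in for a batch mutator (not pycassa's class).

    Inserts are buffered; once queue_size of them are queued, they are
    "sent" as one simulated request and the buffer is cleared.
    """

    def __init__(self, queue_size):
        self.queue_size = queue_size
        self.buffer = []
        self.requests = []  # number of rows carried by each simulated request

    def insert(self, key, columns):
        self.buffer.append((key, columns))
        if len(self.buffer) >= self.queue_size:
            self.send()

    def send(self):
        if self.buffer:
            self.requests.append(len(self.buffer))
            self.buffer = []


def simulate(total_rows, queue_size):
    """Insert total_rows rows through a ToyBatch and return the request sizes."""
    batch = ToyBatch(queue_size)
    for key in range(total_rows):
        batch.insert(key, {'junk': 'x' * 50})
    batch.send()  # flush whatever is left in the buffer
    return batch.requests


for size in (1000, 100):
    requests = simulate(2500, size)
    print("queue_size=%d: %d rows in %d requests" % (size, sum(requests), len(requests)))
```

Both runs deliver all 2,500 rows; only the number of requests changes (3 versus 25), which is why a smaller queue_size alone does not reduce the load on an overloaded node.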

Answer 1:

I've had this problem too. As @tyler-hobbs suggested in his comment, the node is likely overloaded (mine was). A simple fix I've used is to back off and let the node catch up. I've rewritten your loop to catch the error, sleep for a while and try again. I've run this against a single-node cluster and it works well: it pauses for a minute and backs off periodically, never more than five times in a row. No data is missed with this script unless the error is thrown five times in a row, in which case you probably want to fail hard rather than return to the loop.

import time

import pycassa.pool

# Assumes pool, the three batches, chars and counter are set up as in the question.
while True:
    counter += 1
    uid = uuid.uuid1()
    junk = ''.join(random.choice(chars) for x in range(50))
    tryCount = 5  # 5 is probably unnecessarily high
    while tryCount > 0:
        try:
            test_batch.insert(uid, {'junk': junk})
            test2_batch.insert(uid, {'junk': junk})
            test3_batch.insert(uid, {'junk': junk})
            break  # success: move on to the next row
        except pycassa.pool.AllServersUnavailable as e:
            print("Trying to insert [%s] but got error %s (attempt %d of 5). "
                  "Backing off for a minute to let Cassandra settle down" % (uid, e, 6 - tryCount))
            time.sleep(60)  # a delay of 60s is probably unnecessarily high
            tryCount -= 1
    # If all five attempts failed, the loop simply moves on; re-raising here would fail hard instead.
    sys.stdout.write(str(counter) + '\n')
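The same back-off idea can be pulled out into a small reusable helper that also fails hard once the attempts run out, as suggested above. This is a generic sketch that does not touch pycassa: retry_with_backoff, FakeUnavailable and make_flaky are hypothetical names, the sleep function is injected so the demo runs instantly, and the optional backoff multiplier (exponential back-off) is an addition that the answer's fixed 60-second delay does not have.

```python
from __future__ import print_function

import time


def retry_with_backoff(operation, retryable, attempts=5, delay=60.0,
                       backoff=1.0, sleep=time.sleep, log=print):
    """Call operation() until it succeeds, sleeping between failed attempts.

    backoff=1.0 keeps the delay fixed; backoff > 1.0 multiplies it after each
    failure (exponential back-off). After `attempts` failures the last error
    is re-raised, so the caller fails hard instead of silently moving on.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retryable as e:
            if attempt == attempts:
                raise  # out of attempts: propagate the last error
            log("Attempt %d of %d failed (%s); backing off for %.1fs"
                % (attempt, attempts, e, delay))
            sleep(delay)
            delay *= backoff


class FakeUnavailable(Exception):
    """Hypothetical stand-in for pycassa.pool.AllServersUnavailable."""


def make_flaky(failures):
    """Return a fake write that raises `failures` times, then succeeds."""
    state = {'calls': 0}

    def write():
        state['calls'] += 1
        if state['calls'] <= failures:
            raise FakeUnavailable('timed out')
        return 'ok'

    return write


# Demo: two failures, then success; sleeps are recorded instead of performed.
delays = []
result = retry_with_backoff(make_flaky(2), FakeUnavailable,
                            delay=1.0, backoff=2.0, sleep=delays.append)
print("%s after sleeping %r" % (result, delays))  # ok after sleeping [1.0, 2.0]
```

With pycassa, operation would be a small function performing the three insert calls for one row, and retryable would be pycassa.pool.AllServersUnavailable.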

I've added a complete gist here

Source: https://stackoverflow.com/questions/15906550/how-to-handle-allserversunavailable-exception

Tags

python

cassandra

pycassa