Cassandra updates not working consistently

问题

I run the following code on my local (mac) machine and on a remote unix server.:

public void deleteValue(final String id, final String value) {
    log.info("Removing value " + value);
    final Collection<String> valuesBeforeRemoval = getValues(id);
    final MutationBatch m = keyspace.prepareMutationBatch();
    m.withRow(VALUES_CF, id).deleteColumn(value);
    try {
      m.execute();
    } catch (final ConnectionException e) {
      log.error("Unable to delete  location " + value, e);
    }
    final Collection<String> valuesAfterRemoval = getValues(id);
    if (valuesAfterRemoval.size()!=(valuesBeforeRemoval.size()-1)) {
      log.error("value " + value + " was supposed to be removed from list "  + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);
    }
...
  }

protected Collection<String> getValues(final String id) {
  try {
    final OperationResult<ColumnList<String>> operationResult = keyspace
            .prepareQuery(VALUES_CF).getKey(id).execute();
    final ColumnList<String> result = operationResult.getResult();
    if (result.isEmpty()) {
      log.info("No  value found for id: " + id);
      return new ArrayList<String>();
    }
    return result.getColumnNames();
  } catch (final ConnectionException e) {
    log.error("Unable to retrieve session " + id, e);
  }
  return new ArrayList<String>();
}

Locally, that line is never executed, which makes sense:

log.error("value " + value + " was supposed to be removed from list "  + valuesBeforeRemoval + " but it wasn't: " + valuesAfterRemoval);

but that line is executed on my dev server:

[ERROR] [main] [n.o.w.s.d.SessionDaoCassandraImpl] [2013-03-08 13:12:24,801] [] - value 3 was supposed to be removed from list [3, 2, 1, 0, 7, 6, 5, 4, 9, 8] but it wasn't: [3, 2, 1, 0, 7, 6, 5, 4, 9, 8]

I am using com.netflix.astyanax
Both my local machine and the remote dev server connect to the very same cassandra instance.
Both my local machine and the remote dev server run the very same test creating a new row family, and adding 10 records before one is deleted.
When the error occurs on dev, log.error("Unable to delete location " + value, e); was not executed (i.e. running the deletion command didn't produce any exception).
I am 100% positive that no other code is affecting the content of the database while I am running the test on dev so this isn't some strange concurrency issue.

What could possibly explain that the deleteColumn(value) request runs without producing any error but still does not remove the column from the database?

ADDITIONAL INFO

Here is how I created the keyspace:

create keyspace sessiondata
    with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
    and strategy_options = {replication_factor:1};

Here is how I created the column family values, referenced as VALUES_CF in the code above:

create column family values
    with comparator = UTF8Type
;

Here is how the keyspace referenced in the java code above is defined:

final AstyanaxContext.Builder contextBuilder = getBuilder();
final AstyanaxContext<Keyspace> keyspaceContext = contextBuilder
        .forKeyspace(keyspaceName).buildKeyspace(
                ThriftFamilyFactory.getInstance());
keyspaceContext.start();
keyspace = keyspaceContext.getEntity();

where getBuilder is:

  private Builder getBuilder() {
    final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
    .setDiscoveryType(NodeDiscoveryType.NONE)
    .setRetryPolicy(new RunOnce());

    final ConnectionPoolConfigurationImpl poolConf = new ConnectionPoolConfigurationImpl("MyPool")
    .setPort(port)
    .setMaxConnsPerHost(1)
    .setSeeds(value);

    return new AstyanaxContext.Builder()
    .forCluster(cluster)
    .withAstyanaxConfiguration(conf)
    .withConnectionPoolConfiguration(poolConf)
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor());
  }

SECOND UPDATE

First, the issues are not solely related to deletes. I observe similar problems when updating records in the database, reading them, and not being able to read the updates I just wrote
Second, I created a test that does 100 times the following operations:
- write a row into cassandra
- update that row in cassandra
- read back that row from cassandra and check whether the row was indeed updated, and checking again regularly after delays if it wasn't
What I observe from that test is that:
- again, when I run that code locally, all 100 iterations pass right away (no retry ever needed)
- when I run that code on the remote server, some of the iterations pass, some fail. When they fail, no matter how large the delay (I wait up to 10 seconds), the test always fail.

At this point, I am really not sure how any cassandra setup could explain this behavior since I connect to the very same server for my tests and since the delays I insert are much larger than any additional latency I may need to run the test when connecting from my local machine.

The only relevant difference seems to be which machine the code is running on.

THIRD UPDATE

If in the test mentioned in the previous update, I insert a delay between the 2 writes, the code starts passing if the delay is >= 1,000 ms. A delay of, say, 100 ms doesn't help. I also modified the builder to set the default read and write consistencies to the most demanding: ALL, and that had no impact on the results of the test (still failing about half of the time unless delay between writes >1s):

final AstyanaxConfigurationImpl conf = new AstyanaxConfigurationImpl()
.setDiscoveryType(NodeDiscoveryType.NONE)
.setRetryPolicy(new RunOnce()).setDefaultReadConsistencyLevel(ConsistencyLevel.CL_ALL).setDefaultWriteConsistencyLevel(ConsistencyLevel.CL_ALL);

回答1:

To debug, try printing the full row instead of just the column names. When I say the full row I mean the column name, column value and the time stamp. A long shot is clocks are wrong on one of your test machines and this is throwing out your tests on the other.

Another thing to double check is that ip is indeed what you think it is, in both your application and cassandra. When you retrieve it print it between something, like println("-" + ip "-"). Before and after your try block for the execute in deleteSecureLocation do a get for only that column, not the entire row. I'm not too sure how to do that in astynax, on the cli it would be get[id][ip].

Something to keep in mind is that a delete won't fail even if there's nothing to delete. To cassandra it's a write, the only thing that will make it a delete is if on read it's the latest timestamped entry against that row/column name.

来源：https://stackoverflow.com/questions/15342621/cassandra-updates-not-working-consistently

标签

java

cassandra

astyanax