Question
I am working on a system in which I need to store Avro schemas in a Cassandra database. In Cassandra we will be storing something like this:
SchemaId  AvroSchema
1         some schema
2         another schema
Now suppose I insert another row into the table above, so that it now looks like this:
SchemaId  AvroSchema
1         some schema
2         another schema
3         another new schema
As soon as a new row is inserted into the table above, I need to notify my Java program so that it pulls the new schema ID and the corresponding schema.
What is the right way to solve this kind of problem?
I know one way is polling: for example, every 5 minutes we go and pull the data from the table above. But I don't think this is the right way to solve the problem, because every 5 minutes I am doing a pull whether or not there are any new schemas.
Is there any other solution apart from this?
Can we use Apache ZooKeeper, or is ZooKeeper not a good fit for this problem? Or is there some other solution?
I am running Apache Cassandra 1.2.9.
Answer 1:
Some solutions:
- With database triggers: Cassandra 2.0 added prototype trigger support (it is not available in 1.2.9), but according to this article it is not final and might change a little in 2.1: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-0-prototype-triggers-support. Triggers are a common solution to this kind of problem.
- You brought up polling, but that is not always a bad option, especially if you have something that marks a row as not yet pulled, so that you only pull the new rows out of Cassandra. Polling once every 5 minutes puts negligible load on Cassandra (or any database) as long as the query itself is not expensive. This option might not be a good fit if new rows are inserted only very infrequently.
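The polling option can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not a Cassandra client: `InMemorySchemaTable`, `SchemaPoller`, `rowsAfter` and `pollOnce` are hypothetical names, the table is simulated in memory, and the "not yet pulled" marker is assumed to be the highest SchemaId already pulled (which only works if SchemaIds always increase, as in the question's example).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

// Hypothetical in-memory stand-in for the Cassandra table (SchemaId -> AvroSchema).
// A real implementation would run a query against Cassandra instead.
class InMemorySchemaTable {
    private final ConcurrentSkipListMap<Integer, String> rows = new ConcurrentSkipListMap<>();

    void insert(int schemaId, String avroSchema) {
        rows.put(schemaId, avroSchema);
    }

    // Snapshot of all rows whose SchemaId is strictly greater than afterId, in id order.
    Map<Integer, String> rowsAfter(int afterId) {
        return new TreeMap<>(rows.tailMap(afterId, false));
    }
}

// Polls the table and hands only unseen schemas to a callback.
// Assumption: SchemaIds only ever increase, so the highest id pulled so far
// acts as the "already pulled" marker.
class SchemaPoller {
    private final InMemorySchemaTable table;
    private final BiConsumer<Integer, String> onNewSchema;
    private int lastSeenId;

    SchemaPoller(InMemorySchemaTable table, int lastSeenId, BiConsumer<Integer, String> onNewSchema) {
        this.table = table;
        this.lastSeenId = lastSeenId;
        this.onNewSchema = onNewSchema;
    }

    // One poll: pull rows newer than lastSeenId and return their ids (empty if nothing new).
    synchronized List<Integer> pollOnce() {
        List<Integer> pulled = new ArrayList<>();
        for (Map.Entry<Integer, String> row : table.rowsAfter(lastSeenId).entrySet()) {
            onNewSchema.accept(row.getKey(), row.getValue());
            lastSeenId = row.getKey();
            pulled.add(row.getKey());
        }
        return pulled;
    }

    // Runs pollOnce immediately and then every periodMinutes; the caller shuts the executor down.
    ScheduledExecutorService start(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::pollOnce, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

With the question's 5-minute interval this would be started as `new SchemaPoller(table, 0, (id, schema) -> ...).start(5)`. Note that `scheduleAtFixedRate` suppresses all later runs if one run throws, so a real callback should catch its own exceptions. How to express `rowsAfter` as an efficient Cassandra query depends on the table's key design and is left open here.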
ZooKeeper would not be a perfect solution either; see this quote from the ZooKeeper documentation:
Because watches are one time triggers and there is latency between getting the event and sending a new request to get a watch you cannot reliably see every change that happens to a node in ZooKeeper. Be prepared to handle the case where the znode changes multiple times between getting the event and setting the watch again. (You may not care, but at least realize it may happen.)
Quote sourced from: http://zookeeper.apache.org/doc/r3.4.2/zookeeperProgrammers.html#sc_WatchRememberThese
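The practical consequence of the quote is that a ZooKeeper-based reader should treat a watch event as "something changed, re-read the current state" rather than "exactly one change happened". The sketch below illustrates that handling with a plain-Java model; it is NOT the ZooKeeper client API. `FakeZnode` and `SchemaIdWatcher` are hypothetical, and the znode is assumed to hold the highest SchemaId written so far, with sequential ids.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a single znode with ZooKeeper-style one-time watches.
// It only mimics the watch semantics described in the quote.
class FakeZnode {
    private int data; // assumed: the highest SchemaId the writer has inserted into Cassandra
    private final List<Runnable> watches = new ArrayList<>();

    // Reads the current data and, if watch is non-null, leaves a one-time watch.
    synchronized int getData(Runnable watch) {
        if (watch != null) {
            watches.add(watch);
        }
        return data;
    }

    // Updates the data and fires every registered watch exactly once.
    synchronized void setData(int newData) {
        data = newData;
        List<Runnable> fired = new ArrayList<>(watches);
        watches.clear(); // one-time trigger: no watch until the reader sets it again
        fired.forEach(Runnable::run);
    }
}

// Reader that treats an event as "something changed" rather than "exactly one change".
// Single-threaded for simplicity; not thread-safe as written.
class SchemaIdWatcher {
    private final FakeZnode node;
    private int lastSeenId;
    private boolean eventPending;
    private int notifications;
    final List<Integer> pulledIds = new ArrayList<>();

    SchemaIdWatcher(FakeZnode node) {
        this.node = node;
        this.lastSeenId = node.getData(this::onEvent);
    }

    private void onEvent() {
        eventPending = true;
        notifications++;
    }

    int notifications() {
        return notifications;
    }

    // Handles a pending event some time after it arrived: re-arm the watch, then catch up.
    void processPendingEvent() {
        if (!eventPending) {
            return;
        }
        eventPending = false;
        int latest = node.getData(this::onEvent); // set the watch again before pulling
        // Several changes may have happened while no watch was set, so pull every id in
        // the gap (a real reader would query Cassandra for SchemaIds above lastSeenId).
        for (int id = lastSeenId + 1; id <= latest; id++) {
            pulledIds.add(id);
        }
        lastSeenId = latest;
    }
}
```

If the writer updates the znode three times before the reader re-arms its watch, only one notification arrives, yet the reader still pulls all three ids because it compares against `lastSeenId` instead of counting events. With the real client the watch would be the `Watcher` passed to `ZooKeeper.getData`, and the same re-read-and-catch-up logic applies.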
Answer 2:
For Cassandra 3.0, you can use a trigger like the following, which collects everything in the insert into a JSON object (here it is only logged; the trigger adds no extra mutations):
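import java.util.Collection;
import java.util.Collections;

import org.apache.cassandra.config.ColumnDefinition;
import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.partitions.Partition;
import org.apache.cassandra.db.rows.Cell;
import org.apache.cassandra.db.rows.Row;
import org.apache.cassandra.db.rows.Unfiltered;
import org.apache.cassandra.db.rows.UnfilteredRowIterator;
import org.apache.cassandra.triggers.ITrigger;
import org.json.simple.JSONObject;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HelloWorld implements ITrigger
{
    private static final Logger logger = LoggerFactory.getLogger(HelloWorld.class);

    @Override
    public Collection<Mutation> augment(Partition partition)
    {
        String tableName = partition.metadata().cfName;
        logger.info("Table: " + tableName);

        JSONObject obj = new JSONObject();
        // Decode the partition key with the table's key validator
        obj.put("message_id", partition.metadata().getKeyValidator().getString(partition.partitionKey().getKey()));

        try (UnfilteredRowIterator it = partition.unfilteredIterator())
        {
            while (it.hasNext())
            {
                Unfiltered un = it.next();
                if (un.kind() != Unfiltered.Kind.ROW)
                    continue; // skip range tombstone markers
                Row row = (Row) un;
                // Each cell knows its own column, so there is no need to walk two iterators in step
                for (Cell cell : row.cells())
                {
                    ColumnDefinition columnDef = cell.column();
                    // Let the column type render the value (works for non-text columns too)
                    obj.put(columnDef.toString(), columnDef.type.getString(cell.value()));
                }
            }
        }
        catch (Exception e)
        {
            logger.error("Failed to read update for table " + tableName, e);
        }

        logger.debug(obj.toString());
        return Collections.emptyList(); // no additional mutations
    }
}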
Source: https://stackoverflow.com/questions/19573002/pull-from-cassandra-database-whenever-any-new-rows-or-any-new-update-is-there