Titan node does not come up

倾然丶 夕夏残阳落幕 submitted on 2019-12-24 12:54:02

Question


I have a small Titan 0.5.0 cluster with 8 nodes. Every node runs Titan in Rexster 2.5.0 together with Cassandra, and all nodes are configured identically. Unfortunately, almost every time I start the cluster, one of them does not manage to start.
In most cases this is one of the seed nodes.

With cassandra as the storage backend, I get the following in the Rexster/Titan log:

WARN  com.tinkerpop.rexster.config.GraphConfigurationContainer - Could 
  not open global configuration com.thinkaurelius.titan.core.TitanException:
  Could not open global configuration
 at com.thinkaurelius.titan.diskstorage.Backend.
   getStandaloneGlobalConfiguration(Backend.java: 405)
...
Caused by: com.thinkaurelius.titan.diskstorage.TemporaryBackendException: 
  Temporary failure in storage backend
 at com.thinkaurelius.titan.diskstorage.cassandra.astyanax.
   AstyanaxStoreManager.ensureColumnFamilyExists(AstyanaxStoreManager.java:446)
...
Caused by: com.netflix.astyanax.connectionpool.exceptions.BadRequestException: 
  BadRequestException: [host=192.168.0.10(192.168.0.10):9160, latency=496(496),
  attempts=1] InvalidRequestException(why:Cannot add already existing
  column family "system_properties" to keyspace "titan")
 at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(
   ThriftConverter.java:159)

Rexster fails to start and thus does not load the graph. However, the Cassandra node that Rexster failed to connect to seems to be fine: nodetool lists the node as part of the ring. If I fire requests against the remaining Rexster instances, everything seems to work.

I wiped all data before starting the nodes.

I switched to cassandrathrift, which results in a similar exception (the same TitanException, caused by a PermanentBackendException, caused by a TimeoutException). The storage timeout in Rexster is 30s. This may be too low, since I currently start all nodes simultaneously, but it does not explain the issues with cassandra.

What is going wrong here?

edit:

I was misusing Titan. To avoid dealing with index creation on startup - which happens quite often in my case - I created the index in the Rexster extension. I think this code got invoked multiple times: when I started multiple nodes simultaneously, it seems some of them tried to create the index.

Question: Is there any way the extension can create the indices safely? I created a separate thread for this: What are the methods to create indices?

I increased the storage timeout to 60s and retried the procedure after removing the index creation from the code. I still start up all nodes simultaneously. Again, one Rexstitan node (seed node #2) fails to start.

The Cassandra log indeed contains an exception:

java.lang.IllegalArgumentException: Unknown keyspace/cf pair (titan.txlog)
    at org.apache.cassandra.db.Keyspace.getColumnFamilyStore(Keyspace.java:166)
    at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:326)
    at org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:65)
    at org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47)
    at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

which I can see on both seed nodes. While the Rexster instance on one seed node does not seem to care, the other Rexster instance fails to start with

Caused by: com.netflix.astyanax.connectionpool.exceptions.BadRequestException: BadRequestException: [host=192.168.0.10(192.168.0.10):9160, latency=66(66), attempts=1]InvalidRequestException(why:Cannot add already existing column family "graphindex_lock_" to keyspace "titan")
    at com.netflix.astyanax.thrift.ThriftConverter.ToConnectionPoolException(ThriftConverter.java:159)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:65)
    at com.netflix.astyanax.thrift.AbstractOperationImpl.execute(AbstractOperationImpl.java:28)
    at com.netflix.astyanax.thrift.ThriftSyncConnectionFactoryImpl$ThriftConnection.execute(ThriftSyncConnectionFactoryImpl.java:151)
    at com.netflix.astyanax.connectionpool.impl.AbstractExecuteWithFailoverImpl.tryOperation(AbstractExecuteWithFailoverImpl.java:119)
    at com.netflix.astyanax.connectionpool.impl.AbstractHostPartitionConnectionPool.executeWithFailover(AbstractHostPartitionConnectionPool.java:338)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.executeSchemaChangeOperation(ThriftClusterImpl.java:146)
    at com.netflix.astyanax.thrift.ThriftClusterImpl.internalCreateColumnFamily(ThriftClusterImpl.java:240)

in rexstitan.log. This sounds quite similar to the exceptions raised before.

Just to clarify: by "fail" I mean that Rexster starts and can be queried, but fails to load the Titan graph "graph".

Maybe I have to reduce the cluster to a minimum size to check whether this is related to cluster size.

edit #2:

It is not related to cluster size. And it's getting really annoying. Sometimes it is the BadRequestException above, sometimes it's a BadRequestException because there already is a keyspace "titan". Or it is an IllegalArgumentException:

2646 [main] WARN  com.tinkerpop.rexster.config.GraphConfigurationContainer -
  Database has already been initialized but not frozen
  java.lang.IllegalArgumentException: Database has already been initialized but not frozen
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:93)
    at com.thinkaurelius.titan.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1294)
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:93)
    at com.thinkaurelius.titan.core.TitanFactory.open(TitanFactory.java:73)
    at com.thinkaurelius.titan.tinkerpop.rexster.TitanGraphConfiguration.configureGraphInstance(TitanGraphConfiguration.java:33)
    at com.tinkerpop.rexster.config.GraphConfigurationContainer.getGraphFromConfiguration(GraphConfigurationContainer.java:124)
    at com.tinkerpop.rexster.config.GraphConfigurationContainer.<init>(GraphConfigurationContainer.java:54)
    at com.tinkerpop.rexster.server.XmlRexsterApplication.reconfigure(XmlRexsterApplication.java:99)
    at com.tinkerpop.rexster.server.XmlRexsterApplication.<init>(XmlRexsterApplication.java:47)
    at com.tinkerpop.rexster.Application.<init>(Application.java:97)
    at com.tinkerpop.rexster.Application.main(Application.java:189)

Is it not possible to start multiple nodes at once? Do they conflict? This is the only reason I can think of, because the exception varies from run to run and sometimes everything works fine.


Answer 1:


The problem is the simultaneous startup of the Titan nodes (version 0.5.0).
The more nodes you start up at once, the more likely the BadRequestExceptions become, since all the nodes try to create the same keyspace and column families in the Cassandra cluster concurrently.
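
The race can be illustrated with a small, deterministic simulation. This is only a sketch and not Titan or Cassandra code: `FakeSchemaStore`, `AlreadyExists`, and both simulation functions are hypothetical names, and the interleaving is hard-coded to show the case where two nodes both check for a column family before either has created it.

```python
class AlreadyExists(Exception):
    """Stands in for the 'Cannot add already existing column family' error."""


class FakeSchemaStore:
    """Hypothetical in-memory stand-in for the schema of one keyspace."""

    def __init__(self):
        self.column_families = set()

    def exists(self, name):
        return name in self.column_families

    def create(self, name):
        if name in self.column_families:
            raise AlreadyExists(name)
        self.column_families.add(name)


def simulate_simultaneous_start(store, name):
    """Both nodes check first, then both try to create: the second create fails."""
    a_sees = store.exists(name)  # node A: column family not there yet
    b_sees = store.exists(name)  # node B: not there yet either
    outcomes = []
    for node, seen in (("A", a_sees), ("B", b_sees)):
        if seen:
            outcomes.append((node, "skipped"))
            continue
        try:
            store.create(name)
            outcomes.append((node, "created"))
        except AlreadyExists:
            outcomes.append((node, "failed"))
    return outcomes


def simulate_staggered_start(store, name):
    """Node A finishes its check-and-create before node B even checks."""
    outcomes = []
    for node in ("A", "B"):
        if store.exists(name):
            outcomes.append((node, "skipped"))
        else:
            store.create(name)
            outcomes.append((node, "created"))
    return outcomes


print(simulate_simultaneous_start(FakeSchemaStore(), "system_properties"))
print(simulate_staggered_start(FakeSchemaStore(), "system_properties"))
```

In the first run node B fails even though its own existence check came back negative, which matches the intermittent nature of the errors in the question: whether a node fails depends only on how the checks and creates happen to interleave.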

To overcome this issue, you have to:

  1. start Cassandra (all nodes at once is fine)
  2. start a single Titan node
  3. open the Rexster console on this node, create the schema and indices
  4. start the remaining Titan nodes
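
If the cluster is started by a script, steps 2 to 4 above could be sequenced roughly as follows. This is a minimal sketch under stated assumptions: `staged_start` and its hooks `start_node`, `graph_loaded`, and `create_schema` are hypothetical, caller-supplied functions (for example an SSH command, a health check, and a schema script); none of them is part of Titan or Rexster. Cassandra is assumed to be running already (step 1).

```python
import time


def staged_start(nodes, start_node, graph_loaded, create_schema,
                 timeout_s=120.0, poll_s=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Start the first node alone, create the schema there, then start the rest.

    All three hooks are hypothetical and supplied by the caller:
      start_node(node)    -- launches Rexster/Titan on that node
      graph_loaded(node)  -- returns True once that node has loaded the graph
      create_schema(node) -- creates schema and indices through that node
    """
    if not nodes:
        return []
    first, rest = nodes[0], nodes[1:]

    # Step 2: a single Titan node, so nobody else races on keyspace creation.
    start_node(first)
    deadline = clock() + timeout_s
    while not graph_loaded(first):
        if clock() >= deadline:
            raise TimeoutError(f"{first} did not load the graph within {timeout_s}s")
        sleep(poll_s)

    # Step 3: schema and indices are created exactly once, from one node.
    create_schema(first)

    # Step 4: the remaining nodes find everything in place and only open it.
    for node in rest:
        start_node(node)
    return [first, *rest]
```

The `sleep` and `clock` parameters exist only so the waiting loop can be exercised without real delays; in normal use the defaults are fine.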


Source: https://stackoverflow.com/questions/28094341/titan-node-does-not-come-up
