ZooKeeper exists failed after 3 retries

Question

I am running Hadoop-1.2.1 and HBase-0.94.11 in pseudo-distributed mode.

After a power failure, my Hadoop and HBase setup went down. When I restarted the machine and the pseudo-distributed setup, HBase stopped working, and the HBase shell showed the following errors:

13/11/27 13:53:27 ERROR zookeeper.RecoverableZooKeeper: ZooKeeper exists failed after 3 retries
13/11/27 13:53:27 WARN zookeeper.ZKUtil: hconnection Unable to set watcher on znode (/hbase/hbaseid)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1041)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:172)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:450)
    at org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
    at org.apache.hadoop.hbase.zookeeper.ClusterId.getId(ClusterId.java:50)
    at org.apache.hadoop.hbase.zookeeper.ClusterId.hasId(ClusterId.java:44)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.ensureZookeeperTrackers(HConnectionManager.java:720)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getMaster(HConnectionManager.java:789)
    at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:129)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)

The following processes are running:

hduser@user-ubuntu:~$ jps
16914 NameNode
19955 Jps
29460 Main
17728 TaskTracker
19776 HMaster
17490 JobTracker
17392 SecondaryNameNode

Answer 1:

Are you sure your ZooKeeper process is running? Your jps listing doesn't show an entry for QuorumPeerMain. jps may not list every running Java process, so also try ps axww | grep QuorumPeerMain.

If ZooKeeper refuses to start, check its logs for stack traces that point to the cause.
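The process check suggested above could be wrapped in a small shell function. This is only a sketch: the function name has_zookeeper is made up for illustration, and it assumes the ZooKeeper JVM shows up in the process listing as either QuorumPeerMain (a standalone ZooKeeper) or HQuorumPeer (the one HBase starts itself).

```shell
# Succeeds if the process listing on stdin contains a ZooKeeper JVM.
# QuorumPeerMain is a standalone ZooKeeper server; HQuorumPeer is the
# ZooKeeper that HBase starts itself. The [Q]/[H] brackets keep this grep
# from matching its own command line in the ps output.
has_zookeeper() {
    grep -Eq '[Q]uorumPeerMain|[H]QuorumPeer'
}

# Typical use (not run here):
#   if ps axww | has_zookeeper; then
#       echo "ZooKeeper is running"
#   else
#       echo "ZooKeeper is NOT running"
#   fi
```

Reading the listing from stdin means the same function works with either ps axww or jps output.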

Answer 2:

The problem is straightforward: the ZooKeeper quorum process is not running. If it were, the jps listing would contain one more Java process, HQuorumPeer:

hduser@user-ubuntu:~$ jps
16914 NameNode
19955 Jps
29460 Main
17728 TaskTracker
19776 HMaster
17490 JobTracker
17392 SecondaryNameNode

xxxxx HQuorumPeer

ZooKeeper is required for an HBase cluster because it coordinates the cluster's processes.

Possible solution: by default, HBase manages ZooKeeper itself, that is, it starts and stops the ZooKeeper quorum (the cluster of ZooKeeper nodes). To verify the setting, look in the file conf/hbase-env.sh (in your HBase directory); it should contain the line:

export HBASE_MANAGES_ZK=true

This line tells HBase whether it should manage its own instance of ZooKeeper. If it is set to false, change it to true.
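To check the setting without opening an editor, something like the following sketch might help. The function name manages_zk is hypothetical, and the script assumes the setting appears as a plain export line; commented-out lines are ignored, and "unset" means HBase falls back to its default.

```shell
# Print the effective HBASE_MANAGES_ZK value from an hbase-env.sh file.
# $1 = path to hbase-env.sh
# Commented-out lines are ignored; if no active export is found, print
# "unset" (HBase then falls back to its default behaviour).
manages_zk() {
    value=$(sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}HBASE_MANAGES_ZK=\([^[:space:]#]*\).*$/\1/p' "$1" | tail -n 1)
    echo "${value:-unset}"
}

# Typical use from the HBase directory (not run here):
#   manages_zk conf/hbase-env.sh
```

If the file contains several active export lines, the last one is reported, since that is the one the shell would end up using.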

Also verify the HBase configuration in conf/hbase-site.xml.

The minimum configuration that should work for pseudo-distributed mode is:

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/home/<yourusername>/zookeeper</value>
  </property>
</configuration>
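To confirm what a given property is actually set to, a rough lookup like the one below may be enough. It is a sketch, not a real XML parser: the function name get_hbase_prop is made up, and it assumes each <value> appears on the same line as its <name> or on a later line, as in the layout above.

```shell
# Print the value of one property from an hbase-site.xml style file.
# $1 = path to the XML file, $2 = property name
# Line-based lookup: finds the <name> line, then prints the next <value>.
get_hbase_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1 }
        found && /<value>/ {
            sub(/^.*<value>/, "")      # drop everything up to the value
            sub(/<\/value>.*$/, "")    # drop the closing tag and the rest
            print
            exit
        }
    ' "$1"
}

# Typical use from the HBase directory (not run here):
#   get_hbase_prop conf/hbase-site.xml hbase.rootdir
```

The function prints nothing when the property is missing, which makes a missing hbase.rootdir or dataDir easy to spot.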

Now stop HBase if it is running:

$ ./bin/stop-hbase.sh

Make the necessary changes and start it again:

$ ./bin/start-hbase.sh
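After the restart, the processes can take a few seconds to appear, so it may help to poll rather than check once. The following is a minimal sketch of such a retry loop; the helper name wait_for and the attempt and delay values in the usage comment are illustrative assumptions, not part of HBase.

```shell
# Retry a command until it succeeds.
# $1 = maximum attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while :; do
        "$@" && return 0                      # command succeeded
        [ "$i" -ge "$attempts" ] && return 1  # out of attempts
        i=$((i + 1))
        sleep "$delay"
    done
}

# Typical use after ./bin/start-hbase.sh (not run here):
#   wait_for 10 2 sh -c 'jps | grep -q HQuorumPeer' && echo "ZooKeeper is up"
```

If the loop gives up, the ZooKeeper and HBase logs are the next place to look.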


Source: https://stackoverflow.com/questions/20239072/zookeeper-exists-failed-after-3-retries

Tags

Hadoop

hbase

apache-zookeeper