How can I recover from the following error, which started happening after a server crash? ZooKeeper won't start, and the following message appears repeatedly in the log.
The solution for me was to find the zero-length log file in /hadoop/zookeeper/version-2 (or wherever your dataDir points) and delete it. Start ZooKeeper afterwards.
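If you would rather not eyeball the directory, here is a minimal Python sketch that lists the zero-length log files. The function name find_empty_txn_logs is made up for illustration, and the path is just the one mentioned above, so substitute your own dataDir. It only reports files and deletes nothing.

```python
from pathlib import Path


def find_empty_txn_logs(data_dir):
    """Return the zero-length log.* files directly inside data_dir, sorted by name."""
    return sorted(
        p for p in Path(data_dir).glob("log.*")
        if p.is_file() and p.stat().st_size == 0
    )


if __name__ == "__main__":
    # Assumed location, taken from the answer above; replace with your dataDir/version-2.
    for path in find_empty_txn_logs("/hadoop/zookeeper/version-2"):
        # Only prints candidates; review them and remove by hand while ZooKeeper is stopped.
        print(path)
```

Copying the version-2 directory somewhere safe before removing anything is a cheap precaution.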
It looks like you have encountered a known Apache ZooKeeper bug. There are a few different Apache JIRA issues related to this: ZOOKEEPER-1621 and ZOOKEEPER-2332. See the comments in those issues if you're interested in root cause analysis and some potential proposed fixes.
Unfortunately, there is no Apache ZooKeeper release that contains a fix for the bug at this time. There are a few potential workarounds that you could try:
The solution for me was to find the last log file (which had a length of 0 bytes) and delete it. You will find it inside the version-2 directory:
ls -l -r --sort=time
-rw-r--r-- 1 chris chris 67108880 Jan 24 10:37 log.23c6a70
-rw-r--r-- 1 chris chris 0 Jan 24 10:37 log.23d3fb4
I first tried deleting the snapshot and the last two log files, which also works, but then you end up with a state that is "a bit" older.
-rw-r--r-- 1 chris chris 3685904 Jan 24 00:56 snapshot.23c6a6e
To be safe, you may have to delete the last snapshot file and the last log file together with the zero-length log file.
By the way, the log file and the snapshot carry the same kind of hex pattern in their names, and these have to match:
log.23c6a70
snapshot.23c6a6e
Once they match and are consistent with each other, the problem should be fixed.
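To compare the two names without reading hex by eye, here is a small Python sketch. It assumes the part after the dot is a hexadecimal transaction id, which is how I understand ZooKeeper names these files; the helper zxid_from_name is hypothetical and not part of any ZooKeeper tooling.

```python
def zxid_from_name(file_name):
    """Parse the hexadecimal suffix of a log.* or snapshot.* file name into an int."""
    prefix, _, suffix = file_name.partition(".")
    if prefix not in ("log", "snapshot") or not suffix:
        raise ValueError(f"not a log/snapshot file name: {file_name!r}")
    return int(suffix, 16)


if __name__ == "__main__":
    # File names copied from the listing above.
    log_id = zxid_from_name("log.23c6a70")
    snapshot_id = zxid_from_name("snapshot.23c6a6e")
    # A small gap suggests the snapshot and the log belong to the same period.
    print(log_id - snapshot_id)
```

For the two names above the gap is 2, so the suffixes are close neighbours rather than identical.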