How to resolve 'file could only be replicated to 0 nodes, instead of 1' in hadoop?


Question


I have a simple Hadoop job that crawls websites and caches them to HDFS. The mapper checks whether a URL already exists in HDFS and, if so, uses it; otherwise it downloads the page and saves it to HDFS.

If a network error (404, etc.) is encountered while downloading the page, the URL is skipped entirely and not written to HDFS. Whenever I run a small list of ~1000 websites, I always seem to encounter this error, which repeatedly crashes the job on my pseudo-distributed installation. What could be the problem?

I'm running Hadoop 0.20.2-cdh3u3.
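
The caching logic in the mapper is roughly like this simplified sketch (not my exact code; class, helper, and path names are changed):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.URL;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    // Helper used by the mapper: reuse a cached page if present,
    // otherwise download it and write it to HDFS.
    public class PageCache {
        public static Path cachePage(FileSystem fs, String url, Path cacheDir) throws Exception {
            // Stand-in for the hash used to name cache files
            Path cached = new Path(cacheDir, Integer.toHexString(url.hashCode()));
            if (fs.exists(cached)) {
                return cached;                           // already cached, nothing to download
            }
            InputStream in = new URL(url).openStream();  // throws on network errors; the URL is then skipped
            OutputStream out = fs.create(cached);
            try {
                IOUtils.copyBytes(in, out, 4096, false); // copy the page into HDFS
            } finally {
                IOUtils.closeStream(in);
                IOUtils.closeStream(out);
            }
            return cached;
        }
    }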

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /user/raj/cache/9b4edc6adab6f81d5bbb84fdabb82ac0 could only be replicated to 0 nodes, instead of 1
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1520)
    at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:665)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1434)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1430)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1428)

Answer 1:


The problem was an unclosed FileSystem InputStream instance in the mapper, which was used to cache input to the file system.
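
In other words, every stream opened against the FileSystem in the mapper needs to be closed once the cached page has been read. A rough sketch of what the fixed read path might look like (names here are illustrative, not the original mapper code):

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class CachedPageReader {
        // Read a previously cached page from HDFS, closing the stream even if the copy fails.
        public static byte[] readCached(FileSystem fs, Path cached) throws IOException {
            FSDataInputStream in = fs.open(cached);
            try {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                IOUtils.copyBytes(in, buf, 4096, false);
                return buf.toByteArray();
            } finally {
                in.close();   // closing here avoids the stream leak described above
            }
        }
    }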




Answer 2:


Judging by the sources, you have probably run out of space on your local machine (or VM). This exception is thrown when the system cannot find enough nodes for replication. The class responsible for selecting the nodes is ReplicationTargetChooser:

http://javasourcecode.org/html/open-source/hadoop/hadoop-0.20.203.0/org/apache/hadoop/hdfs/server/namenode/ReplicationTargetChooser.java.html

Its main method is chooseTarget (line 67).
Diving further into the code, you reach the isGoodTarget method, which also checks whether there is enough space on the node (line 404).
If you enable debug logging, you will probably see the relevant message.
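
To check whether space is really the issue, look at the capacity the datanode reports and enable debug logging for the namenode package that contains ReplicationTargetChooser. Something like this, assuming a standard CDH3 layout (logger name and paths may differ on your install):

    # Shows configured capacity, DFS used and DFS remaining for each datanode
    hadoop dfsadmin -report

    # In conf/log4j.properties on the namenode, enable DEBUG for the namenode package,
    # then restart the namenode to see the block-placement messages:
    log4j.logger.org.apache.hadoop.hdfs.server.namenode=DEBUG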




Answer 3:


Please check the namenode logs, matching the timestamps. If there is any indication of problems with IPC, you are likely running out of "xcievers". In my case, setting dfs.datanode.max.xcievers in hdfs-site.xml to a larger value, e.g. 4096 or 8192, fixed that particular problem for me.
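
For reference, the setting looks something like this in hdfs-site.xml on each datanode (using the 4096 value mentioned above; the datanodes need to be restarted to pick it up):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>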



Source: https://stackoverflow.com/questions/9987033/how-to-resolve-file-could-only-be-replicated-to-0-nodes-instead-of-1-in-hadoo
