Data Replication error in Hadoop

问题

I am implementing the Hadoop Single Node Cluster on my machine by following Michael Noll's tutorial and have come across data replication error:

Here's the full error message:

> hadoop@laptop:~/hadoop$ bin/hadoop dfs -copyFromLocal
> tmp/testfiles testfiles
> 
> 12/05/04 16:18:41 WARN hdfs.DFSClient: DataStreamer Exception:
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1   at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)     at
> java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)   at
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)  at
> $Proxy0.addBlock(Unknown Source)    at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy0.addBlock(Unknown Source)     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
> 
> 12/05/04 16:18:41 WARN hdfs.DFSClient: Error Recovery for block null
> bad datanode[0] nodes == null 12/05/04 16:18:41 WARN hdfs.DFSClient:
> Could not get block locations. Source file
> "/user/hadoop/testfiles/testfiles/file1.txt" - Aborting...
> copyFromLocal: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1 12/05/04 16:18:41 ERROR hdfs.DFSClient:
> Exception closing file /user/hadoop/testfiles/testfiles/file1.txt :
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1   at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)     at
> java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /user/hadoop/testfiles/testfiles/file1.txt could only be replicated to
> 0 nodes, instead of 1   at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1271)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:422)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)     at
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)     at
> java.security.AccessController.doPrivileged(Native Method)  at
> javax.security.auth.Subject.doAs(Subject.java:396)  at
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
> 
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)   at
> org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)  at
> $Proxy0.addBlock(Unknown Source)    at
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy0.addBlock(Unknown Source)     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2937)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2819)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)

Also when I execute:

bin/stop-all.sh

It says that datanode has not been started and thus cannot be stopped. Though, the output of jps says the datanode being present.

I tried formatting the namenode, changing owner permissions, but it does not seem to work. Hope I didn't miss any other relevant information.

Thanks in advance.

回答1:

The solution that worked for me was to run namenode and datanode one by one and not together using bin/start-all.sh. What happens using this approach is that the error is clearly visible if you are having some problem setting the datanodes on the network and also many posts on stackoverflow suggest that namenode requires some time to start-off, therefore, it should be given some time to start before starting the datanodes. Also, in this case I was having problem with different ids of namenode and datanodes for which I had to change the ids of the datanode with same id as the namenode.

The step by step procedure will be:

Start the namenode bin/hadoop namenode. Check for errors, if any.
Start the datanodes bin/hadoop datanode. Check for errors, if any.
Now start the task-tracker, job tracker using 'bin/start-mapred.sh'

回答2:

Look at your namenode (probably http://localhost:50070) and see how many datanodes it says you have.

If it is 0, then either your datanode isn't running or it isn't configured to connect to the namenode.

If it is 1, check to see how much free space it says there is in the DFS. It may be that the data node doesn't have anywhere it can write data to (data dir doesn't exist, or doesn't have write permissions).

回答3:

Although solved, I'm adding this for future readers. Cody's advice of inspecting the start of namenode and datanode was useful, and further investigation led me to delete the hadoop-store/dfs directory. Doing this solved this error for me.

回答4:

I had the same problem, I took a look at the datanode logs and there was a warning saying that the dfs.data.dir had incorrect permissions... so I just changed them and everything worked, which is kind of weird.

Specifically, my "dfs.data.dir" was set to "/home/hadoop/hd_tmp", and the error I got was:

...
...
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /home/hadoop/hd_tmp/dfs/data, expected: rwxr-xr-x, while actual: rwxrwxr-x
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.
...
...

So I simply executed these commands:

I stopped all the demons with "bin/stop-all.sh"
Change the permissions of the directory with "chmod -R 755 /home/hadoop/hd_tmp"
I gave format again to the namenode with "bin/hadoop namenode -format".
I re-started the demons "bin/start-all.sh"
And voilà, the datanode was up and running! (I checked it with the command "jsp", where a process named DataNode was shown).

And then everything worked fine.

回答5:

In my case, I wrongly set one destination for dfs.name.dir and dfs.data.dir. The correct format is

 <property>
 <name>dfs.name.dir</name>
 <value>/path/to/name</value>
 </property>

 <property>
 <name>dfs.data.dir</name>
 <value>/path/to/data</value>
 </property>

回答6:

I removed the extra properties in the hdfs-site.xml and then this issue was gone. Hadoop needs to improve on their error messages. I tried each of the above solutions and none worked.

回答7:

I encountered the same problem. When I looked at localhost:50070, under the cluster summary, all properties were shown as 0 except "DFS Used% 100". Usually, this situation occur because there are some mistakes in the three *-site.xml files under HADOOP_INSTALL/conf and hosts file.

In my case, the cause is unable to resolve the hostname. I solved the problem simply by adding "IP_Address hostname" to /etc/hosts.

回答8:

In my case I had to delete:

/tmp/hadoop-<user-name> folder and format and start using sbin/start-dfs.sh

sbin/start-yarn.sh

来源：https://stackoverflow.com/questions/10447743/data-replication-error-in-hadoop

标签

Hadoop

replication