Hadoop Error - All data nodes are aborting

Ramanan R

You seem to be hitting the open file handle limit of your user. This is a pretty common issue and can be cleared in most cases by increasing the ulimit values (the default is often 1024, which is easily exhausted by multi-output jobs like yours).

You can follow this short guide to increase it: http://blog.cloudera.com/blog/2009/03/configuration-parameters-what-can-you-just-ignore/ (see the section "File descriptor limits").

Answered by Harsh J - https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/kJRUkVxmfhw
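
A minimal sketch, assuming a Linux host and Python's standard resource module, of how to check the per-process descriptor limit and raise the soft limit up to the hard cap; a cluster-wide change still goes through /etc/security/limits.conf (or your distribution's equivalent) as the guide above describes:

```python
import resource

# Inspect the per-process open file descriptor limits (equivalent to `ulimit -n`).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# Raise the soft limit up to the hard limit for this process only.
# Assumption: the hard limit is already high enough; otherwise the
# system-wide limits must be raised first (e.g. in /etc/security/limits.conf).
if soft < hard:
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    print("soft limit raised to", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
```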

Devendra Parhate

Setting spark.shuffle.service.enabled to true resolved this issue for me.

spark.dynamicAllocation.enabled allows Spark to assign executors dynamically to different tasks. When spark.shuffle.service.enabled is set to false, the external shuffle service is disabled and shuffle data is stored only on the executors. When an executor is reassigned, that data is lost and the exception

java.io.IOException: All datanodes are bad.

is thrown when the data is requested.
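
A minimal sketch of where these settings go, assuming a PySpark job built on SparkSession (the application name is hypothetical). Note that the external shuffle service must also actually be running on the cluster nodes (e.g., as a YARN auxiliary service) for the flag to have any effect:

```python
from pyspark.sql import SparkSession

# Enable dynamic allocation together with the external shuffle service,
# so shuffle data survives when an executor is removed or reassigned.
spark = (
    SparkSession.builder
    .appName("shuffle-service-example")  # hypothetical app name
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.shuffle.service.enabled", "true")
    .getOrCreate()
)
```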
