How to debug Hadoop MapReduce jobs from Eclipse?

Asked 2020-12-09 10:16

I'm running Hadoop in a single-machine, local-only setup, and I'm looking for a nice, painless way to debug mappers and reducers in Eclipse. Eclipse has no problem running the jobs themselves.

5 Answers
  • 2020-12-09 11:05

    Make changes in the bin/hadoop script (hadoop-env.sh). Check which command has been fired; if, and only if, the command is jar, add the remote debug configuration:

    if [ "$COMMAND" = "jar" ] ; then
      exec "$JAVA" -Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=8999 $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    else
      exec "$JAVA" $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"
    fi
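
    With server=y and no explicit suspend flag, the JVM waits on port 8999 before running (JDWP's suspend option defaults to y), so attach Eclipse via Run > Debug Configurations > Remote Java Application, pointed at localhost:8999, after launching the job.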
    
  • 2020-12-09 11:05

    I also like to debug via unit tests with MRUnit. I use it in combination with ApprovalTests, which creates an easy visualization of the map/reduce process and makes it easy to pass in failing scenarios. It also runs seamlessly from Eclipse.

    For example:

    HadoopApprovals.verifyMapReduce(new WordCountMapper(),
                                    new WordCountReducer(), 0, "cat cat dog");
    

    Will produce the output:

    [cat cat dog] 
    -> maps via WordCountMapper to ->
    (cat, 1) 
    (cat, 1) 
    (dog, 1)
    
    -> reduces via WordCountReducer to ->
    (cat, 2) 
    (dog, 1)
    

    There's a video on the process here: http://t.co/leExFVrf
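
    For the plain-MRUnit side, here is a minimal sketch of driver-based tests that run straight from Eclipse's JUnit runner. It assumes the WordCountMapper/WordCountReducer from the example above have the standard word-count signatures (LongWritable/Text in, Text/IntWritable out); those class names are assumptions, not part of MRUnit:

    import java.util.Arrays;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mrunit.mapreduce.MapDriver;
    import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
    import org.junit.Test;

    public class WordCountTest {

      // Drives the mapper alone: one input record in, expected pairs out.
      @Test
      public void mapperEmitsOnePerWord() throws Exception {
        MapDriver.newMapDriver(new WordCountMapper())
            .withInput(new LongWritable(0), new Text("cat cat dog"))
            .withOutput(new Text("cat"), new IntWritable(1))
            .withOutput(new Text("cat"), new IntWritable(1))
            .withOutput(new Text("dog"), new IntWritable(1))
            .runTest();
      }

      // Drives the reducer alone: a key with its grouped values.
      @Test
      public void reducerSumsCounts() throws Exception {
        ReduceDriver.newReduceDriver(new WordCountReducer())
            .withInput(new Text("cat"),
                Arrays.asList(new IntWritable(1), new IntWritable(1)))
            .withOutput(new Text("cat"), new IntWritable(2))
            .runTest();
      }
    }

    Because the tests are ordinary JUnit, setting a breakpoint inside the mapper or reducer and running "Debug As > JUnit Test" is all the debugging setup needed.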

  • 2020-12-09 11:12

    The only way you can debug Hadoop in Eclipse is by running Hadoop in local mode. The reason is that each map/reduce task runs in its own JVM, so when you don't run Hadoop in local mode, Eclipse won't be able to attach to those tasks.

    When you set Hadoop to local mode, the file system changes to file:/// instead of the HDFS API (the default). Running hadoop fs -ls is then not an HDFS command but effectively hadoop fs -ls file:///, i.e. a listing of your local directory. Neither the JobTracker nor the NameNode runs.

    These blogposts might help:

    • http://let-them-c.blogspot.com/2011/07/running-hadoop-locally-on-eclipse.html
    • http://let-them-c.blogspot.com/2011/07/configurations-of-running-hadoop.html
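
    For illustration, a minimal sketch of forcing local mode from code, using the classic Hadoop 1.x property names (fs.default.name, mapred.job.tracker); the helper class itself is hypothetical:

    import org.apache.hadoop.conf.Configuration;

    public class LocalModeConf {
      // Sketch: a Configuration forced into local, in-process mode so that
      // Eclipse breakpoints in mappers and reducers are actually hit.
      public static Configuration create() {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "file:///");  // local file system, not HDFS
        conf.set("mapred.job.tracker", "local");  // run tasks inside this JVM
        return conf;
      }
    }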
  • 2020-12-09 11:15

    Args can be added to Hadoop's internal java command via the HADOOP_OPTS environment variable:

    export HADOOP_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,address=5005,suspend=y"
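
    Since suspend=y is set, the JVM waits on port 5005 until a debugger attaches (Eclipse: Remote Java Application, localhost:5005). Note that HADOOP_OPTS affects the client JVM, so this hits mapper/reducer breakpoints only when the job runs with the local job runner.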
    
  • 2020-12-09 11:16

    Besides the recommended MRUnit, I like to debug with Eclipse as well. I have a main program that instantiates a Configuration and executes the MapReduce job directly, and I debug it with a standard Eclipse debug configuration. Since I include the Hadoop jars in my mvn spec, I have all of Hadoop on my classpath and have no need to run against my installed Hadoop. I always test with small data sets in local directories to make things easy. The default configuration behaves as a standalone Hadoop (the local file system is used); a sketch of such a driver follows.
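
    A minimal sketch of that kind of driver; WordCountMapper/WordCountReducer and the input/output paths are illustrative assumptions, not part of the answer:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {
      public static void main(String[] args) throws Exception {
        // A default Configuration uses the local job runner and the local
        // file system, so mapper/reducer breakpoints are hit in this JVM.
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "wordcount-debug");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);    // assumed mapper class
        job.setReducerClass(WordCountReducer.class);  // assumed reducer class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("input"));     // small local data set
        FileOutputFormat.setOutputPath(job, new Path("output"));  // must not exist yet
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

    Run this class with "Debug As > Java Application" in Eclipse; no remote-debug configuration is needed because everything executes in the one debugged JVM.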
