Debugging Hadoop in Eclipse

Submitted by 白昼怎懂夜的黑 on 2019-12-24 03:24:21

Question


Is it possible to debug Hadoop's source code in Eclipse? I'm not asking about the MapReduce tasks themselves. I want to see which part of the Hadoop source code is responsible for scheduling the MapReduce tasks and how it works. Is there any mechanism for doing this?


Answer 1:


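You can download the Hadoop source, import it into Eclipse as a project, and step through it with the debugger. Eclipse offers several stepping commands (default key bindings):

  1. F5 : Step Into, which enters the method called on the current line
  2. F6 : Step Over, which executes the current line without entering the methods it calls
  3. F7 : Step Return, which runs to the end of the current method and returns to its caller
  4. F8 : Resume, which continues execution until the next breakpoint or the end of the program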

Or you can try to understand the workflow yourself by following it step by step, starting from the run() method called from your main.

To answer your question: who schedules the map tasks?

As you can see in this schema, files are divided by the InputFormat class into fixed-size pieces called InputSplits. Each split is then given to a mapper, which is a node that was assigned a map task.

The same InputFormat class also provides a RecordReader, which is responsible for parsing the split and extracting records. Each record is passed to the map function as a (key, value) pair. So the Mapper class is the one that calls the map method.
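The flow described above can be sketched as a small toy model, which may help when deciding where to put breakpoints. This is only an illustration under simplifying assumptions: `SplitFlowSketch`, `getSplits`, `runMapper` and `map` here are hypothetical stand-ins written for this example, not the real Hadoop classes, and the key is a plain line index rather than whatever key the real input format produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these methods mimic the roles described above and are
// NOT the real Hadoop implementation.
class SplitFlowSketch {

    // Stand-in for the InputFormat role: cut the input into splits
    // of at most linesPerSplit lines each.
    static List<List<String>> getSplits(List<String> lines, int linesPerSplit) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            splits.add(lines.subList(i, Math.min(i + linesPerSplit, lines.size())));
        }
        return splits;
    }

    // Stand-in for a word-count style map(): emits "word=1" for every token.
    static void map(long key, String value, List<String> out) {
        for (String word : value.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(word + "=1");
            }
        }
    }

    // Stand-in for the mapper's loop: iterate over the records of one split
    // (the RecordReader role) and call map() once per (key, value) pair.
    static List<String> runMapper(List<String> split, long firstKey) {
        List<String> out = new ArrayList<>();
        long key = firstKey;
        for (String record : split) {
            map(key++, record, out);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello hadoop", "hello eclipse", "debug hadoop");
        long key = 0;
        for (List<String> split : getSplits(lines, 2)) {
            System.out.println(runMapper(split, key));
            key += split.size();
        }
    }
}
```

In the real source, the analogous places to set breakpoints would be the input format's split computation and the mapper's run loop; stepping into them with F5 shows the same hand-off from split to record to map call.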

Here is the workflow of the wordcount example:

Here FileInputFormat is an abstract class that extends the abstract class InputFormat, and TextInputFormat extends FileInputFormat.
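The inheritance chain can be mirrored in a few lines of plain Java. This is a sketch, not Hadoop code: every class prefixed with `Toy` is hypothetical, and the "one split per non-empty file" rule is a simplification chosen for the example. It only shows the general shape, where the file-based base class supplies the splitting and the text subclass supplies the line-by-line reader.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical mirror of InputFormat: declares both responsibilities.
abstract class ToyInputFormat {
    abstract List<String> getSplits(List<String> files);
    abstract Iterator<String> createRecordReader(String split);
}

// Hypothetical mirror of FileInputFormat: implements splitting, stays abstract.
abstract class ToyFileInputFormat extends ToyInputFormat {
    @Override
    List<String> getSplits(List<String> files) {
        // Simplification for this sketch: one split per non-empty file.
        List<String> splits = new ArrayList<>();
        for (String file : files) {
            if (!file.isEmpty()) {
                splits.add(file);
            }
        }
        return splits;
    }
}

// Hypothetical mirror of TextInputFormat: supplies a line-oriented reader.
class ToyTextInputFormat extends ToyFileInputFormat {
    @Override
    Iterator<String> createRecordReader(String split) {
        return Arrays.asList(split.split("\n")).iterator();
    }
}

class HierarchyDemo {
    // Works against the abstract base type, so any subclass can be plugged in.
    static List<String> readAll(ToyInputFormat format, List<String> files) {
        List<String> records = new ArrayList<>();
        for (String split : format.getSplits(files)) {
            Iterator<String> reader = format.createRecordReader(split);
            while (reader.hasNext()) {
                records.add(reader.next());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a b\nc d", "", "e f");
        System.out.println(readAll(new ToyTextInputFormat(), files));
    }
}
```

When debugging, this layering is why stepping into a call on the base type can land in either the file-level class or the text-level class, depending on which method is invoked.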




Answer 2:


Here are instructions from the Apache Hadoop documentation. I haven't tried them out, but they are good enough to get started.



Source: https://stackoverflow.com/questions/23235343/debugging-hadoop-in-eclipse
