How to tell MapReduce how many mappers to use at the same time?

Submitted by 五迷三道 on 2019-12-25 06:25:59

Question


I am writing an indexing app for MapReduce. I was able to split the input with NLineInputFormat, and now I have a few hundred mappers in my app. However, only 2 per machine are active at the same time; the rest are "PENDING". I believe this behavior slows the app down significantly.

How do I make Hadoop run at least 100 of those at the same time per machine?

I am using the old Hadoop API syntax. Here's what I've tried so far:

    conf.setNumMapTasks(1000);
    conf.setNumTasksToExecutePerJvm(500);

Neither of those seems to have any effect.

Any ideas how I can make the mappers actually RUN in parallel?


Answer 1:


JobConf.setNumMapTasks() is only a hint to the MapReduce framework, and I am not sure what effect calling it has. In your case, the total number of map tasks across the whole job should equal the total number of lines in the input divided by the number of lines per split configured for NLineInputFormat. You can find more details on the total number of map/reduce tasks across the whole job here.
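As a rough illustration of that division, here is a minimal sketch that estimates the map task count. It assumes a single input file and rounds up, since a final partial group of lines still needs its own split. The class and method names are hypothetical and are not part of the Hadoop API.

```java
// Hypothetical helper, not part of Hadoop: estimates how many map tasks
// NLineInputFormat would produce for one input file.
final class MapTaskEstimate {

    // Ceiling of totalLines / linesPerSplit.
    static long numMapTasks(long totalLines, int linesPerSplit) {
        if (totalLines < 0) {
            throw new IllegalArgumentException("totalLines must be >= 0");
        }
        if (linesPerSplit <= 0) {
            throw new IllegalArgumentException("linesPerSplit must be > 0");
        }
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        // Example figures chosen for illustration only.
        System.out.println(numMapTasks(100_000, 250)); // prints 400
    }
}
```

This number is fixed by the input and the split size, which is consistent with setNumMapTasks() having no visible effect here.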

The description for mapred.tasktracker.map.tasks.maximum says:

The maximum number of map tasks that will be run simultaneously by a task tracker.

To change the number of map tasks that the task tracker runs in parallel on a particular node, you need to configure mapred.tasktracker.map.tasks.maximum (the default is 2). I could not find the documentation for 0.20.2, so I am not sure whether the parameter exists, or whether it has the same name, in that release.
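To see why the per-node limit matters, the following sketch estimates how many "waves" of map tasks a job needs for a given number of nodes and map slots per node. It is simple arithmetic, not Hadoop code: the class name, method names, and example figures are all assumptions made for illustration, and it ignores scheduling details such as uneven task durations.

```java
// Hypothetical helper, not part of Hadoop: relates the per-node map slot
// limit to the number of map tasks that can run at once.
final class MapSlotEstimate {

    // Upper bound on map tasks running at the same time across the cluster.
    static long concurrentMaps(int nodes, int slotsPerNode) {
        if (nodes <= 0 || slotsPerNode <= 0) {
            throw new IllegalArgumentException("nodes and slotsPerNode must be > 0");
        }
        return (long) nodes * slotsPerNode;
    }

    // Number of scheduling rounds needed to run every map task (rounded up).
    static long waves(long totalMapTasks, int nodes, int slotsPerNode) {
        long concurrent = concurrentMaps(nodes, slotsPerNode);
        return (totalMapTasks + concurrent - 1) / concurrent;
    }

    public static void main(String[] args) {
        // 400 map tasks on 4 nodes with the default of 2 slots per node.
        System.out.println(waves(400, 4, 2)); // prints 50
        // The same job with 8 slots per node.
        System.out.println(waves(400, 4, 8)); // prints 13
    }
}
```

In Hadoop 1.x-era clusters this property is normally set in mapred-site.xml on each node and read when the TaskTracker starts, so setting it on the job's JobConf is unlikely to change anything. A value such as 100 per machine is only sensible if the node has enough cores and memory to run that many task JVMs.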



Source: https://stackoverflow.com/questions/7471289/how-to-tell-mapreduce-how-many-mappers-to-use-at-the-same-time
