Passing parameters to map function in Hadoop

Submitted by 流过昼夜 on 2019-12-18 18:47:41

Question


I am new to Hadoop. I want to access a command-line argument, read in the main function of my Java program, from inside the map function of the mapper class. Please suggest ways to do this.


Answer 1:


Hadoop 0.20 introduced a new MapReduce API. There is not much functional difference between the new API (o.a.h.mapreduce package) and the old one (o.a.h.mapred), except that with the new API data can be pulled within the mappers and the reducers. What Arnon mentioned applies to the old API.

Check this article for passing the parameters using the new and old API.




Answer 2:


You can pass parameters by attaching them to the Configuration:

 JobConf job = new JobConf(new Configuration(), TheJob.class);
 job.setLong("Param Name", longValue);

The Configuration class has several set methods (setLong, setInt, setStrings, etc.), so you can pass parameters of several types. In the map task you can get the configuration from the Context (getConfiguration).
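For the new API (o.a.h.mapreduce), reading the value back through the Context might look roughly like the sketch below. It assumes Hadoop 2.x or later (for Job.getInstance) and a map-only job over text input. The class names MinLengthFilter and FilterMapper, the key "minlengthfilter.min.length", and the argument layout are all made up for illustration; none of them come from the question.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MinLengthFilter {

    // Hypothetical parameter name; any unique string works as the key.
    static final String MIN_LENGTH_KEY = "minlengthfilter.min.length";

    public static class FilterMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private int minLength;

        // setup() runs once per map task, before the first call to map().
        @Override
        protected void setup(Context context) {
            Configuration conf = context.getConfiguration();
            // The second argument is the default used when the key is not set.
            minLength = conf.getInt(MIN_LENGTH_KEY, 0);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit only the lines that are at least minLength characters long.
            String line = value.toString();
            if (line.length() >= minLength) {
                context.write(value, new IntWritable(line.length()));
            }
        }
    }

    // Copies the command-line argument into a fresh Configuration.
    static Configuration buildConf(String minLengthArg) {
        Configuration conf = new Configuration();
        conf.setInt(MIN_LENGTH_KEY, Integer.parseInt(minLengthArg));
        return conf;
    }

    // Assumed usage: MinLengthFilter <input path> <output path> <min length>
    public static void main(String[] args) throws Exception {
        // Set parameters before creating the Job: the Job takes its own
        // copy of the Configuration when it is constructed.
        Job job = Job.getInstance(buildConf(args[2]), "min length filter");
        job.setJarByClass(MinLengthFilter.class);
        job.setMapperClass(FilterMapper.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The value is read in setup() rather than in map() so that it is parsed once per task instead of once per record.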




Answer 3:


In recent Hadoop versions (e.g. >=0.2 up to 2.4+) you would set these kinds of options during job configuration:

JobConf conf = new JobConf(MyJarClass.class);
conf.set("myStringOption", "myStringValue");
conf.setInt("myIntOption", 42);

And retrieve those options in the configure() method of the mapper/reducer classes:

public static class MyMapper extends MapReduceBase implements Mapper<...> {

    Integer myIntegerOption;
    String myStringOption;

    @Override
    public void configure(JobConf job) {
        super.configure(job);
        myIntegerOption = job.getInt("myIntOption", -1); 
        // nb: last arg is the default value if option is not set
        myStringOption = job.get("myStringOption", "notSet");
    }

    @Override
    public void map(... key, ... value, 
                    OutputCollector<...> output, Reporter reporter) throws IOException {
        // here you can use the options in your processing
        processRecord(key, value, myIntegerOption, myStringOption);
    }

}

Note that configure() is called once per task, before any records are passed to map() or reduce().



Source: https://stackoverflow.com/questions/8457183/passing-parameters-to-map-function-in-hadoop
