What is the usage of Configured class in Hadoop programs?

Posted by 时间秒杀一切 on 2019-12-10 02:26:40

Question


Most Hadoop MapReduce programs look like this:

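import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyApp extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Job.getInstance replaces the deprecated new Job(Configuration) constructor
        Job job = Job.getInstance(getConf());
        /* process command line options */
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MyApp(), args);
        System.exit(exitCode);
    }
}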

What is the purpose of Configured, given that Tool and Configured both have getConf() and setConf()? What does it provide to our application?


Answer 1:


Configured is a class that implements the Configurable interface. It is a base class that provides implementations of getConf() and setConf().

Simply extending this base class lets your class be configured with a Configuration object (or an instance of one of its subclasses, such as JobConf).
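To make this concrete, here is a simplified, self-contained re-creation of the Configurable/Configured pair. It does not use Hadoop: Configuration is a hypothetical map-backed stand-in, and the class bodies sketch the idea rather than reproduce Hadoop's source. It shows that a subclass of Configured gets working getConf() and setConf() without writing either method.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfiguredSketch {

    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration (a simple key/value store).
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void set(String key, String value) { props.put(key, value); }
        String get(String key) { return props.get(key); }
    }

    // Mirrors the two methods declared by Hadoop's Configurable interface.
    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    // Simplified Configured: keeps the Configuration in a private field.
    static class Configured implements Configurable {
        private Configuration conf;

        Configured() { this(null); }
        Configured(Configuration conf) { setConf(conf); }

        @Override public void setConf(Configuration conf) { this.conf = conf; }
        @Override public Configuration getConf() { return conf; }
    }

    // MyApp declares no getConf()/setConf() of its own; it inherits both.
    static class MyApp extends Configured {
        String describe() {
            return "reduces=" + getConf().get("mapreduce.job.reduces");
        }
    }

    static String demo() {
        Configuration conf = new Configuration();
        conf.set("mapreduce.job.reduces", "2");
        MyApp app = new MyApp();
        app.setConf(conf);            // inherited from Configured
        return app.describe();
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints "reduces=2"
    }
}
```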

When your code executes the following line:

ToolRunner.run(new MyApp(), args);

ToolRunner internally calls this overload:

ToolRunner.run(tool.getConf(), tool, args);
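The delegation can be sketched as follows. This is a simplified model of ToolRunner, not Hadoop's source: Configuration is again a hypothetical map-backed stand-in, and the generic-option parsing the real class performs is left out. What it shows is the order of calls: the tool's current Configuration (supplied by Configured) is read, a new one is created if it is null, it is handed back through setConf(), and only then is run() invoked.

```java
import java.util.HashMap;
import java.util.Map;

public class ToolRunnerSketch {

    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
    static class Configuration {
        final Map<String, String> props = new HashMap<>();
    }

    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    interface Tool extends Configurable {
        int run(String[] args) throws Exception;
    }

    // Simplified Configured: a private field plus accessor and mutator.
    static class Configured implements Configurable {
        private Configuration conf;
        @Override public void setConf(Configuration conf) { this.conf = conf; }
        @Override public Configuration getConf() { return conf; }
    }

    // Simplified model of ToolRunner's two run(...) overloads (not Hadoop's source).
    static final class ToolRunner {
        static int run(Tool tool, String[] args) throws Exception {
            // Pass the tool's current Configuration along, as in the call quoted above.
            return run(tool.getConf(), tool, args);
        }

        static int run(Configuration conf, Tool tool, String[] args) throws Exception {
            if (conf == null) {
                conf = new Configuration();  // the tool had no Configuration yet
            }
            // The real ToolRunner parses generic options into conf here and
            // passes only the remaining arguments to run(); omitted in this sketch.
            tool.setConf(conf);              // implemented by Configured
            return tool.run(args);
        }
    }

    static class MyApp extends Configured implements Tool {
        @Override
        public int run(String[] args) {
            // getConf() is non-null here because ToolRunner called setConf() first.
            return getConf() != null ? 0 : 1;
        }
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyApp(), args));
    }
}
```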

In the call ToolRunner.run(tool.getConf(), tool, args), tool is the MyApp instance. MyApp implements Tool, which, as you said, declares getConf(), but only as an interface method with no body. The implementation comes from the Configured base class. If MyApp did not extend Configured, you would have to implement getConf() and setConf() yourself.
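For comparison, here is roughly what MyApp has to contain if it implements Tool directly instead of extending Configured. The Configuration, Configurable and Tool types are hypothetical stand-ins so the sketch compiles without Hadoop on the classpath; the point is the field and the two methods that Configured would otherwise supply.

```java
import java.util.HashMap;
import java.util.Map;

public class ManualConfSketch {

    // Hypothetical stand-in for org.apache.hadoop.conf.Configuration.
    static class Configuration {
        final Map<String, String> props = new HashMap<>();
    }

    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    interface Tool extends Configurable {
        int run(String[] args) throws Exception;
    }

    // No "extends Configured": the bookkeeping is written by hand.
    static class MyApp implements Tool {
        private Configuration conf;           // what Configured would store for us

        @Override
        public void setConf(Configuration conf) { this.conf = conf; }

        @Override
        public Configuration getConf() { return conf; }

        @Override
        public int run(String[] args) {
            return conf != null ? 0 : 1;
        }
    }

    public static void main(String[] args) {
        MyApp app = new MyApp();
        app.setConf(new Configuration());
        System.out.println(app.run(args));    // prints 0
    }
}
```

The hand-written version behaves the same; extending Configured simply saves you from repeating this boilerplate in every Tool.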




Answer 2:


Configured is a default implementation of the Configurable interface: its setConf() method stores the passed Configuration object in a private instance variable, and getConf() returns that reference.

Tool is an extension of the Configurable interface that adds a run(..) method. It is used with ToolRunner, which parses command-line options (using GenericOptionsParser) and builds a Configuration object that is then passed to the tool's setConf(..) method.
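As a rough illustration of that parsing step, the sketch below handles only the -D key=value generic option (also accepting the joined -Dkey=value spelling), copies each pair into a map standing in for Configuration, and returns the remaining arguments that would go to run(). It is a hypothetical simplification: the real GenericOptionsParser also handles options such as -conf, -fs, -files and -libjars, which are not modeled here. Parsing stops at the first argument that is not a -D option, following the convention that generic options come before the application's own arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GenericOptionsSketch {

    // Result of parsing: the properties found and the arguments left for Tool.run().
    static final class Parsed {
        final Map<String, String> conf = new HashMap<>();   // stand-in for Configuration
        final List<String> remaining = new ArrayList<>();
    }

    // Hypothetical, simplified parser: only "-D key=value" and "-Dkey=value" are handled.
    static Parsed parse(String[] args) {
        Parsed p = new Parsed();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            String pair;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[i + 1];
                i += 2;
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);
                i += 1;
            } else {
                break;  // first non-generic argument: stop parsing
            }
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            p.conf.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        // Everything after the generic options is passed through unchanged.
        p.remaining.addAll(Arrays.asList(args).subList(i, args.length));
        return p;
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {"-D", "mapreduce.job.reduces=2", "in", "out"});
        System.out.println(p.conf + " " + p.remaining);
        // prints {mapreduce.job.reduces=2} [in, out]
    }
}
```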

Your main class will typically extend Configured so that the Configurable methods required by Tool are implemented for you.

In general, you should use the ToolRunner utility class to launch your MapReduce jobs, as it handles the common tasks of parsing command-line arguments and building the Configuration object. I'd look at the ToolRunner API docs for more info.



Source: https://stackoverflow.com/questions/14134865/what-is-the-usage-of-configured-class-in-hadoop-programs
