How to share a variable in Mapper and Reducer class?

你说的曾经没有我的故事 submitted on 2019-12-10 00:20:18

Question


I need to share a variable between the mapper and reducer classes. The scenario is as follows:

Suppose my input records are of type A, B and C. I'm processing these records and generating the key and value for output.collect in the map function accordingly. At the same time, I've also declared three static int variables in the mapper class to keep count of the records of type A, B and C. These variables will be updated by the various map threads. When all the map tasks are done, I want to pass these three values to the reduce function.

How can this be achieved? I tried overriding the close() method, but it is called after each map task finishes, not once all the map tasks are done. Is there any other way to share variables? I want to output the total count of each type of record along with whatever processed output I'm displaying.


Answer 1:


Counters exist for a specific reason, i.e. to keep count of some specific state, for example "NUMBER_OF_RECORDS_DISCARDED". I believe one can only increment these counters and not set them to an arbitrary value (I may be wrong here). They can certainly be used as message passers, but there is a better way: use the job configuration to set a variable. However, this can only pass a custom message from the driver to the mapper or reducer; changes made in the mapper will not be available in the reducer.

Setting the message/variable using the old mapred API:

JobConf job = new JobConf(getConf()); // copy the Tool's Configuration rather than casting it
job.set("messageToBePassed-OR-anyValue", "123-awesome-value :P");

Setting the message/variable using the new mapreduce API:

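Configuration conf = new Configuration();
conf.set("messageToBePassed-OR-anyValue", "123-awesome-value :P");
// Set values before creating the Job: the Job works on its own copy of conf.
Job job = Job.getInstance(conf); // older releases: new Job(conf), deprecated in Hadoop 2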

Getting the message/variable using the old API in the Mapper and Reducer: configure() has to be implemented in the Mapper and Reducer classes, and the values can then be assigned to class members for use inside map() or reduce().

...
private String awesomeMessage;
public void configure(JobConf job) {
    // The value is a String, so read it as-is (Long.parseLong would fail on "123-awesome-value :P").
    awesomeMessage = job.get("messageToBePassed-OR-anyValue");
}
...

The variable awesomeMessage can then be used in the map() and reduce() functions.

Getting the message/variable using the new API in the Mapper and Reducer: the same thing is done in setup().

Configuration conf = context.getConfiguration();
String param = conf.get("messageToBePassed-OR-anyValue");
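The one-way flow described in this answer (driver to tasks, but never mapper to reducer) can be modelled without Hadoop. The sketch below is a hypothetical, dependency-free model, not Hadoop code: `ConfigPassingModel` and its methods are made-up names, and a `Map<String, String>` stands in for `Configuration`. Each task gets its own copy of the driver's settings, so a value the mapper overwrites never reaches the reducer.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, dependency-free model of one-way configuration passing.
// A Map<String, String> stands in for Hadoop's Configuration; this is not Hadoop code.
class ConfigPassingModel {
    static final String KEY = "messageToBePassed-OR-anyValue";

    // Each task receives its own copy of the driver's settings.
    static Map<String, String> copyForTask(Map<String, String> driverConf) {
        return new HashMap<>(driverConf);
    }

    // Stand-in for configure() (old API) or setup() (new API).
    static String readInSetup(Map<String, String> taskConf) {
        return taskConf.get(KEY);
    }

    // Simulates a mapper that overwrites the key in its own copy,
    // then returns the value the reducer reads from its own copy.
    static String valueSeenByReducer(Map<String, String> driverConf) {
        Map<String, String> mapperConf = copyForTask(driverConf);
        Map<String, String> reducerConf = copyForTask(driverConf);
        mapperConf.put(KEY, "changed-by-mapper"); // invisible to the reducer
        return readInSetup(reducerConf);
    }
}
```

Reading the value works in any task, but `valueSeenByReducer` still returns the driver's original value, which is why the configuration cannot carry the mapper's counts to the reducer.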



Answer 2:


Got the solution.

I used Counters, which are accessible through the Reporter class in both the Mapper and the Reducer.
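Applied to the original question, the counter approach replaces the three static ints with an enum of record types. In the old API each map() call would do `reporter.incrCounter(RecordType.A, 1)`, in the new API `context.getCounter(RecordType.A).increment(1)`, and the framework sums every task's values into job-wide totals, which the new API's driver can read once the job completes via `job.getCounters().findCounter(RecordType.A).getValue()`. The dependency-free sketch below only models that aggregation step; it is not Hadoop code, and `RecordType`, `CounterModel`, `runMapTask` and `aggregate` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical record types from the question.
enum RecordType { A, B, C }

// Dependency-free model of how per-task counters become job-wide totals.
class CounterModel {
    // One map task: counts only the records in its own split, just as a
    // static field would within that task's JVM.
    static Map<RecordType, Long> runMapTask(List<RecordType> split) {
        Map<RecordType, Long> local = new EnumMap<>(RecordType.class);
        for (RecordType t : split) {
            // In Hadoop: reporter.incrCounter(t, 1) or context.getCounter(t).increment(1).
            local.merge(t, 1L, Long::sum);
        }
        return local;
    }

    // The framework's role: sum every task's counters into job-wide totals.
    static Map<RecordType, Long> aggregate(List<Map<RecordType, Long>> perTask) {
        Map<RecordType, Long> totals = new EnumMap<>(RecordType.class);
        for (RecordType t : RecordType.values()) {
            totals.put(t, 0L);
        }
        for (Map<RecordType, Long> task : perTask) {
            task.forEach((t, n) -> totals.merge(t, n, Long::sum));
        }
        return totals;
    }
}
```

The model also shows why the static ints in the question fall short: a single task's map holds only the counts from its own split, and only the aggregated map reflects every record.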



Source: https://stackoverflow.com/questions/14196097/how-to-share-a-variable-in-mapper-and-reducer-class
