Multiple inputs into a Mapper in Hadoop

廉价感情. Submitted on 2019-12-11 11:52:52

Question


I am trying to send two files to a Hadoop reducer. I tried DistributedCache, but anything I add with addCacheFile in main doesn't seem to be returned by getLocalCacheFiles in the mapper.

Right now I am using FileSystem to read the file, but since I am running locally I can simply pass the file name. I'm wondering how to do this when running on a real Hadoop cluster.

Is there any way to send values to the mapper other than the file it is reading?


Answer 1:


I also had a lot of problems with the distributed cache and with sending parameters. The options that worked for me are below:

For distributed cache usage: getting the URL/path of a file on HDFS from inside Map or Reduce was a nightmare for me, but with a symlink it worked. In the run() method of the job:

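DistributedCache.addCacheFile(new URI(file + "#rules.dat"), conf); // the "#rules.dat" fragment names the symlink
DistributedCache.createSymlink(conf); // create that symlink in each task's working directory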

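Then, in the Map or Reduce class, declare a field before the methods:

private FSDataInputStream rules; // FileSystem.open() returns an FSDataInputStream, not a FileSystem

and open it in the setup() method of the Map or Reduce class. The symlink lives in the task's local working directory, so open it through the local file system rather than the default (HDFS) one:

rules = FileSystem.getLocal(context.getConfiguration()).open(new Path("rules.dat"));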
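Because the symlink is just a file in the task's local working directory, setup() can also read it with plain java.nio instead of a Hadoop stream. A minimal sketch, assuming (hypothetically) that rules.dat holds one key<TAB>value rule per line; the class name RulesLoader and the file format are illustrative and not part of the original answer:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper for a mapper's setup(): loads the symlinked cache file
 * into memory. Assumes each line is "key<TAB>value"; blank lines and lines
 * starting with '#' are skipped.
 */
final class RulesLoader {

    /** Reads the file, e.g. the "rules.dat" symlink in the task's working directory. */
    static Map<String, String> load(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    /** Parses already-read lines; kept separate so it can be tested without a file. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> rules = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            int tab = trimmed.indexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("Malformed rule: " + line);
            }
            rules.put(trimmed.substring(0, tab), trimmed.substring(tab + 1));
        }
        return rules;
    }
}
```

In setup() this would be called as RulesLoader.load(Paths.get("rules.dat")), with the resulting map kept in a field for use in map(). Loading the whole file into memory only makes sense for small side files.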

For parameters: you can send values to Map or Reduce through the job Configuration (a value could be, for example, a file name to open from HDFS):

public int run(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ...
    conf.set("level", otherArgs[2]); // set "level" from the command line (it could also be a file name); do this before creating the Job, which copies conf
    ...
}

Then, in the Map or Reduce class, read it back:

int level = Integer.parseInt(context.getConfiguration().get("level")); // e.g. in setup(Context context); this is an int, but you can also read strings, etc.



Answer 2:


If the distributed cache suits your needs, it is the way to go.

getLocalCacheFiles works differently in local mode and in distributed mode (it actually does not work in local mode).

See http://developer.yahoo.com/hadoop/tutorial/module5.html and look for the phrase "As a cautionary note:".
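One way to cope with this in code (a workaround of my own, not taken from the tutorial) is to check whether the symlink actually exists in the task's working directory and otherwise fall back to the original file path, for example one passed through the job Configuration as in Answer 1. Under the local runner that path refers to the local file system, so it can be opened directly. A minimal sketch; CacheFileResolver and its parameters are hypothetical names:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical local-mode workaround: prefer the distributed-cache symlink in
 * the task's working directory, and fall back to the original (local) path
 * when the symlink was not created.
 */
final class CacheFileResolver {

    static Path resolve(Path workingDir, String symlinkName, String fallbackPath) {
        Path link = workingDir.resolve(symlinkName);
        if (Files.isReadable(link)) {
            return link; // distributed mode: the symlink exists
        }
        if (fallbackPath == null) {
            throw new IllegalStateException(symlinkName + " not found and no fallback path given");
        }
        return Paths.get(fallbackPath); // local mode: read the original file directly
    }
}
```

In a mapper, resolve(Paths.get(""), "rules.dat", context.getConfiguration().get("rules.path")) would return the symlink when it was created and the original local file otherwise; "rules.path" is an assumed configuration key that the driver would have to set.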



Source: https://stackoverflow.com/questions/9148724/multiple-input-into-a-mapper-in-hadoop
