Hadoop DistributedCache is deprecated - what is the preferred API?

情深已故 2020-11-28 04:14

My map tasks need some configuration data, which I would like to distribute via the Distributed Cache.

The Hadoop MapReduce Tutorial shows the usage of the Distribut

6 answers
  •  执念已碎
    2020-11-28 04:55

    I had the same problem. Not only is DistributedCache deprecated, but so are getLocalCacheFiles and "new Job". What worked for me is the following:

    Driver:

    Configuration conf = getConf();
    Job job = Job.getInstance(conf);
    ...
    job.addCacheFile(new Path(filename).toUri());
    

    In Mapper/Reducer setup:

    @Override
    protected void setup(Context context) throws IOException, InterruptedException
    {
        super.setup(context);
    
        URI[] files = context.getCacheFiles(); // null if no cache files were added
    
        Path file1path = new Path(files[0]);
        ...
    }
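
Once the URIs are in hand, the file still has to be opened and parsed. Below is a minimal JDK-only sketch of that step, not a definitive implementation. The class and method names (CacheFileConfig, localName, load) are hypothetical, and it rests on two assumptions: the cached file holds simple key=value lines, and the framework makes the file available in the task's working directory under the URI fragment (the part after '#') or, if there is none, under the file's own name. It also guards against the null that getCacheFiles can return.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: turns the URI[] from
// context.getCacheFiles() into a map of settings.
public class CacheFileConfig {

    // Assumed local name of a cache entry: the URI fragment if present,
    // otherwise the last segment of the path. Expects hierarchical URIs
    // such as those produced by new Path(filename).toUri().
    public static String localName(URI uri) {
        String fragment = uri.getFragment();
        if (fragment != null && !fragment.isEmpty()) {
            return fragment;
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Reads key=value lines from the first cache file, resolved against baseDir.
    // Returns an empty map when cacheFiles is null or empty.
    public static Map<String, String> load(URI[] cacheFiles, Path baseDir) throws IOException {
        if (cacheFiles == null || cacheFiles.length == 0) {
            return Collections.emptyMap();
        }
        Map<String, String> settings = new LinkedHashMap<>();
        Path local = baseDir.resolve(localName(cacheFiles[0]));
        try (BufferedReader reader = Files.newBufferedReader(local, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                int eq = line.indexOf('=');
                // Skip blank lines, '#' comments, and lines without '='.
                if (line.isEmpty() || line.startsWith("#") || eq < 0) {
                    continue;
                }
                settings.put(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            }
        }
        return settings;
    }
}
```

In setup() this could be called as CacheFileConfig.load(context.getCacheFiles(), Paths.get(".")), with the result kept in a field for map() to use. If the working-directory assumption does not hold on your cluster, pass a different baseDir.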
    
