DistributedCache in Hadoop 2.x

Submitted by 我的梦境 on 2020-01-17 04:43:05

Question


I have a problem with the DistributedCache in the new Hadoop 2.x API. I found some people working around this issue, but it does not solve my problem.

Their solution does not work for me, because I get a NullPointerException when trying to retrieve the data from the DistributedCache.

My Configuration is as follows:

Driver

    public int run(String[] arg) throws Exception {
        Configuration conf = this.getConf();
        Job job = new Job(conf, "job Name");
        ...
        job.addCacheFile(new URI(arg[1]));

Setup

    protected void setup(Context context)
            throws IOException, InterruptedException {
        Configuration conf = context.getConfiguration();
        URI[] cacheFiles = context.getCacheFiles();
        // NullPointerException is thrown on the next line
        BufferedReader dtardr = new BufferedReader(new FileReader(cacheFiles[0].toString()));

Here, when it starts creating the buffered reader, it throws a NullPointerException. This happens because context.getCacheFiles() always returns null. How can I solve this problem, and where are the cache files stored (HDFS, or the local file system)?


Answer 1:


If you use the local JobRunner in Hadoop (non-distributed mode, as a single Java process), then no local data directory is created; the getLocalCacheFiles() or getCacheFiles() call returns an empty set of results. Make sure that you are running your job in distributed or pseudo-distributed mode.

The Hadoop framework will copy the files set in the distributed cache to the local working directory of each task in the job. A copy of every cached file is placed on the local file system of each worker machine (in a subdirectory of mapred.local.dir).
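As an illustration of how a localized copy is named, the sketch below (plain Java, no Hadoop dependencies; the URI values are hypothetical) maps a cache URI to the name under which the file appears in the task's working directory: the URI fragment if one was supplied (Hadoop treats `#name` as a symlink name), otherwise the file's base name.

```java
import java.net.URI;

public class CacheName {
    // Return the name under which the localized copy of a cached file
    // appears in the task's working directory: the URI fragment if one
    // was given (e.g. hdfs://nn/data/stop.txt#stopwords), otherwise
    // the last path component of the URI.
    static String localName(URI uri) {
        if (uri.getFragment() != null) {
            return uri.getFragment();
        }
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(localName(new URI("hdfs://nn:8020/cache/stop.txt")));            // stop.txt
        System.out.println(localName(new URI("hdfs://nn:8020/cache/stop.txt#stopwords")));  // stopwords
    }
}
```

So in setup() the file can usually be opened by this short relative name rather than by the full cache URI.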

You can refer to this link to understand more about DistributedCache.



Source: https://stackoverflow.com/questions/20497968/distributedcache-in-hadoop-2-x
