Passing a filename to Java UDF from Pig using distributed cache

Submitted by 喜欢而已 on 2019-12-10 09:59:12

Question


I am using a small map file in my Java UDF function and I want to pass the filename of this file from Pig through the constructor.

The following is the relevant part of my UDF:

public GenerateXML() throws IOException {
    this(null);
}

public GenerateXML(String mapFilename) throws IOException {
    if (mapFilename != null) {
        // do processing
    }
}

In the Pig script I have the following line:

DEFINE GenerateXML com.domain.GenerateXML('typemap.tsv');

This works in local mode, but not in distributed mode. I am passing the following parameters to Pig on the command line:

pig -Dmapred.cache.files="/path/to/typemap.tsv#typemap.tsv" -Dmapred.create.symlink=yes -f generate-xml.pig

I am getting the following exception:

2013-01-11 10:39:42,002 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<file generate-xml.pig, line 16, column 42> Failed to generate logical plan. Nested exception: java.lang.RuntimeException: could not instantiate 'com.domain.GenerateXML' with arguments '[typemap.tsv]'

Any idea what I need to change to make it work?


Answer 1:


The problem is solved now.

It seems that when I run the Pig script with the following parameters

pig -Dmapred.cache.files="/path/to/typemap.tsv#typemap.tsv" -Dmapred.create.symlink=yes -f generate-xml.pig

the /path/to/typemap.tsv part should be a local path, not a path in HDFS.




Answer 2:


You can override the getCacheFiles method in your Pig UDF, and that is enough: you don't need any additional properties such as mapred.cache.files. Your case can be implemented like this:

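import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Dictionary and createSomething are placeholders for your own type and logic.
public class UdfCacheExample extends EvalFunc<Tuple> {

    private Dictionary dictionary;
    private final String pathToDictionary;

    public UdfCacheExample(String pathToDictionary) {
        // Only the path is stored here; the file is read lazily in getDictionary().
        this.pathToDictionary = pathToDictionary;
    }

    @Override
    public Tuple exec(Tuple input) throws IOException {
        return createSomething(input, getDictionary());
    }

    @Override
    public List<String> getCacheFiles() {
        // Pig ships each listed file through the distributed cache.
        // The Pig documentation uses entries of the form "hdfs_path#symlink_name".
        return Arrays.asList(pathToDictionary);
    }

    private Dictionary getDictionary() {
        if (dictionary == null) {
            // lazy initialization here
        }
        return dictionary;
    }
}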
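
The lazy initialization left as a comment in the UDF could look roughly like the sketch below. It is plain Java with no Pig dependency so it can be tried on its own. The class name LazyTypeMap, the lookup method, and the file format (one key<TAB>value pair per line) are assumptions made for illustration; the question does not say what typemap.tsv contains.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: remembers a filename and reads the file on first lookup. */
final class LazyTypeMap {

    private final String mapFilename;     // stored by the constructor, never opened there
    private Map<String, String> entries;  // stays null until the first lookup

    LazyTypeMap(String mapFilename) {
        this.mapFilename = mapFilename;
    }

    /** Returns the value mapped to key, or null if the key is absent. */
    String lookup(String key) throws IOException {
        if (entries == null) {
            entries = load(Paths.get(mapFilename));
        }
        return entries.get(key);
    }

    // Assumed format: key<TAB>value per line; lines without a tab are skipped.
    private static Map<String, String> load(Path path) throws IOException {
        Map<String, String> result = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    continue;
                }
                result.put(line.substring(0, tab), line.substring(tab + 1));
            }
        }
        return result;
    }
}
```

Because the constructor only stores the name, creating the object succeeds even where the file is not present. The file has to exist only where lookup is first called, which in a UDF would be inside exec.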


Source: https://stackoverflow.com/questions/14276749/passing-a-filename-to-java-udf-from-pig-using-distributed-cache
