Question
Quick q on Pig UDFs.
I have a custom UDF that I want to accept multiple columns:
package pigfuncs;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pig.EvalFunc;
import org.apache.pig.FuncSpec;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.DataType;
import org.apache.pig.data.Tuple;
import org.apache.pig.impl.logicalLayer.FrontendException;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class DataToXML extends EvalFunc<String> {

    public DataToXML() {
    }

    @Override
    public List<FuncSpec> getArgToFuncMapping()
            throws FrontendException {
        List<FuncSpec> funcList = new ArrayList<FuncSpec>();
        funcList.add(new FuncSpec(this.getClass().getName(),
                new Schema(new Schema.FieldSchema(null, DataType.CHARARRAY))));
        return funcList;
    }

    @Override
    public String exec(Tuple t) throws IOException {
        if (t == null || t.size() == 0)
            return "";
        StringBuilder result = new StringBuilder();
        result.append("<Num>");
        result.append((String) t.get(0));
        result.append("</Num>");
        result.append("<Tags>");
        result.append((String) t.get(1));
        result.append("</Tags>");
        return result.toString();
    }
}
I want to pass 2 columns: Number and Data. I want the output to be <Num>XYZ</Num><Tags>abc</Tags>
I can't work out how to get the Pig script to call this; every combination results in a different error!
An excerpt from my script:
-- apply some sort of UDF that returns the exact line without the stop words
nostop = FOREACH cleansed GENERATE lotnum,pigfuncs.StopWords(description) as data;
-- put into xml
out = FOREACH nostop GENERATE pigfuncs.DataToXML(lotnum, data);
The error from this is:
Could not infer the matching function for rapp.pigfuncs.DataToXML as multiple or none of them fit. Please use an explicit cast.
Hope this is an easy one for the Pig gurus :)
Duncan
Answer 1:
Your getArgToFuncMapping() implementation indicates you are only expecting one argument. (The single FuncSpec you add to funcList has a schema with only one chararray field, while the script passes two columns.) If you're not going to be providing multiple implementations for this UDF depending on the types of the arguments, there's no real need to implement getArgToFuncMapping(). Just skip it and this error will go away.
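Once getArgToFuncMapping() is gone, both columns should arrive in exec(Tuple) as fields 0 and 1, and Pig no longer checks their types against a declared schema for you. Here is a rough sketch of what the body of exec could look like in that case. It is written as a plain static helper so it compiles without Pig on the classpath; the class name DataToXmlSketch and the method toXml are made up for illustration and are not part of the original UDF. It assumes that calling toString() on whatever Pig hands you is good enough, and it does not escape XML special characters.

```java
// Hypothetical stand-alone sketch; not part of the original UDF.
final class DataToXmlSketch {

    // Builds the same <Num>/<Tags> string as DataToXML.exec(), but uses
    // String.valueOf instead of a (String) cast, so a non-String field
    // (for example an Integer lot number) does not throw ClassCastException.
    // Null fields are rendered as empty elements.
    static String toXml(Object num, Object tags) {
        StringBuilder result = new StringBuilder();
        result.append("<Num>")
              .append(num == null ? "" : String.valueOf(num))
              .append("</Num>");
        result.append("<Tags>")
              .append(tags == null ? "" : String.valueOf(tags))
              .append("</Tags>");
        return result.toString();
    }

    public static void main(String[] args) {
        // Inside the UDF this would be: return toXml(t.get(0), t.get(1));
        System.out.println(toXml("XYZ", "abc"));
    }
}
```

In the real UDF you would keep exec(Tuple t) as the entry point, keep the existing null and size checks (ideally checking for at least two fields), and pass t.get(0) and t.get(1) to the helper.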
Source: https://stackoverflow.com/questions/18186421/pig-udf-that-accept-multiple-inputs