Error 1045 on SUM function in Pig Latin with an int

Submitted by 纵饮孤独 on 2019-12-11 19:18:56

Question


The following Pig Latin script:

data = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, size:int);

splitDate = foreach data generate  size as size:int , ip as ip,  FLATTEN(STRSPLIT(date, ':')) as h;

groupedIp = group splitDate by h.$1;

a = foreach groupedIp{
    added = foreach splitDate generate SUM(size); --
    generate added;
};


describe a;

gives me the error:

ERROR 1045: <file 3.pig, line 10, column 39> Could not infer the matching function for org.apache.pig.builtin.SUM as multiple or none of them fit. Please use an explicit cast.

This error makes me think I need to cast size to an int, but if I describe groupedIp, I get the following schema:

groupedIp: {group: bytearray,splitDate: {(size: int,ip: chararray,h: bytearray)}}

This indicates that size is already an int and should be usable by the SUM function.

Am I calling the SUM function incorrectly? Let me know if you would like to see anything else, such as the input file.


Answer 1:


SUM takes a bag as input, but inside the nested foreach you pass it the scalar field size.
Eliminate the nested foreach and use:

a = foreach groupedIp generate SUM(splitDate.size);
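Put together with the load statement from the question, the fix might look like the sketch below. It assumes the date field splits into exactly four parts on ':' (for example a value such as [01/Jul/1995:00:00:01); the names d, hh, mm, ss, and total_size are illustrative and do not come from the original script. The second statement shows that a nested foreach can be kept as long as SUM is handed a projection of the bag rather than a scalar.

```pig
-- Load the space-delimited access log with the schema from the question.
data = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, size:int);

-- Assumption: date splits into four parts on ':'; the field names are illustrative.
splitDate = foreach data generate size, ip, FLATTEN(STRSPLIT(date, ':')) as (d:chararray, hh:chararray, mm:chararray, ss:chararray);

-- Group by the hour part instead of the positional reference h.$1.
groupedIp = group splitDate by hh;

-- splitDate.size projects the size column out of each group's bag, so SUM receives a bag.
a = foreach groupedIp generate group as hh, SUM(splitDate.size) as total_size;
describe a;

-- Variant that keeps the nested block: project the bag first, then aggregate it.
b = foreach groupedIp {
    sizes = splitDate.size;
    generate group as hh, SUM(sizes) as total_size;
};
describe b;
```

Naming the split parts in the as clause also gives the grouping key a chararray type instead of bytearray.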



Answer 2:


Dump some of your data. I'll bet some of the values in the size column are non-integer, and Pig runs into them and dies. You could also write your own isInteger UDF to check this before the rest of your processing, and throw out any rows whose size is not an integer.
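One way to run that check without writing a UDF is sketched below. It assumes size is loaded as chararray rather than int so the raw text can be inspected; the relation names rawData, badRows, badSample, goodRows, and cleaned are hypothetical.

```pig
-- Assumption: size is declared as chararray here so that non-numeric text stays visible.
rawData = load 'access_log_Jul95' using PigStorage(' ') as (ip:chararray, dash1:chararray, dash2:chararray, date:chararray, date1:chararray, getRequset:chararray, location:chararray, http:chararray, code:int, size:chararray);

-- Rows whose size is missing or is not a plain run of digits.
-- The matches operator must match the whole string, not just a substring.
badRows = filter rawData by (size is null) or not (size matches '[0-9]+');
badSample = limit badRows 20;
dump badSample;

-- Keep only the clean rows and cast size to int explicitly.
goodRows = filter rawData by size matches '[0-9]+';
cleaned = foreach goodRows generate ip, date, (int)size as size;
describe cleaned;
```

The limit before dump keeps the inspection output small on a large log file.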




Answer 3:


SUM, AVG, and COUNT always operate on a bag, so group the data first and then apply the function to a column projected from the grouped bag of the original relation, as below:

A = load 'nyse_data.txt' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
G = group A by symbol;
C = foreach G generate group, SUM(A.open);
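The same pattern extends to the other aggregates named in this answer. The sketch below reuses the file name and schema from the answer, assuming the fourth column is named open; the output aliases total_volume, avg_close, and n_rows are illustrative.

```pig
-- Load and group as in the answer (fourth column assumed to be named open).
A = load 'nyse_data.txt' as (exchange:chararray, symbol:chararray, date:chararray, open:float, high:float, low:float, close:float, volume:int, adj_close:float);
G = group A by symbol;

-- Each function receives a bag: A.volume and A.close are single-column bags,
-- while COUNT is given the whole bag A for the group.
D = foreach G generate group as symbol, SUM(A.volume) as total_volume, AVG(A.close) as avg_close, COUNT(A) as n_rows;
describe D;
```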


Source: https://stackoverflow.com/questions/16405267/error-1045-on-sum-function-in-pig-latin-with-an-int
