Subtract one row's value from another row in Pig

Submitted by 末鹿安然 on 2019-12-13 02:31:38

Question


I'm trying to develop a sample program using Pig to analyze some log files. I want to measure the running time of different jobs. When I read in a job's log file, I get the job's start time and end time, like this:

(Wed,03/20/13,01:03:37,EDT)
(Wed,03/20/13,01:05:00,EDT)

Now, to calculate the elapsed time, I need to subtract these two timestamps, but since both timestamps are in the same bag, I'm not sure how to compare them. I'm looking for an idea on how to do this. Thanks!


Answer 1:


Is there a unique ID for the job that appears in both log lines? Also, is there something to indicate which event is the start and which is the end?

If so, you could read the dataset twice, once for start events and once for end events, and join the two together on the job ID. Then you'll have one record with both events in it.

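For example, assuming each log line carries an id, a type ('start' or 'end'), and a numeric timestamp:

A = FOREACH logline GENERATE id, type, timestamp;

starts = FILTER A BY type == 'start';

ends = FILTER A BY type == 'end';

joined = JOIN starts BY id, ends BY id;

-- After the join, fields are disambiguated with '::'.
-- Subtract start from end to get a positive duration (or whatever calculation you need).
diff = FOREACH joined GENERATE starts::id AS id, (ends::timestamp - starts::timestamp) AS elapsed;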
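
The subtraction itself is left open above, and the timestamps in the question are text fields rather than numbers. As a rough illustration of the same pairing logic outside Pig, here is a minimal Python sketch. It assumes the four fields are (weekday, MM/DD/YY, HH:MM:SS, zone), that both events of a job share the same zone (so the zone is ignored), and that each record carries a hypothetical job_id and event_type like the id and type fields in the Pig script. The function names are made up for this example.

```python
from datetime import datetime


def parse_ts(fields):
    """Build a naive datetime from a (weekday, date, time, zone) tuple.

    Assumption: the date is MM/DD/YY and the zone can be ignored because
    both events of a job are logged in the same zone.
    """
    _weekday, date_part, time_part, _zone = fields
    return datetime.strptime(date_part + " " + time_part, "%m/%d/%y %H:%M:%S")


def elapsed_seconds(events):
    """Pair start and end events by job id and return {job_id: seconds}.

    `events` is an iterable of (job_id, event_type, timestamp_fields);
    job_id and event_type are hypothetical fields, not taken from the log.
    """
    starts, ends = {}, {}
    for job_id, event_type, fields in events:
        if event_type == "start":
            starts[job_id] = parse_ts(fields)
        elif event_type == "end":
            ends[job_id] = parse_ts(fields)
    # Inner join on job_id: only jobs with both events are reported.
    return {
        job_id: int((ends[job_id] - starts[job_id]).total_seconds())
        for job_id in starts
        if job_id in ends
    }


events = [
    ("job1", "start", ("Wed", "03/20/13", "01:03:37", "EDT")),
    ("job1", "end", ("Wed", "03/20/13", "01:05:00", "EDT")),
]
print(elapsed_seconds(events))  # {'job1': 83}
```

With the two timestamps from the question, the job ran for 83 seconds. In Pig the same idea applies: convert each timestamp to a numeric or datetime value before subtracting.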


Source: https://stackoverflow.com/questions/15574159/subtract-one-rows-value-from-another-row-in-pig
