Question
Could anyone please tell me how to find the difference between two times in Pig?
For example, below are some sample Start_Time and End_Time values. I need to find the difference between each Start_Time and End_Time in Pig.
12:31:38,14:54:04
10:18:34,13:30:56
13:37:43,15:18:57
08:15:10,11:28:17
Thanks in advance.
Answer 1:
I couldn't find a straightforward way. Here is a workaround:
t = LOAD 'input/data' USING PigStorage(',') as (time1:chararray,time2:chararray);
u = FOREACH t GENERATE SecondsBetween(ToDate(time2,'HH:mm:ss'),ToDate(time1,'HH:mm:ss')) as seconds;
v = FOREACH u GENERATE seconds/3600 as hours,(seconds%3600)/60 as minutes,(seconds%3600)%60 as seconds;
STORE v into 'output/data' USING PigStorage(':');
Output for your sample data with this code (hours:minutes:seconds, not zero-padded):
2:22:26
3:12:22
1:41:14
3:13:7
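The arithmetic in the script above can be cross-checked outside Pig. Here is a minimal Python sketch, not part of the original answer. The function names seconds_between and split_hms are made up for illustration. It mirrors the SecondsBetween step and the division/modulo step, and joins the fields without zero-padding, the same way the PigStorage(':') output does.

```python
from datetime import datetime

def seconds_between(end, start, fmt="%H:%M:%S"):
    """Whole seconds from start to end, like SecondsBetween(end, start) in the Pig script."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())

def split_hms(seconds):
    """Split a second count into (hours, minutes, seconds) with the same division/modulo steps."""
    return seconds // 3600, (seconds % 3600) // 60, seconds % 60

# Sample rows from the question: (Start_Time, End_Time)
rows = [
    ("12:31:38", "14:54:04"),
    ("10:18:34", "13:30:56"),
    ("13:37:43", "15:18:57"),
    ("08:15:10", "11:28:17"),
]

for start, end in rows:
    h, m, s = split_hms(seconds_between(end, start))
    print(f"{h}:{m}:{s}")  # fields are not zero-padded, matching the Pig output
```

This assumes both times fall on the same day and that End_Time is later than Start_Time, as in the sample data.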
Answer 2:
Use a UDF to convert to a UNIX timestamp; there is one for it in piggybank:
DEFINE ISOToUnix org.apache.pig.piggybank.evaluation.datetime.convert.ISOToUnix();
Then, something like:
a = FOREACH Dates GENERATE ISOToUnix(date2) - ISOToUnix(date1) AS diff ;
It might require a bit of formatting/typing but it should work.
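To illustrate the timestamp-subtraction idea independently of Pig, here is a small Python sketch (not from the original answer). The helper to_unix_millis and the constant DUMMY_DATE are hypothetical names. Since the sample data holds times only, the sketch attaches a fixed date before converting, which is probably the kind of formatting the answer alludes to. It works in milliseconds on the assumption that ISOToUnix returns milliseconds; check that against your piggybank version.

```python
from datetime import datetime, timezone

# Assumption: any fixed date works as long as both times fall on the same day.
DUMMY_DATE = "1970-01-01"

def to_unix_millis(time_str):
    """Convert 'HH:MM:SS' to a Unix timestamp in milliseconds on DUMMY_DATE (UTC)."""
    dt = datetime.strptime(f"{DUMMY_DATE}T{time_str}", "%Y-%m-%dT%H:%M:%S")
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

# Subtract the two timestamps, as in the ISOToUnix(date2) - ISOToUnix(date1) expression.
diff_ms = to_unix_millis("14:54:04") - to_unix_millis("12:31:38")
print(diff_ms // 1000)  # difference in whole seconds
```

Dividing the millisecond difference by 1000 gives seconds, which can then be split into hours, minutes, and seconds as in Answer 1.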
Source: https://stackoverflow.com/questions/24448004/finding-the-difference-between-start-times-and-end-times-in-pig