I have a file which contains entries like this:
1,1,07 2012,07 2013,11,blablabla
The first two fields are ids. The third is the begin date (month year) and the fourth is the end date. The fifth field is the number of months between these two dates, and the last field contains text.
Here is my Pig code to load this data:
f = LOAD 'file.txt' USING PigStorage(',') AS (id1:int, id2:int, date1:chararray, date2:chararray, duration:int, text:chararray);
I would like to filter my file so that I keep only the entries where date2 is less than three years from today. Is it possible to do that in Pig?
Thanks.
No need to write a custom function:
In Pig 0.11 you can convert the date2 field from chararray to the datetime data type using the ToDate() function, then get the difference between CurrentTime() and date2 using YearsBetween(), and filter on the result. For example:
g = FILTER f BY YearsBetween(CurrentTime(), ToDate(date2, 'MM yyyy')) < 3;
In Pig 0.11, is there support for comparing datetime types? For example, with a field date1:datetime
and a filter condition date1 >= ToDate('1999-01-01'),
does this comparison return the correct result?
If you are stuck on a Pig version older than 0.11, use Piggybank. It has a function UnixToISO:
DEFINE UnixToISO org.apache.pig.piggybank.evaluation.datetime.convert.UnixToISO();
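UnixToISO expects a Unix timestamp, while the dates in the question are 'MM yyyy' strings, so a different Piggybank pair may fit better. What follows is only a rough sketch, assuming the Piggybank classes CustomFormatToISO and ISOYearsBetween are available in your build. The jar paths and the NOW parameter are hypothetical placeholders, and since older Pig has no CurrentTime(), the current time is assumed to be passed in from the command line.

```pig
-- Hypothetical jar locations; adjust to your installation.
REGISTER /path/to/piggybank.jar;
REGISTER /path/to/joda-time.jar;

DEFINE CustomFormatToISO org.apache.pig.piggybank.evaluation.datetime.convert.CustomFormatToISO();
DEFINE ISOYearsBetween org.apache.pig.piggybank.evaluation.datetime.diff.ISOYearsBetween();

f = LOAD 'file.txt' USING PigStorage(',') AS (id1:int, id2:int, date1:chararray, date2:chararray, duration:int, text:chararray);

-- NOW is an assumed parameter holding the current time as an ISO 8601 string.
-- CustomFormatToISO turns '07 2013' into an ISO string that ISOYearsBetween can read.
g = FILTER f BY ISOYearsBetween('$NOW', CustomFormatToISO(date2, 'MM yyyy')) < 3L;
```

It could be run with something like pig -param NOW=2013-06-19T00:00:00.000Z script.pig, where the script name and the timestamp are again just examples.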
Source: https://stackoverflow.com/questions/17185538/pig-how-to-manipulate-and-compare-dates