Apache PIG: Get the day of the week and split accordingly

Submitted by て烟熏妆下的殇ゞ on 2019-12-11 15:55:23

Question


I need to split the range between two dates into individual days and skip Saturdays and Sundays. The built-in functions in Pig 0.11.1 help to get the day of the week, but how do I find out whether a given day is a Saturday or a Sunday? Does anyone have an idea? My expected output is described below.

Input:

User Fromdate Todate

Raj 10/3/2013 10/8/2013

James 10/4/2013 10/7/2013

etc..

Expected Output:

Raj 10/3/2013

Raj 10/4/2013

Raj 10/7/2013

Raj 10/8/2013

James 10/4/2013

James 10/7/2013


Answer 1:


Since Pig DateTime objects are essentially Unix epoch times in milliseconds, this can be done with Pig's built-in functions alone, without a UDF.

(DaysBetween(ToDate('10/3/2013','MM/dd/yyyy'),ToDate(0L)) + 4L) % 7    
  • Yields a long in the range 0..6, where 0 = Sun, 1 = Mon, ..., 6 = Sat
  • ToDate(0L) represents 1/1/1970, a Thursday
  • Adding 4L days shifts the result so that 0 = Sunday; a date falls on a weekend when the result is 0 or 6

You can check this from the Unix command line (the -d option shown here is GNU date syntax):

$>  date -d '1/1/1970' +%w-%a   
4-Thu     
$>  date -d '10/3/2013' +%w-%a  
4-Thu
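
The same arithmetic can also be checked outside Pig. The following is a minimal Java sketch, not part of the original answer: the class name `EpochWeekday` and its methods are hypothetical, and it uses `java.time` rather than Pig's DateTime. It mirrors the `(days since epoch + 4) % 7` expression and adds the weekend test the question asks for.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper (not a Pig API): reproduces the Pig expression in plain Java.
final class EpochWeekday {
    // Assumed input format, matching the sample data (e.g. 10/3/2013).
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("M/d/yyyy");

    // Returns 0 = Sun, 1 = Mon, ..., 6 = Sat.
    static int index(String date) {
        long daysSinceEpoch = LocalDate.parse(date, FMT).toEpochDay();
        // Day 0 (1/1/1970) was a Thursday, so adding 4 moves Sunday to 0.
        // floorMod keeps the result in 0..6 even for dates before 1970.
        return (int) Math.floorMod(daysSinceEpoch + 4L, 7L);
    }

    // A date is a weekend day when its index is 0 (Sunday) or 6 (Saturday).
    static boolean isWeekend(String date) {
        int i = index(date);
        return i == 0 || i == 6;
    }
}
```

For example, `EpochWeekday.index("10/3/2013")` gives 4 (Thursday), matching the `date` output above, and `EpochWeekday.isWeekend("10/5/2013")` is true.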

Of course, if you are comfortable writing a UDF and this is a recurring requirement, a UDF is the best solution.

Carter Shore




Answer 2:


You'll need to write a UDF. You can use Java's Calendar class to do this.
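
As a rough illustration of what the core of such a UDF might look like, here is a sketch using `java.util.Calendar`. The class name `WeekdayExpander` and the method `weekdaysBetween` are hypothetical, the `M/d/yyyy` input format is assumed from the sample data, and the Pig wrapper itself (for example a class extending `org.apache.pig.EvalFunc`) is deliberately left out so the code runs without the Pig jars.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Calendar;
import java.util.List;
import java.util.TimeZone;

// Hypothetical helper: only the date logic, without the Pig UDF wrapper.
final class WeekdayExpander {
    // Returns every date from 'from' to 'to' (inclusive) that is not a Saturday or Sunday.
    static List<String> weekdaysBetween(String from, String to) {
        // Work in UTC so daylight-saving changes cannot shift a day boundary.
        TimeZone utc = TimeZone.getTimeZone("UTC");
        SimpleDateFormat fmt = new SimpleDateFormat("M/d/yyyy");
        fmt.setTimeZone(utc);
        fmt.setLenient(false);

        Calendar day = Calendar.getInstance(utc);
        Calendar end = Calendar.getInstance(utc);
        try {
            day.setTime(fmt.parse(from));
            end.setTime(fmt.parse(to));
        } catch (ParseException e) {
            throw new IllegalArgumentException("Expected dates in M/d/yyyy format", e);
        }

        List<String> result = new ArrayList<>();
        while (!day.after(end)) {
            int dow = day.get(Calendar.DAY_OF_WEEK);
            if (dow != Calendar.SATURDAY && dow != Calendar.SUNDAY) {
                result.add(fmt.format(day.getTime()));
            }
            day.add(Calendar.DAY_OF_MONTH, 1); // advance one calendar day
        }
        return result;
    }
}
```

With the sample input from the question, `WeekdayExpander.weekdaysBetween("10/3/2013", "10/8/2013")` yields 10/3/2013, 10/4/2013, 10/7/2013 and 10/8/2013, which matches the expected rows for Raj.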



Source: https://stackoverflow.com/questions/19152907/apache-pig-get-the-day-of-the-week-and-split-accordingly
