python list of usa holidays between a range

青春壹個敷衍的年華 提交于 2019-12-24 18:43:47

问题


I have a need to fetch list of holidays in a given range, i.e., if start date is 20/12/2016 & end date is 10/1/2017, then I should get 25/12/2017, 1/1/2017. I can do this using Pandas, but in my case, I have limitation that I need to AWS Glue service & Pandas are not supported in AWS Glue.

I am trying to use native python library holidays, but I couldn't see API document to fetch holidays between from & to date?

Here is what I have tried:

import holidays
import datetime
from datetime import date, timedelta
import dateutil
from dateutil.relativedelta import relativedelta

us_holidays = holidays.UnitedStates()

for date2,name in sorted(holidays.US(state='CA', years=2013).items()):
    print (date2,name)


days_from_closest_holiday = [(abs(fdate - hdate)).days for hdate in holidays.date.tolist()]
print days_from_closest_holiday

Output of us_holidays:

(datetime.date(2013, 1, 1), "New Year's Day")
(datetime.date(2013, 1, 21), 'Martin Luther King, Jr. Day')
(datetime.date(2013, 2, 18), "Washington's Birthday")
(datetime.date(2013, 3, 31), 'C\xc3\xa9sar Ch\xc3\xa1vez Day')
(datetime.date(2013, 4, 1), 'C\xc3\xa9sar Ch\xc3\xa1vez Day (Observed)')
(datetime.date(2013, 5, 27), 'Memorial Day')
(datetime.date(2013, 7, 4), 'Independence Day')
(datetime.date(2013, 9, 2), 'Labor Day')
(datetime.date(2013, 10, 14), 'Columbus Day')
(datetime.date(2013, 11, 11), 'Veterans Day')
(datetime.date(2013, 11, 28), 'Thanksgiving')
(datetime.date(2013, 12, 25), 'Christmas Day')

I need to pass a date range, fromdate, todate to the us_holidays object, and I expect list of holidays with in the range, something like the following:

DatetimeIndex(['2013-12-25', '2014-01-01'], dtype='datetime64[ns]', freq=None)
[10, 17]

In pandas, I can get it using the following:

cal = USFederalHolidayCalendar()
holidays = pd.to_datetime(cal.holidays(start_date, end_date))
print holidays

As mentioned above, I need to use AWS Glue, and currently Glue doesn't support Pandas.

Appreciate any help please.

Thanks


回答1:


After a while of exploring, I have the got the solution myself, sharing here for reference as a solution:

%pyspark
import holidays
import datetime
from datetime import date, timedelta
import dateutil
from dateutil.relativedelta import relativedelta

us_holidays = holidays.UnitedStates()
custom_holidays = holidays.HolidayBase()
holidays_within_range=[]
fmt = '%Y-%m-%d'
holidays2013=[]

for date2,name in sorted(holidays.US(state='CA', years=2013).items()):
    holidays2013.append(date2.strftime(fmt))
print holidays2013

fdate = date(2013, 1, 1)

s_date = fdate - dateutil.relativedelta.relativedelta(days=7)
e_date = fdate + relativedelta(months=1)
start_date = s_date.strftime(fmt)
end_date = e_date.strftime(fmt)
print "Range : "
print start_date, end_date

dd = [s_date + timedelta(days=x) for x in range((e_date-s_date).days + 1)]
for d in dd:
    if(d in us_holidays):
        custom_holidays.append(d)
        holidays_within_range.append(d.strftime(fmt))

print holidays_within_range
days_from_closest_holiday = [(abs(fdate - datetime.datetime.strptime(hdate, fmt).date())).days for hdate in holidays_within_range]
print days_from_closest_holiday

The outputs for the above would be :

['2013-01-01', '2013-01-21', '2013-02-18', '2013-03-31', '2013-04-01', '2013-05-27', '2013-07-04', '2013-09-02', '2013-10-14', '2013-11-11', '2013-11-28', '2013-12-25']
Range : 
2012-12-25 2013-02-01
['2012-12-25', '2013-01-01', '2013-01-21']
[7, 0, 20]

This doesn't require Pandas, and I hope this works with AWS Glue. Haven't updated the variable names, please change as necessary.

Thanks



来源:https://stackoverflow.com/questions/49272664/python-list-of-usa-holidays-between-a-range

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!