pandas resample to specific weekday in month

Submitted by 六月ゝ 毕业季﹏ on 2021-02-18 07:39:06

Question


I have a Pandas dataframe that I'd like to resample to the third Friday of every month.

import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range("2018-01-01", "2018-08-31")
dates_df = pd.DataFrame(data=np.random.random(len(dates)), index=dates)
# requested output: Fridays (weekday == 4) that fall on days 15 to 21
mask = (dates.weekday == 4) & (14 < dates.day) & (dates.day < 22)
dates_df.loc[mask]

But when a third Friday is missing (e.g. after dropping February's third Friday), I want the latest available value (so the value as of 2018-02-15). Using the mask gives me the next value instead (Feb 17 instead of Feb 15):

# remove February third Friday:
dates_df = dates_df.drop([pd.to_datetime("2018-02-16")])
mask = (dates.weekday == 4) & (14 < dates.day) & (dates.day < 22)
dates_df.loc[mask]
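
A minimal sketch of an aligned version of this mask, assuming the mask is rebuilt from the frame's own index after the drop (the names `df`, `idx`, and `picked` are illustrative, not from the original post). The snippet above still derives the mask from the full `dates` range, which is one entry longer than the frame once a row is dropped. Deriving it from the current index avoids that mismatch, and it shows that a plain mask simply leaves February out rather than falling back to an earlier day:

```python
import numpy as np
import pandas as pd

np.random.seed(0)
dates = pd.date_range("2018-01-01", "2018-08-31")
df = pd.DataFrame(np.random.random(len(dates)), index=dates)

# Remove February's third Friday, as in the question.
df = df.drop(pd.Timestamp("2018-02-16"))

# Build the mask from the frame's current index so both have the same length.
idx = df.index
mask = (idx.weekday == 4) & (idx.day > 14) & (idx.day < 22)

picked = df.loc[mask]
print(picked.index.month.tolist())  # February (2) is absent
```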

Using a monthly resample in combination with loffset gives the end-of-month values with a shifted index, which is also not what I want:

from pandas.tseries.offsets import WeekOfMonth
dates_df.resample("M", loffset=WeekOfMonth(week=2, weekday=4)).last()

Is there an alternative (preferably using resample) that does not require resampling to daily values first and then applying a mask? (This takes a long time to complete on my dataframe.)


Answer 1:


If I understand correctly, your second attempt is heading in the right direction; you just need to resample using WeekOfMonth as the rule, rather than using it as an offset:

dates_df.resample(WeekOfMonth(week=2, weekday=4)).asfreq().dropna()

This approach does not offset the index; it should return just the data for the third Friday of every month.
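
To see which dates this rule targets, the offset can be passed to `pd.date_range`. A small sketch, assuming the question's date range (the variable names are illustrative). Note that `week` is zero-based, so `week=2` means the third week, and `weekday=4` is Friday:

```python
import pandas as pd
from pandas.tseries.offsets import WeekOfMonth

# week is zero-based (2 -> third week); weekday counts from Monday=0 (4 -> Friday).
third_friday = WeekOfMonth(week=2, weekday=4)

# Every third Friday inside the question's date range.
third_fridays = pd.date_range("2018-01-01", "2018-08-31", freq=third_friday)
print(third_fridays.strftime("%Y-%m-%d").tolist())
# ['2018-01-19', '2018-02-16', '2018-03-16', '2018-04-20',
#  '2018-05-18', '2018-06-15', '2018-07-20', '2018-08-17']
```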

Dealing with a missing 3rd Friday:

With the above code, a month whose 3rd Friday is missing is excluded entirely. Depending on how you want to handle the missing data, you can fill the gap with bfill or ffill (also known as pad) by amending the above to the following:

dates_df.resample(rule=WeekOfMonth(week=2,weekday=4)).bfill().asfreq(freq='D').dropna()

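The above backfills a missing 3rd Friday with the next available value. Use ffill() in place of bfill() to take the latest earlier value instead, which is what the question asks for.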

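Update: Let's work with a fixed data set instead of np.random:

import pandas as pd
from pandas.tseries.offsets import WeekOfMonth

# create a smaller date range
dates = pd.date_range("2018-05-01", "2018-08-31")

# create data containing only the values 1, 2, 3 (the range has 123 days, a multiple of 3)
data = [1, 2, 3] * (len(dates) // 3)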

dates_df = pd.DataFrame(data=data, index=dates)
dates_df.head()

# Output:

2018-05-01  1
2018-05-02  2
2018-05-03  3
2018-05-04  1
2018-05-05  2

Now let's check what the data looks like for the 3rd Friday of each month by selecting it manually:

dates_df.loc[[
    pd.Timestamp('2018-05-18'),
    pd.Timestamp('2018-06-15'),
    pd.Timestamp('2018-07-20'),
    pd.Timestamp('2018-08-17')
]]

# Output:

2018-05-18  3
2018-06-15  1
2018-07-20  3
2018-08-17  1

If you don't have any missing 3rd Fridays, running the code provided earlier:

dates_df.resample(rule=WeekOfMonth(week=2, weekday=4)).asfreq().dropna()

produces the following output:

2018-05-18  3
2018-06-15  1
2018-07-20  3
2018-08-17  1

As you can see, the index has not been shifted here, and the exact values for the 3rd Friday of each month are returned.

Now say some 3rd Fridays are missing. Depending on how you want to handle that, you can use the previous value (ffill) or the next value (bfill):

  • pad / ffill: propagate the last valid observation forward to the next valid one
  • backfill / bfill: use the NEXT valid observation to fill the gap

dates_df.drop(index=pd.Timestamp('2018-08-17')).resample(rule=WeekOfMonth(week=2, weekday=4)).ffill().asfreq(freq='D').dropna()

2018-05-18  3
2018-06-15  1
2018-07-20  3
2018-08-17  3

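With bfill, note the extra leading row: the first bin label (2018-04-20) falls before the data starts, so it is backfilled from 2018-05-01:

dates_df.drop(index=pd.Timestamp('2018-08-17')).resample(rule=WeekOfMonth(week=2, weekday=4)).bfill().asfreq(freq='D').dropna()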

2018-04-20  1
2018-05-18  3
2018-06-15  1
2018-07-20  3
2018-08-17  2
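
If the resample chain feels opaque, a similar "as of" lookup can be sketched with `reindex`. This is an alternative technique, not the code from the answer above: it builds the target third Fridays with `pd.date_range` and forward-fills each one from the last available row at or before it. The column name `value` and the variable names are illustrative assumptions:

```python
import pandas as pd
from pandas.tseries.offsets import WeekOfMonth

# Same fixed data set as above: the values 1, 2, 3 repeating daily.
dates = pd.date_range("2018-05-01", "2018-08-31")
df = pd.DataFrame({"value": [1, 2, 3] * (len(dates) // 3)}, index=dates)

# Simulate a missing third Friday.
df = df.drop(index=pd.Timestamp("2018-08-17"))

# Target dates: every third Friday covered by the data.
third_fridays = pd.date_range(
    df.index.min(), df.index.max(), freq=WeekOfMonth(week=2, weekday=4)
)

# method="ffill" takes the last observation at or before each target date;
# it requires the frame's index to be sorted.
as_of = df.reindex(third_fridays, method="ffill")
print(as_of)
```

Here the row for 2018-08-17 carries the value from 2018-08-16, matching the ffill output shown above. Swapping in `method="bfill"` would take the next available observation instead.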

If, say, the whole index is shifted as in your example:

dates_df.resample(rule='M', loffset=WeekOfMonth(week=2, weekday=4)).asfreq().dropna()

# Output:

2018-06-15  1
2018-07-20  1
2018-08-17  2
2018-09-21  3

What's happening there is that you're resampling by rule 'M' (month end) and then offsetting (shifting forward) each index label to the next 3rd Friday.

As you can see, this is how it looks before the offset:

dates_df.resample(rule='M').asfreq().dropna()

# Output:

2018-05-31  1
2018-06-30  1
2018-07-31  2
2018-08-31  3
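
The `loffset` argument was deprecated in pandas 1.1 and removed in pandas 2.0, so the shifted example no longer runs on current versions. The following is a rough sketch of the same two steps done explicitly, assuming the fixed data set from above; the column name `value` and the names `month_end` and `shifted` are illustrative:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd, WeekOfMonth

dates = pd.date_range("2018-05-01", "2018-08-31")
df = pd.DataFrame({"value": [1, 2, 3] * (len(dates) // 3)}, index=dates)

# Step 1: take the month-end rows. A MonthEnd() offset object is used here
# because newer pandas versions deprecate the "M" alias in favour of "ME".
month_end = df.resample(MonthEnd()).asfreq()

# Step 2: move every label forward to the next third Friday. The values
# are untouched, so they are still the month-end values.
offset = WeekOfMonth(week=2, weekday=4)
shifted = month_end.copy()
shifted.index = pd.DatetimeIndex([ts + offset for ts in month_end.index])
print(shifted)
```

This reproduces why the shifted result is misleading: the labels are third Fridays, but each value was observed at the end of the previous month.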


Source: https://stackoverflow.com/questions/52495310/pandas-resample-to-specific-weekday-in-month
