Question
I have a pandas DataFrame of cumulative daily returns for a series:
Date CumReturn
3/31/2017 1
4/3/2017 .99
4/4/2017 .992
... ...
4/28/2017 1.012
5/1/2017 1.011
... ...
5/31/2017 1.022
... ...
6/30/2017 1.033
... ...
I want only the month-end values.
Date CumReturn
4/28/2017 1.012
5/31/2017 1.022
6/30/2017 1.033
Resampling doesn't help here because it aggregates the interim values, whereas I want only the month-end values themselves.
What is the easiest way to get only the month end values as they appear in the original dataframe?
Answer 1:
Use the is_month_end attribute of the .dt datetime accessor:
import pandas as pd
# Make sure the Date column holds datetime values (Timestamps)
df['Date'] = pd.to_datetime(df['Date'])
# Keep only rows that fall on the last calendar day of a month
df = df[df['Date'].dt.is_month_end]
Applying this to the data you provided:
Date CumReturn
0 2017-03-31 1.000
5 2017-05-31 1.022
6 2017-06-30 1.033
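A self-contained sketch of Answer 1 on the rows shown in the question (the elided "..." rows are simply left out, so this is not the full dataset). It makes the calendar-month-end behaviour visible: 2017-04-28 is not kept, because the last calendar day of April 2017 is Sunday the 30th.

```python
import pandas as pd

# The question's visible rows only; the "..." rows are not reproduced.
df = pd.DataFrame({
    "Date": ["3/31/2017", "4/3/2017", "4/4/2017", "4/28/2017",
             "5/1/2017", "5/31/2017", "6/30/2017"],
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Parse the M/D/YYYY strings into datetime64 values.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# is_month_end checks for the last *calendar* day of the month, so
# 2017-04-28 (a Friday) is dropped: April 2017 ends on Sunday the 30th.
month_end = df[df["Date"].dt.is_month_end]
print(month_end)
```

If your data only contains trading days, months that end on a weekend will disappear entirely with this filter, which is what the business-month-end variant below addresses.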
EDIT
To get the business month end instead (the last weekday of each month), compare each date against itself plus BMonthEnd(0):
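import pandas as pd
from pandas.tseries.offsets import BMonthEnd
# Make sure the Date column holds datetime values (Timestamps)
df['Date'] = pd.to_datetime(df['Date'])
# Keep only business month ends: adding BMonthEnd(0) leaves those dates
# unchanged and rolls every other date forward
df = df[df['Date'] == df['Date'] + BMonthEnd(0)]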
Applying this to the data you provided:
Date CumReturn
0 2017-03-31 1.000
3 2017-04-28 1.012
5 2017-05-31 1.022
6 2017-06-30 1.033
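A minimal sketch of why that comparison works, using individual Timestamps chosen for illustration: adding BMonthEnd(0) is a no-op for a date already on a business month end and rolls any other date forward to the next one. The is_on_offset call at the end is an alternative single-date check; its name assumes pandas 1.1 or later (older versions called it onOffset).

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# Illustrative dates from April 2017.
on_offset = pd.Timestamp("2017-04-28")    # Friday, last weekday of April
mid_month = pd.Timestamp("2017-04-04")    # ordinary mid-month weekday
weekend_end = pd.Timestamp("2017-04-30")  # Sunday, calendar month end

# A business month end stays where it is...
print(on_offset + BMonthEnd(0))    # stays on 2017-04-28
# ...while any other date rolls forward to the next business month end.
print(mid_month + BMonthEnd(0))    # rolls to 2017-04-28
print(weekend_end + BMonthEnd(0))  # rolls to 2017-05-31

# Only on_offset satisfies date == date + BMonthEnd(0).
# For a single Timestamp, is_on_offset performs the same test.
print(BMonthEnd().is_on_offset(on_offset))
print(BMonthEnd().is_on_offset(weekend_end))
```

Note that a weekend calendar month end such as 2017-04-30 rolls into the following month, which is why it fails the equality test.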
Answer 2:
Sort by date, group by year and month, and take the last entry of each group:
df.sort_values('Date').groupby([df.Date.dt.year, df.Date.dt.month]).last()
                Date  CumReturn
Date Date                      
2017 3    2017-03-31      1.000
     4    2017-04-28      1.012
     5    2017-05-31      1.022
     6    2017-06-30      1.033
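A variant of Answer 2, sketched on the question's visible rows: grouping by dt.to_period('M') gives one key per calendar month (equivalent to the year/month pair), and GroupBy.tail(1) returns whole rows with their original index instead of the per-column values that .last() computes. This swaps in tail(1) for .last(); it is not part of the original answer.

```python
import pandas as pd

# The question's visible rows only; the "..." rows are not reproduced.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-31", "2017-04-03", "2017-04-04",
                            "2017-04-28", "2017-05-01", "2017-05-31",
                            "2017-06-30"]),
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

df = df.sort_values("Date")

# One group per calendar month, e.g. Period('2017-04', 'M').
month = df["Date"].dt.to_period("M")

# tail(1) keeps the last row of each month, original index included.
month_end = df.groupby(month).tail(1)
print(month_end)
```

The difference matters with missing data: .last() skips NaN column by column, so if the final row of a month has a NaN in CumReturn, it can pair that row's date with an earlier row's value; tail(1) always returns one intact row.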
Answer 3:
Assuming the DataFrame is already sorted by 'Date' and that column holds pandas Timestamps, you can format the dates as YYYY-mm strings to use as group keys and take the last value in each group:
df.groupby(df['Date'].dt.strftime('%Y-%m'))['CumReturn'].last()
# Example output (from different sample data than the question's):
# 2017-01 0.127002
# 2017-02 0.046894
# 2017-03 0.005560
# 2017-04 0.150368
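Answer 3 relies on the rows already being in date order. A sketch under that caveat, using the question's visible rows deliberately listed out of order: sorting first, then selecting both columns, keeps the actual last trading date beside each value. The key name "Month" is an illustrative choice, not part of the original answer.

```python
import pandas as pd

# The question's visible rows, deliberately listed out of order.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-05-31", "2017-03-31", "2017-04-28",
                            "2017-06-30", "2017-04-04", "2017-05-01",
                            "2017-04-03"]),
    "CumReturn": [1.022, 1.0, 1.012, 1.033, 0.992, 1.011, 0.99],
})

# .last() takes the final entry in the current row order, so sort first.
df = df.sort_values("Date")

# "YYYY-MM" strings as the group key, as in Answer 3 (renamed for clarity).
key = df["Date"].dt.strftime("%Y-%m").rename("Month")

# Selecting both columns returns the month-end date alongside its value.
month_end = df.groupby(key)[["Date", "CumReturn"]].last()
print(month_end)
```

Without the sort_values call, the same groupby on this shuffled input would return whichever row happens to come last in each month, e.g. 2017-04-03 for April.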
Source: https://stackoverflow.com/questions/48121396/pandas-subset-of-a-time-series-without-resampling