Pandas Subset of a Time Series Without Resampling

Question

I have a pandas DataFrame with cumulative daily returns for a series:

Date    CumReturn
3/31/2017    1
4/3/2017     .99
4/4/2017     .992
 ...        ...
4/28/2017    1.012
5/1/2017     1.011
 ...         ...
5/31/2017    1.022
 ...         ...
6/30/2017    1.033
 ...         ...

I want only the month-end values.

Date    CumReturn
4/28/2017    1.012
5/31/2017    1.022
6/30/2017    1.033

Resampling doesn't work here, because it aggregates the interim values.

What is the easiest way to get only the month-end values as they appear in the original DataFrame?
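The resampling concern can be checked with a short sketch. It uses only the rows visible in the table above (the elided "..." rows are not reconstructed). It passes pd.offsets.MonthEnd() rather than a string alias because the month-end alias was renamed from "M" to "ME" in recent pandas versions, and the offset object works on both. The sketch suggests that resample(...).last() does pick the last observed value in each month, but labels it with the calendar month end. April's 1.012 therefore appears under 2017-04-30 instead of the original 2017-04-28.

```python
import pandas as pd

# Only the rows visible in the question; the elided "..." rows are not reconstructed.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
                            "2017-05-01", "2017-05-31", "2017-06-30"]),
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Resample to calendar month end, keeping the last observation in each month.
# A MonthEnd offset object is used instead of a string alias ("M" vs "ME").
monthly = df.set_index("Date")["CumReturn"].resample(pd.offsets.MonthEnd()).last()

# The values are the last observed ones, but the index holds calendar month ends,
# e.g. 2017-04-30 instead of the actual last trading day 2017-04-28.
print(monthly)
```

Because of that relabelling, filtering the original rows is what keeps the dates as they appear in the data.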

Answer 1:

Use the is_month_end attribute of the .dt datetime accessor:

import pandas as pd

# Convert the Date column to datetime64 so the .dt accessor is available
df['Date'] = pd.to_datetime(df['Date'])

# Keep only rows that fall on the last calendar day of a month
df = df[df['Date'].dt.is_month_end]

Applying this to the data you provided:

        Date  CumReturn
0 2017-03-31      1.000
5 2017-05-31      1.022
6 2017-06-30      1.033
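A self-contained sketch of this filter follows. It uses only the rows visible in the question (the elided "..." rows are not reconstructed), reproduces the output above, and makes it easy to check which rows the calendar-based mask drops.

```python
import pandas as pd

# Only the rows visible in the question; the elided "..." rows are not reconstructed.
df = pd.DataFrame({
    "Date": ["2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
             "2017-05-01", "2017-05-31", "2017-06-30"],
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Convert strings to datetime64 so the .dt accessor works.
df["Date"] = pd.to_datetime(df["Date"])

# is_month_end is True only on the last calendar day of each month.
month_end = df[df["Date"].dt.is_month_end]
print(month_end)

# 2017-04-28 is not kept: the calendar end of April 2017 (the 30th) was a Sunday.
print(pd.Timestamp("2017-04-30").day_name())
```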

EDIT

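The is_month_end filter above drops 2017-04-28 because April 30, 2017 fell on a Sunday. To match business month ends instead, compare each date with itself rolled forward by BMonthEnd(0):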

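import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# Convert the Date column to datetime64
df['Date'] = pd.to_datetime(df['Date'])

# BMonthEnd(0) leaves a date that is already a business month end unchanged
# and rolls any other date forward, so equality keeps only business month ends
df = df[df['Date'] == df['Date'] + BMonthEnd(0)]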

Applying this to the data you provided:

        Date  CumReturn
0 2017-03-31      1.000
3 2017-04-28      1.012
5 2017-05-31      1.022
6 2017-06-30      1.033
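The BMonthEnd comparison can be sketched the same way, again on the visible rows only. The assumption here is that "business month end" means the last Monday-to-Friday day of the month. BMonthEnd has no notion of exchange holidays, so a month whose last weekday was a market holiday would still be missed.

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# Only the rows visible in the question; the elided "..." rows are not reconstructed.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
                            "2017-05-01", "2017-05-31", "2017-06-30"]),
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# With n=0 the offset rolls each date forward to its month's last weekday,
# leaving dates that are already there unchanged.
rolled = df["Date"] + BMonthEnd(0)

# A row is a business month end exactly when rolling it forward changes nothing.
bmonth_end = df[df["Date"] == rolled]
print(bmonth_end)
```

When holidays matter, grouping by month and keeping the last row actually present, as the other answers do, avoids depending on any calendar.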

Answer 2:

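Sort by Date, group by year and month, and take the last row of each group (Date must already be a datetime column):

df.sort_values('Date').groupby([df.Date.dt.year, df.Date.dt.month]).last()
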
                Date  CumReturn
Date Date                      
2017 3    2017-03-31      1.000
     4    2017-04-28      1.012
     5    2017-05-31      1.022
     6    2017-06-30      1.033
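Here is a runnable version of this answer on the visible rows from the question. It also shows GroupBy.tail(1), a variant that is not part of the original answer: it returns the last row of each month unchanged, with its original index, rather than building a new (year, month) index.

```python
import pandas as pd

# Only the rows visible in the question; the elided "..." rows are not reconstructed.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
                            "2017-05-01", "2017-05-31", "2017-06-30"]),
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Sort first so "last" means the latest date within each month.
df_sorted = df.sort_values("Date")
# Group keys: calendar year and month of each date.
keys = [df_sorted["Date"].dt.year, df_sorted["Date"].dt.month]

# As in the answer: one row per (year, month), indexed by those keys.
by_last = df_sorted.groupby(keys).last()

# Variant (not from the answer): the last row of each month, original index kept.
by_tail = df_sorted.groupby(keys).tail(1)

print(by_last)
print(by_tail)
```

One difference is worth knowing. GroupBy.last() takes the last non-null value of each column separately. If CumReturn were NaN on a month's final date, last() would pair that date with an earlier day's return, whereas tail(1) keeps the row exactly as it is.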

Answer 3:

Assuming the DataFrame is already sorted by 'Date' and the values in that column are pandas Timestamps, you can convert them to 'YYYY-MM' strings for grouping and take the last value in each group:

df.groupby(df['Date'].dt.strftime('%Y-%m'))['CumReturn'].last()

# Example output (from different sample data than the question's):
# 2017-01    0.127002
# 2017-02    0.046894
# 2017-03    0.005560
# 2017-04    0.150368
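The string-key grouping can be sketched on the visible rows from the question, rather than on the unrelated sample data behind the example output above. The to_period('M') line is an added alternative, not part of the original answer. It groups on monthly Period keys, which keep a time-aware index instead of plain strings.

```python
import pandas as pd

# Only the rows visible in the question, already sorted by Date as the answer assumes.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
                            "2017-05-01", "2017-05-31", "2017-06-30"]),
    "CumReturn": [1.0, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# As in the answer: group on "YYYY-MM" strings and take the last value per group.
last_by_str = df.groupby(df["Date"].dt.strftime("%Y-%m"))["CumReturn"].last()

# Alternative (not from the answer): monthly Period keys instead of strings.
last_by_period = df.groupby(df["Date"].dt.to_period("M"))["CumReturn"].last()

print(last_by_str)
print(last_by_period)
```

Both results keep only the month label, so the actual month-end date (for example 2017-04-28) is not in the output. Select the Date column as well, or filter the original rows, if the date itself is needed.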

Source: https://stackoverflow.com/questions/48121396/pandas-subset-of-a-time-series-without-resampling

Tags

python

pandas