Pandas Subset of a Time Series Without Resampling

Submitted by 陌路散爱 on 2019-12-23 05:38:09

Question


I have a pandas series of cumulative daily returns:

Date    CumReturn
3/31/2017    1
4/3/2017     .99
4/4/2017     .992
 ...        ...
4/28/2017    1.012
5/1/2017     1.011
 ...         ...
5/31/2017    1.022
 ...         ...
6/30/2017    1.033
 ...         ...

I want only the month-end values.

Date    CumReturn
4/28/2017    1.012
5/31/2017    1.022
6/30/2017    1.033

Because I want only the month-end values, resampling doesn't work as it aggregates the interim values.

What is the easiest way to get only the month end values as they appear in the original dataframe?


Answer 1:


Use the is_month_end component of the .dt date accessor:

# Ensure the date column is a Timestamp
df['Date'] = pd.to_datetime(df['Date'])

# Filter to end of the month only
df = df[df['Date'].dt.is_month_end]

Applying this to the data you provided (2017-04-28 is dropped because the calendar month end, 2017-04-30, falls on a Sunday):

        Date  CumReturn
0 2017-03-31      1.000
5 2017-05-31      1.022
6 2017-06-30      1.033
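
For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. It rebuilds only the seven rows that are visible in the question (the elided rows are left out), and the name month_end is just illustrative.

```python
import pandas as pd

# Only the rows shown in the question; the "..." rows are omitted.
df = pd.DataFrame({
    "Date": ["3/31/2017", "4/3/2017", "4/4/2017", "4/28/2017",
             "5/1/2017", "5/31/2017", "6/30/2017"],
    "CumReturn": [1, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Parse the month/day/year strings into Timestamps
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Keep only rows whose date is the last calendar day of its month
month_end = df[df["Date"].dt.is_month_end]
print(month_end)

# Weekday of the calendar month end that has no row in the data
print(pd.Timestamp("2017-04-30").day_name())
```

The original index labels are preserved, so the filtered rows can still be matched back to the full frame.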

EDIT

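To get the business month end, compare each date with itself plus BMonthEnd(0). Adding BMonthEnd(0) rolls a date forward to the business month end and leaves it unchanged if it is already on one: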

from pandas.tseries.offsets import BMonthEnd

# Ensure the date column is a Timestamp
df['Date'] = pd.to_datetime(df['Date'])

# Filter to business month end only
df = df[df['Date'] == df['Date'] + BMonthEnd(0)]

Applying this to the data you provided:

        Date  CumReturn
0 2017-03-31      1.000
3 2017-04-28      1.012
5 2017-05-31      1.022
6 2017-06-30      1.033
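
The question mentions a "data series", so the dates may live in the index rather than in a column. The sketch below assumes that layout (a Series named CumReturn with a DatetimeIndex, built from the seven visible rows only) and applies the same BMonthEnd(0) comparison to the index. The names s, is_bme and bme are illustrative.

```python
import pandas as pd
from pandas.tseries.offsets import BMonthEnd

# Assumed layout: dates in the index, values from the rows shown in the question
dates = pd.to_datetime([
    "2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
    "2017-05-01", "2017-05-31", "2017-06-30",
])
s = pd.Series([1, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
              index=dates, name="CumReturn")

# A date equals itself plus BMonthEnd(0) only if it is already a business month end
is_bme = s.index == s.index + BMonthEnd(0)
bme = s[is_bme]
print(bme)
```

Note that BMonthEnd only skips weekends. If the last weekday of a month is a holiday with no row in the data, this comparison will not select anything for that month.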



Answer 2:


df.sort_values('Date').groupby([df.Date.dt.year,df.Date.dt.month]).last()
Out[197]: 
                Date  CumReturn
Date Date                      
2017 3    2017-03-31      1.000
     4    2017-04-28      1.012
     5    2017-05-31      1.022
     6    2017-06-30      1.033
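
A variant of the same grouping idea, offered as a sketch rather than as part of the original answer: grouping by a monthly period and calling tail(1) returns the last available row of each month with its original index and columns, instead of a result indexed by year and month. The data is again the seven visible rows from the question, and the names ordered and last_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-03-31", "2017-04-03", "2017-04-04", "2017-04-28",
        "2017-05-01", "2017-05-31", "2017-06-30",
    ]),
    "CumReturn": [1, 0.99, 0.992, 1.012, 1.011, 1.022, 1.033],
})

# Sort first so that "last row in the group" means "latest date in the month"
ordered = df.sort_values("Date")

# Group by calendar month (as a Period) and keep the final row of each group
last_rows = ordered.groupby(ordered["Date"].dt.to_period("M")).tail(1)
print(last_rows)
```

Because this picks the last date that actually exists in each month, it does not depend on a weekend or holiday calendar.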



Answer 3:


Assuming that the dataframe is already sorted by 'Date' and that the values in that column are Pandas timestamps, you can convert them to YYYY-mm string values for grouping and take the last value:

df.groupby(df['Date'].dt.strftime('%Y-%m'))['CumReturn'].last()

# Example output:
# 2017-01    0.127002
# 2017-02    0.046894
# 2017-03    0.005560
# 2017-04    0.150368


Source: https://stackoverflow.com/questions/48121396/pandas-subset-of-a-time-series-without-resampling
