问题
I have a df with OHLC data in a 1m frequency:
Open High Low Close
DateTime
2005-09-06 18:00:00 1230.25 1231.50 1230.25 1230.25
2005-09-06 18:01:00 1230.50 1231.75 1229.25 1230.50
.
.
2005-09-07 15:59:00 1234.50 1235.50 1234.25 1234.50
2005-09-07 16:00:00 1234.25 1234.50 1234.25 1234.25
I need to do a "custom" resample that fits futures hours data where:
- Every day starts at 18:00:00 of the previous day (Monday starts on Sunday 18:00:00)
- Every day ends at 16:00:00 of the current day
- The timestamp should be as of the the time of the close, not the open.
After doing the resample, the output should be:
Open High Low Close
DateTime
2005-09-07 16:00:00 1230.25 1235.50 1229.25 1234.25
Where:
- Open = first (column Open) at 2005-09-06 18:00:00
- High = max (column High) from 2005-09-06 18:00:00 to 2005-09-07 16:00:00
- Low = min (column Low) from 2005-09-06 18:00:00 to 2005-09-07 16:00:00
- Close = last (column Close) at 2005-09-07 16:00:00
I have tried:
- Changing the parameters rule, and base, but it didn't work.
- Playing with reindex with no success.
I used the following 'how':
conversion = {'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'}
回答1:
import pandas as pd
df = pd.read_table('data', sep='\s{2,}')
# Make sure the index is a DatetimeIndex
df.index = pd.DatetimeIndex(df.index)
# discard rows whose time falls between 16:00 and 18:00
df = df.between_time('18:00', '16:00', include_start=True, include_end=True)
proxy = df.index + pd.DateOffset(hours=6)
result = df.groupby(proxy.date).agg(
{'Open': 'first', 'High': 'max', 'Low': 'min', 'Close': 'last'})
result = result.reindex(columns=['Open','High','Low','Close'])
print(result)
yields
Open High Low Close
2005-09-07 1230.25 1235.5 1229.25 1234.25
The code above creates a proxy date which is calculated by adding 6 hours to each datetime in the index. This proxy date is then used as the groupby
value.
In [112]: proxy = pd.DatetimeIndex(df.index) + pd.DateOffset(hours=6)
To see how the proxy values correspond to the index:
In [116]: pd.Series(proxy.date, index=df.index)
Out[116]:
DateTime
2005-09-06 18:00:00 2005-09-07
2005-09-06 18:01:00 2005-09-07
2005-09-07 15:59:00 2005-09-07
2005-09-07 16:00:00 2005-09-07
dtype: object
来源:https://stackoverflow.com/questions/25169851/custom-time-series-resampling-in-pandas