Pandas resample skip incomplete groups at the start

问题

I have 15 min OHLC data coming in continuously from an API and I wish to resample it to 30 min 1H etc

+----------------------+----------+----------+----------+----------+------------+
|      time            |  open    |  high    |   low    |  close   |  volumeto  |
+----------------------+----------+----------+----------+----------+------------+
|                      |          |          |          |          |            |
| 2018-06-26 03:30:00  | 6244.56  | 6250.54  | 6243.55  | 6247.34  | 1801280.76 |
| 2018-06-26 03:45:00  | 6247.34  | 6257.61  | 6246.43  | 6248.23  | 2551368.76 |
| 2018-06-26 04:00:00  | 6248.53  | 6248.53  | 6238.11  | 6239.83  | 2148705.55 |
| 2018-06-26 04:15:00  | 6239.83  | 6240.93  | 6232.14  | 6239.00  | 1906012.45 |
| 2018-06-26 04:30:00  | 6239.78  | 6269.14  | 6239.67  | 6260.96  | 3869709.59 |
| 2018-06-26 04:45:00  | 6261.16  | 6263.50  | 6249.46  | 6260.19  | 2236911.26 |
| 2018-06-26 05:00:00  | 6260.19  | 6284.27  | 6257.97  | 6263.73  | 5131896.24 |
| 2018-06-26 05:15:00  | 6263.73  | 6272.73  | 6263.59  | 6270.30  | 1589515.40 |
| 2018-06-26 05:30:00  | 6270.30  | 6286.36  | 6268.78  | 6270.69  | 2859640.60 |
| 2018-06-26 05:45:00  | 6271.75  | 6274.69  | 6259.29  | 6261.20  | 2448639.24 |
+----------------------+----------+----------+----------+----------+------------+

As per my use case, I round 3:45 and 4:00 from 15 mins to 4:00 at 30 mins and so on

agg = { 'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volumeto': 'sum'}
df.resample('30min', closed='right', label='right').agg(agg)

Original 15 mins data

30 mins after rounding

As you notice in the 30 mins timeframe, the first candle is generated from only 1 15 min candle instead of 2.

How do I skip incomplete candles only at the beginning of my computation instead of the end? I would like to similary skip for 1 hr etc Thanks

回答1:

IIUC, you could do it this way. Add a count to your aggregation dict, then filter your dataframe based on the count, like this:

agg = { 'open': ['first','count'], 'high': 'max', 'low': 'min', 'close': 'last', 'volumeto': 'sum'}
df_out = df.resample('30min', closed='right', label='right').agg(agg)
df_out.columns = df_out.columns.map('_'.join)
df_out[df_out['open_count']>1]

Output:

                     open_first  open_count  high_max  low_min  close_last  volumeto_sum
time                                                                                    
2018-06-26 04:00:00     6247.34           2   6257.61  6238.11     6239.83    4700074.31
2018-06-26 04:30:00     6239.83           2   6269.14  6232.14     6260.96    5775722.04
2018-06-26 05:00:00     6261.16           2   6284.27  6249.46     6263.73    7368807.50
2018-06-26 05:30:00     6263.73           2   6286.36  6263.59     6270.69    4449156.00

Or to eliminate just the first incomplete group

df_out[~(df_out['open_count'] < 2).cumprod().astype(bool)]

Output:

                     open_first  open_count  high_max  low_min  close_last  volumeto_sum
time                                                                                    
2018-06-26 04:00:00     6247.34           2   6257.61  6238.11     6239.83    4700074.31
2018-06-26 04:30:00     6239.83           2   6269.14  6232.14     6260.96    5775722.04
2018-06-26 05:00:00     6261.16           2   6284.27  6249.46     6263.73    7368807.50
2018-06-26 05:30:00     6263.73           2   6286.36  6263.59     6270.69    4449156.00
2018-06-26 06:00:00     6271.75           1   6274.69  6259.29     6261.20    2448639.24

来源：https://stackoverflow.com/questions/51063353/pandas-resample-skip-incomplete-groups-at-the-start

标签

python

pandas

resampling