I have a multi-year time series and want the bounds between which 95% of my data lie. I want to look at this by season of the year ('DJF', 'MAM', 'JJA', 'SON').
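For a single, un-grouped series those bounds are just the 2.5% and 97.5% quantiles. A minimal sketch with made-up data (the real df is a multi-year, minute-frequency DataFrame):

import numpy as np
import pandas as pd

# illustrative stand-in for the real data: a few years of minute-frequency values
idx = pd.date_range('2010-01-01', '2013-12-31 23:59', freq='min')
df = pd.DataFrame({'value': np.random.randn(len(idx))}, index=idx)

# 95% of the data lies between the 2.5% and 97.5% quantiles
df['value'].quantile([0.025, 0.975])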
The fastest approach so far combines creating a low-frequency time series with which to do the season lookup and @Garrett's method of using a numpy.array index lookup rather than a dict:
import numpy as np
import pandas as pd

# month number -> season label; index 0 is an unused placeholder
season_lookup = np.array([
    None,
    'DJF', 'DJF',
    'MAM', 'MAM', 'MAM',
    'JJA', 'JJA', 'JJA',
    'SON', 'SON', 'SON',
    'DJF'])
# pad the quarterly index beyond the data so every row gets a season after the join
SEASON_HALO = pd.DateOffset(months=4)
start_with_halo = df.index.min() - SEASON_HALO
end_with_halo = df.index.max() + SEASON_HALO
# low-frequency, December-anchored quarterly index for the season lookup
seasonal_idx = pd.date_range(start=start_with_halo, end=end_with_halo, freq='QS-DEC')
seasonal_ts = pd.DataFrame(index=seasonal_idx)
seasonal_ts[SEAS] = season_lookup[seasonal_ts.index.month]
# up-sample the season labels to the data's own frequency (assumes df.index.freq is set), then join
seasonal_minutely_ts = seasonal_ts.resample(df.index.freq).ffill()
df_via_resample = df.join(seasonal_minutely_ts)
gp_up_sample = df_via_resample.groupby(SEAS)
gp_up_sample.quantile(FRAC_2_TAIL)
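The snippet assumes SEAS (the season column name) and FRAC_2_TAIL (the single-tail fraction) are already defined. To get both ends of the 95% band per season in one call, something like this works in recent pandas (the 0.025 value is an assumption implied by the 95% requirement):

FRAC_2_TAIL = 0.025  # assumed: 2.5% in each tail gives the central 95%
gp_up_sample.quantile([FRAC_2_TAIL, 1 - FRAC_2_TAIL])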
With 10 years of minute data, on my machine the np.array lookup version shown above works out faster than the dict lookup followed by up-sampling. YMMV.
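For reference, the slower dict-lookup variant referred to above replaces the array indexing with a plain month-to-season dict on the quarterly series (a rough sketch, not necessarily the exact code that was timed):

month_to_season = {12: 'DJF', 1: 'DJF', 2: 'DJF',
                   3: 'MAM', 4: 'MAM', 5: 'MAM',
                   6: 'JJA', 7: 'JJA', 8: 'JJA',
                   9: 'SON', 10: 'SON', 11: 'SON'}
# dict lookup per quarterly timestamp, then the same up-sample/join as above
seasonal_ts[SEAS] = [month_to_season[m] for m in seasonal_ts.index.month]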