Update: starting with version 0.20.0, pandas cut/qcut DOES handle date fields. See What's New for more.
(From the 0.20.0 What's New: pd.cut and pd.qcut now support datetime64 and timedelta64 dtypes.)
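With pandas 0.20.0 or later, the integer trick below is no longer necessary. Here is a minimal sketch that passes datetime bin edges straight to pd.cut and a quantile count to pd.qcut. It assumes a Series of datetime64 values, and the sample dates and month-start edges are made up for illustration:

```python
import pandas as pd

# Illustrative sample data (not from the question)
dates = pd.Series(pd.to_datetime(["2012-07-15", "2012-11-02", "2013-03-31"]))

# Month-start edges covering the data; the last edge lies past the final date
edges = pd.date_range("2012-07-01", "2013-04-01", freq="MS")

# right=False gives half-open bins [month start, next month start)
monthly = pd.cut(dates, bins=edges, right=False)

# Quantile-based bins also accept datetimes directly
halves = pd.qcut(dates, q=2)
```

right=False matters here. With the default right=True the bins are (a, b], so a timestamp at exactly midnight on the first of a month would land in the previous month's bin.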
I came up with an idea that relies on the underlying storage format of datetime64[ns]: each value is stored as an int64 count of nanoseconds since the Unix epoch. If you define dcut() like this:
```python
import numpy as np
import pandas as pd

def dcut(dts, freq='D', right=False):
    # first period of the data, and the first period past its end
    lo = pd.Period(dts.min(), freq=freq)
    hi = pd.Period(dts.max(), freq=freq) + 1
    periods = pd.period_range(start=lo, end=hi, freq=freq)
    # integer bin boundaries representing ns-since-epoch;
    # the extra period gives us the extra right-hand bin boundary we need
    bounds = np.asarray(periods.to_timestamp(how='start'),
                        dtype='datetime64[ns]').view('int64')
    # bin our time field as integers; right=False makes each bin
    # [period start, next period start), matching calendar periods.
    # Label the bins with the periods, omitting the extra one at the end
    return pd.cut(np.asarray(dts, dtype='datetime64[ns]').view('int64'),
                  bins=bounds, right=right,
                  labels=periods[:-1].astype(str))
```
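To see why this works, here is a quick check of the storage format the function relies on: a datetime64[ns] value viewed as int64 is its nanosecond offset from 1970-01-01, and the start of a Period converts onto the same integer scale. The sample dates are arbitrary:

```python
import numpy as np
import pandas as pd

stamps = np.array(["1970-01-01T00:00:01", "2012-07-01"], dtype="datetime64[ns]")

# Reinterpret the same 8-byte values as int64 nanoseconds since the epoch
as_int = stamps.view("int64")
# as_int[0] == 1_000_000_000 (one second after the epoch)

# The start of a monthly Period lands on the same integer scale,
# which is why period starts can serve as integer bin boundaries
july_start = pd.Period("2012-07", freq="M").to_timestamp(how="start")
assert as_int[1] == july_start.value
```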
Then we can do what I wanted:
```python
df.groupby([dcut(df.recd, freq='M', right=False),
            dcut(df.ship, freq='M', right=False)], observed=True).count()
```
To get:
```
                 price  qty  recd  ship
2012-07 2012-10      1    1     1     1
2012-11 2012-12      1    1     1     1
        2013-03      1    1     1     1
2012-12 2012-09      1    1     1     1
        2013-02      1    1     1     1
2013-01 2012-08      1    1     1     1
2013-02 2013-02      1    1     1     1
2013-03 2013-03      1    1     1     1
2013-04 2012-07      1    1     1     1
        2013-03      1    1     1     1
```
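On current pandas the same month-by-month grouping can be expressed without any binning at all, by converting each date column to a monthly Period with the .dt.to_period accessor. This is a sketch with a small hypothetical frame that reuses the column names from the question (price, qty, recd, ship); the values are invented:

```python
import pandas as pd

# Hypothetical sample frame with the same column names as the question's df
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0],
    "qty": [1, 2, 3],
    "recd": pd.to_datetime(["2012-07-03", "2012-11-20", "2012-11-05"]),
    "ship": pd.to_datetime(["2012-10-15", "2012-12-01", "2013-03-09"]),
})

# Group by (receive month, ship month); each key is a Period with freq "M"
counts = df.groupby([df.recd.dt.to_period("M"),
                     df.ship.dt.to_period("M")]).count()
```

Because the keys are plain Period values rather than Categoricals, only the combinations that actually occur show up in the result.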
I guess you could similarly define dqcut(), which first "rounds" each datetime value to the integer representing the start of its containing period (at your specified frequency), and then uses qcut() to choose amongst those boundaries. Or you could run qcut() first on the raw integer values and then round the resulting bin edges to your chosen frequency?
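Here is a minimal sketch of the second variant: qcut on the raw integers, then snapping the edges to period starts and re-binning. dqcut and its parameters are hypothetical, the sample dates are invented, and it assumes datetime input with no NaT values (NaT would become the minimum int64 and distort the quantiles):

```python
import numpy as np
import pandas as pd

def dqcut(dts, q, freq="M"):
    """Hypothetical helper: quantile bins whose edges are snapped to period starts.

    Assumes `dts` holds datetimes with no NaT values.
    """
    dts = pd.Series(pd.to_datetime(dts))
    # Raw int64 nanoseconds since the epoch, as in dcut()
    ns = np.asarray(dts, dtype="datetime64[ns]").view("int64")
    # Step 1: qcut on the raw integers; retbins=True returns the quantile edges
    _, edges = pd.qcut(ns, q, retbins=True, duplicates="drop")
    # Step 2: round each edge down to the start of its containing period
    starts = (pd.to_datetime(edges.astype("int64"))
              .to_period(freq)
              .to_timestamp(how="start"))
    # The final edge must lie past the data: use the start of the next period
    last = (pd.Period(dts.max(), freq=freq) + 1).to_timestamp(how="start")
    # Snapping can merge neighbouring edges, so drop duplicates
    bounds = starts[:-1].append(pd.DatetimeIndex([last])).unique()
    # Step 3: re-bin the datetimes with half-open [start, next start) intervals
    return pd.cut(dts, bins=bounds, right=False)

sample = pd.Series(pd.to_datetime(
    ["2012-07-15", "2012-08-20", "2012-11-02", "2013-01-10", "2013-03-31"]))
binned = dqcut(sample, 2, freq="M")
```

Snapping edges down to period starts means the bins no longer hold exactly equal counts. That is the trade-off for having every boundary fall on a calendar period.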
No joy on the bonus question yet? :)