What's the equivalent of cut/qcut for pandas date fields?


Happy的楠姐 2020-12-31 20:13

Update: starting with version 0.20.0, pandas cut/qcut DOES handle date fields. See What's New for more.

pd.cut and pd.qcut now sup
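As the update says, from pandas 0.20.0 onward datetime values can be passed to pd.cut and pd.qcut directly. A minimal sketch of what that might look like; the sample dates and the names recd, edges, binned and halves are made up for illustration and do not come from the original question:

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
recd = pd.Series(pd.to_datetime(
    ["2012-07-15", "2012-11-03", "2012-11-20", "2013-01-09"]
))

# Month-start bin edges covering the data, including one edge past the end.
edges = pd.date_range("2012-07-01", "2013-02-01", freq="MS")

# right=False makes each bin [month start, next month start).
binned = pd.cut(recd, bins=edges, right=False)
print(binned)

# qcut works on the same datetime values; q=2 splits at the median date.
halves = pd.qcut(recd, q=2)
print(halves)
```

Both calls return a categorical Series whose categories are datetime intervals, so the result can be handed straight to groupby.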

4 answers

清歌不尽 (original poster)

2020-12-31 20:59

I came up with an idea that relies on the underlying storage format of datetime64[ns], which holds each value as an integer count of nanoseconds since the epoch. If you define dcut() like this:

import pandas as pd

def dcut(dts, freq='D', right=True):
    dts = pd.DatetimeIndex(dts)
    hi = pd.Period(dts.max(), freq=freq) + 1   # first period past the end of the data
    periods = pd.period_range(start=dts.min(), end=hi, freq=freq)
    # integer bin boundaries representing ns-since-epoch;
    # the extra period gives us the extra right-hand bin boundary we need
    bounds = periods.to_timestamp(how='start').asi8
    # bin our time field as integers
    cut = pd.cut(dts.asi8, bins=bounds, right=right)
    # relabel the bins using the periods, omitting the extra one at the end
    return cut.rename_categories(periods[:-1].astype(str))

Then we can do what I wanted:

# right=False keeps a timestamp that falls exactly on a period start inside that period;
# observed=True lists only the bin combinations that actually occur
keys = [dcut(df.recd, freq='M', right=False), dcut(df.ship, freq='M', right=False)]
df.groupby(keys, observed=True).count()

To get:

                 price  qty  recd  ship
2012-07 2012-10      1    1     1     1
2012-11 2012-12      1    1     1     1
        2013-03      1    1     1     1
2012-12 2012-09      1    1     1     1
        2013-02      1    1     1     1
2013-01 2012-08      1    1     1     1
2013-02 2013-02      1    1     1     1
2013-03 2013-03      1    1     1     1
2013-04 2012-07      1    1     1     1
        2013-03      1    1     1     1

I guess you could similarly define dqcut(), which first "rounds" each datetime value down to the integer representing the start of its containing period (at your specified frequency) and then uses qcut() to choose among those boundaries. Alternatively, you could run qcut() on the raw integer values first and then round the resulting bin edges to your chosen frequency.
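A rough sketch of the first variant (round to period starts, then qcut the integers) might look like the following. The name dqcut and its signature are hypothetical, and duplicates="drop" is my own assumption for handling repeated quantile edges when many values share a period:

```python
import pandas as pd

def dqcut(dts, q, freq="M"):
    """Hypothetical helper: quantile-bin datetimes snapped to period starts."""
    idx = pd.DatetimeIndex(dts)
    # Round each value down to the start of its containing period.
    starts = idx.to_period(freq).to_timestamp(how="start")
    # Quantile-bin the integer (time-since-epoch) representation.
    # duplicates="drop" merges bin edges that coincide.
    return pd.qcut(starts.asi8, q=q, duplicates="drop")

# Made-up sample dates for illustration.
sample = pd.to_datetime(["2012-07-15", "2012-07-20", "2012-11-03", "2013-01-09"])
result = dqcut(sample, q=2)
print(result.codes)
```

Because every value is snapped to its period start before binning, two dates in the same period always land in the same quantile bin. Note that qcut may interpolate between period starts when it picks the edges, and the interval labels come out as integers; relabelling them as dates would be a further step.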

No joy on the bonus question yet? :)
