What's the equivalent of cut/qcut for pandas date fields?

Happy的楠姐 2020-12-31 20:13

Update: starting with version 0.20.0, pandas cut/qcut DOES handle date fields. See What's New for more.

pd.cut and pd.qcut now sup
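
For readers on pandas 0.20.0 or later, here is a minimal sketch of what the update describes: passing datetime values straight to pd.cut and pd.qcut. The sample dates and the names recd, edges, by_month and halves are made up for illustration and are not from the original question.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
recd = pd.Series(pd.to_datetime([
    "2012-07-15", "2012-11-03", "2012-12-20", "2013-01-09", "2013-03-28",
]))

# Explicit month-start bin edges that cover the whole data range.
edges = pd.date_range("2012-07-01", "2013-04-01", freq="MS")

# right=False makes each bin left-closed: [month start, next month start).
by_month = pd.cut(recd, bins=edges, right=False)

# qcut chooses the edges itself so the bins hold roughly equal row counts.
halves = pd.qcut(recd, q=2)

print(by_month.value_counts(sort=False))
print(halves.value_counts(sort=False))
```

Both calls return a categorical Series of intervals whose endpoints are timestamps, so the result can be passed to groupby() like any other key.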

4 answers
  •  清歌不尽
    2020-12-31 20:59

    I came up with an idea that relies on the underlying storage format of datetime64[ns], which holds each value as an integer count of nanoseconds since the epoch. If you define dcut() like this:

    import numpy as np
    import pandas as pd

    def dcut(dts, freq='D', right=True):
        hi = pd.Period(dts.max(), freq=freq) + 1   # get first period past end of data
        periods = pd.period_range(start=dts.min(), end=hi, freq=freq)
        # get an array of integer bin boundaries representing ns-since-epoch
        # note the extra period gives us the extra right-hand bin boundary we need
        bounds = np.array(periods.to_timestamp(how='start'), dtype='datetime64[ns]').astype('int64')
        # convert our time field to integers on the same ns-since-epoch scale
        values = np.array(dts, dtype='datetime64[ns]').astype('int64')
        # bin the integers, labelling the bins with the periods
        # and omitting the extra period at the end
        cut = pd.cut(values, bins=bounds, right=right, labels=periods[:-1].astype(str))
        return cut
    

    Then we can do what I wanted:

    df.groupby([dcut(df.recd, freq='M', right=False), dcut(df.ship, freq='M', right=False)]).count()
    

    To get:

                    price qty recd ship
    2012-07 2012-10   1    1    1    1
    2012-11 2012-12   1    1    1    1
            2013-03   1    1    1    1  
    2012-12 2012-09   1    1    1    1
            2013-02   1    1    1    1  
    2013-01 2012-08   1    1    1    1
    2013-02 2013-02   1    1    1    1
    2013-03 2013-03   1    1    1    1
    2013-04 2012-07   1    1    1    1
            2013-03   1    1    1    1  
    

    I guess you could similarly define dqcut(), which first "rounds" each datetime value down to the integer representing the start of its containing period (at your specified frequency) and then uses qcut() to choose among those boundaries. Alternatively, you could run qcut() first on the raw integer values and then round the resulting bin edges to your chosen frequency.
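
A rough sketch of the first variant (round to period starts, then pick quantile edges among them) might look like the following. The function body, its parameters and the sample dates are all hypothetical. It also swaps qcut() for Series.quantile(interpolation="lower") followed by pd.cut(), because qcut()'s default interpolation can produce an edge that falls between two period starts. It assumes the data spans at least two periods.

```python
import numpy as np
import pandas as pd

def dqcut(dts, q=4, freq="M"):
    """Hypothetical helper: quantile-bin datetimes on period boundaries."""
    dts = pd.Series(pd.to_datetime(dts))
    # Round each value down to the start of its containing period.
    starts = dts.dt.to_period(freq).dt.start_time
    # View the rounded timestamps as integer nanoseconds since the epoch.
    as_int = starts.to_numpy(dtype="datetime64[ns]").astype("int64")
    # interpolation="lower" returns observed values only, so every
    # quantile edge is itself a period start.
    qs = np.linspace(0, 1, q + 1)
    edges = pd.Series(as_int).quantile(qs, interpolation="lower")
    # Drop edges that coincide after rounding.
    edges = np.unique(edges.to_numpy().astype("int64"))
    # Bin the rounded integers; include_lowest keeps the earliest period.
    codes = pd.cut(as_int, bins=edges, include_lowest=True, labels=False)
    return codes, pd.to_datetime(edges)

# Made-up dates for illustration only.
sample = ["2012-07-15", "2012-11-03", "2012-12-20", "2013-01-09",
          "2013-02-14", "2013-03-28", "2013-04-02", "2013-04-25"]
codes, edges = dqcut(sample, q=2, freq="M")
print(codes)
print(edges)
```

The function returns integer bin codes plus the bin edges as timestamps; the codes can be used directly as a groupby() key.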

    No joy on the bonus question yet? :)
