How to create a group ID based on 5-minute intervals in a pandas time series?

北海茫月 2020-12-08 15:37

I have a time-series dataframe df that looks like this (the timestamps all fall within the same day, but across different hours):

                                   


        
2 answers
  •  误落风尘
    2020-12-08 16:40

    You can use TimeGrouper in a groupby/apply (note that pd.TimeGrouper was deprecated in pandas 0.21 in favor of pd.Grouper(freq=...)). With a TimeGrouper you don't need to create your period column. I know you're not trying to compute the mean, but I will use it as an example:

    >>> df.groupby(pd.TimeGrouper('5Min'))['val'].mean()
    
    time
    2014-04-03 16:00:00    14390.000000
    2014-04-03 16:05:00    14394.333333
    2014-04-03 16:10:00    14396.500000
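
On pandas releases where pd.TimeGrouper is no longer available, the same grouping can be sketched with pd.Grouper(freq=...) or with resample. The sample frame below is an assumption: it is rebuilt from the output printed later in this answer, since the question's original data is not shown here.

```python
import pandas as pd

# Sample data rebuilt from the frame printed later in this answer (assumed).
times = pd.to_datetime([
    "2014-04-03 16:01:53", "2014-04-03 16:01:54", "2014-04-03 16:05:55",
    "2014-04-03 16:06:25", "2014-04-03 16:07:01", "2014-04-03 16:10:09",
    "2014-04-03 16:10:23", "2014-04-03 16:10:57", "2014-04-03 16:11:10",
])
df = pd.DataFrame(
    {
        "id": [23, 28, 24, 23, 23, 23, 26, 26, 26],
        "val": [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    },
    index=pd.DatetimeIndex(times, name="time"),
)

# pd.Grouper(freq=...) bins the DatetimeIndex into 5-minute intervals.
means = df.groupby(pd.Grouper(freq="5min"))["val"].mean()

# resample is a shorthand for the same time-based grouping.
means_resampled = df["val"].resample("5min").mean()

print(means)
```

Both forms label each bin by its left edge (16:00, 16:05, 16:10), matching the output above.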
    

    Or an example with an explicit apply:

    >>> df.groupby(pd.TimeGrouper('5Min'))['val'].apply(lambda x: len(x) > 3)
    
    time
    2014-04-03 16:00:00    False
    2014-04-03 16:05:00    False
    2014-04-03 16:10:00     True
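
For a per-bin check like this one, a built-in aggregation can stand in for the lambda. The sketch below assumes the same sample values as the rest of this answer and uses pd.Grouper rather than pd.TimeGrouper; size() counts the rows in each 5-minute bin.

```python
import pandas as pd

# Sample values rebuilt from the frame printed later in this answer (assumed).
times = pd.to_datetime([
    "2014-04-03 16:01:53", "2014-04-03 16:01:54", "2014-04-03 16:05:55",
    "2014-04-03 16:06:25", "2014-04-03 16:07:01", "2014-04-03 16:10:09",
    "2014-04-03 16:10:23", "2014-04-03 16:10:57", "2014-04-03 16:11:10",
])
val = pd.Series(
    [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    index=pd.DatetimeIndex(times, name="time"),
    name="val",
)

grouped = val.groupby(pd.Grouper(freq="5min"))

# Explicit apply, as in the answer: True where a bin holds more than 3 rows.
via_apply = grouped.apply(lambda x: len(x) > 3)

# Same test without a Python-level function call per group.
via_size = grouped.size() > 3

print(via_size)
```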
    

    Docstring for TimeGrouper:

    Docstring for resample:class TimeGrouper@21
    
    TimeGrouper(self, freq = 'Min', closed = None, label = None,
    how = 'mean', nperiods = None, axis = 0, fill_method = None,
    limit = None, loffset = None, kind = None, convention = None, base = 0,
    **kwargs)
    
    Custom groupby class for time-interval grouping
    
    Parameters
    ----------
    freq : pandas date offset or offset alias for identifying bin edges
    closed : closed end of interval; left or right
    label : interval boundary to use for labeling; left or right
    nperiods : optional, integer
    convention : {'start', 'end', 'e', 's'}
        If axis is PeriodIndex
    
    Notes
    -----
    Use begin, end, nperiods to generate intervals that cannot be derived
    directly from the associated object
    

    Edit

    I don't know of an elegant way to create the period column, but the following will work:

    >>> new = df.groupby(pd.TimeGrouper('5Min'), as_index=False).apply(lambda x: x['val'])
    >>> df['period'] = new.index.get_level_values(0)
    >>> df
    
                         id    val  period
    time
    2014-04-03 16:01:53  23  14389       0
    2014-04-03 16:01:54  28  14391       0
    2014-04-03 16:05:55  24  14393       1
    2014-04-03 16:06:25  23  14395       1
    2014-04-03 16:07:01  23  14395       1
    2014-04-03 16:10:09  23  14395       2
    2014-04-03 16:10:23  26  14397       2
    2014-04-03 16:10:57  26  14397       2
    2014-04-03 16:11:10  26  14397       2
    

    This works because groupby with as_index=False returns the period numbers you want as the first level of the MultiIndex; I just grab that level and assign it to a new column in the original dataframe. You could do anything in the apply; I only want the index:

    >>> new
    
       time
    0  2014-04-03 16:01:53    14389
       2014-04-03 16:01:54    14391
    1  2014-04-03 16:05:55    14393
       2014-04-03 16:06:25    14395
       2014-04-03 16:07:01    14395
    2  2014-04-03 16:10:09    14395
       2014-04-03 16:10:23    14397
       2014-04-03 16:10:57    14397
       2014-04-03 16:11:10    14397
    
    >>> new.index.get_level_values(0)
    
    Int64Index([0, 0, 1, 1, 1, 2, 2, 2, 2], dtype='int64')
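
The index produced by groupby/apply may be assembled differently across pandas versions, so here is a minimal alternative sketch that does not rely on it. It floors each timestamp to the start of its 5-minute bin and numbers the distinct bins with pd.factorize. This is a swapped-in technique, not the approach used above, and the sample frame is an assumption rebuilt from the printed output.

```python
import pandas as pd

# Sample data rebuilt from the frame printed above (assumed).
times = pd.to_datetime([
    "2014-04-03 16:01:53", "2014-04-03 16:01:54", "2014-04-03 16:05:55",
    "2014-04-03 16:06:25", "2014-04-03 16:07:01", "2014-04-03 16:10:09",
    "2014-04-03 16:10:23", "2014-04-03 16:10:57", "2014-04-03 16:11:10",
])
df = pd.DataFrame(
    {
        "id": [23, 28, 24, 23, 23, 23, 26, 26, 26],
        "val": [14389, 14391, 14393, 14395, 14395, 14395, 14397, 14397, 14397],
    },
    index=pd.DatetimeIndex(times, name="time"),
)

# Round every timestamp down to the start of its 5-minute bin.
bin_start = df.index.floor("5min")

# Number the distinct bins 0, 1, 2, ... in chronological order.
codes, unique_bins = pd.factorize(bin_start, sort=True)
df["period"] = codes

print(df)
```

With this approach only bins that contain rows receive a number, so the IDs stay consecutive even when some 5-minute intervals are empty.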
    
