Difference between asfreq and resample

萌比男神i 2020-12-24 05:39

Can someone please explain the difference between the asfreq and resample methods in pandas? When should one use which?

2 answers
  • 2020-12-24 05:56

    Let me use an example to illustrate:

    import pandas as pd

    # generate a series of 365 daily values
    # index = 2019-01-01, 2019-01-02, ..., 2019-12-31
    # values = [0, 1, ..., 364]
    ts = pd.Series(range(365), index=pd.date_range(start='20190101',
                                                   end='20191231',
                                                   freq='D'))
    ts.head()
    
    output:
    2019-01-01    0
    2019-01-02    1
    2019-01-03    2
    2019-01-04    3
    2019-01-05    4
    Freq: D, dtype: int64
    

    Now, convert the data to quarterly frequency with asfreq():

    ts.asfreq(freq='Q')
    
    output:
    2019-03-31     89
    2019-06-30    180
    2019-09-30    272
    2019-12-31    364
    Freq: Q-DEC, dtype: int64
    

    asfreq() returns a Series that keeps only the value observed on the last day of each quarter.

    ts.resample('Q')
    
    output:
    DatetimeIndexResampler [freq=<QuarterEnd: startingMonth=12>, axis=0, closed=right, label=right, convention=start, base=0]
    

    resample() returns a DatetimeIndexResampler, whose repr does not show the data inside. Think of it like groupby(): it sorts the rows into bins (groups):

    bins = ts.resample('Q')
    bins.groups
    
    output:
     {Timestamp('2019-03-31 00:00:00', freq='Q-DEC'): 90,
     Timestamp('2019-06-30 00:00:00', freq='Q-DEC'): 181,
     Timestamp('2019-09-30 00:00:00', freq='Q-DEC'): 273,
     Timestamp('2019-12-31 00:00:00', freq='Q-DEC'): 365}
    

    Nothing seems different so far except for the return type. Let's try to calculate the average of each quarter, first with asfreq():

    # (89+180+272+364)/4 = 226.25
    ts.asfreq(freq='Q').mean()
    
    output:
    226.25
    

    Here mean() averages the four values that asfreq() kept. Note that this is not the average of each quarter, but the average of the values on the last day of each quarter.

    To calculate the average of each quarter, use resample():

    ts.resample('Q').mean()
    
    output:
    2019-03-31     44.5
    2019-06-30    135.0
    2019-09-30    226.5
    2019-12-31    318.5
    

    You can perform more powerful operations with resample() than asfreq().

    Think of resample() as groupby() plus every method that you can call after groupby() (e.g. mean, sum, apply, you name it).
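The groupby analogy can be checked directly. The sketch below is an illustration, not part of the original answer: it rebuilds the same 365-day series and uses a `pd.offsets.QuarterEnd` object instead of the `'Q'` string, on the assumption that the offset object behaves the same across pandas versions. The names `quarter_end`, `by_resample`, `by_groupby`, `summary`, and `spread` are made up for this example.

```python
import pandas as pd

# Same series as above: values 0..364 on every day of 2019.
ts = pd.Series(range(365),
               index=pd.date_range("2019-01-01", "2019-12-31", freq="D"))

# Quarter-end offset object, used here in place of the 'Q' alias.
quarter_end = pd.offsets.QuarterEnd(startingMonth=12)

# resample() followed by an aggregation ...
by_resample = ts.resample(quarter_end).mean()

# ... gives the same numbers as groupby() with a time-based Grouper.
by_groupby = ts.groupby(pd.Grouper(freq=quarter_end)).mean()

# Several aggregations at once, one column per function.
summary = ts.resample(quarter_end).agg(["min", "max", "mean"])

# An arbitrary function per bin: the range of values in each quarter.
spread = ts.resample(quarter_end).apply(lambda x: x.max() - x.min())

print(by_resample)
print(summary)
print(spread)
```

Both `by_resample` and `by_groupby` hold the per-quarter means shown earlier (44.5, 135.0, 226.5, 318.5).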

    Think of asfreq() as a filter mechanism with limited fill capabilities: it accepts a fill method or a fill_value, but unlike fillna() it has no limit parameter.
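A minimal sketch of that difference when upsampling, using a small hypothetical series sampled every three days (the data and variable names are invented for illustration):

```python
import pandas as pd

# Hypothetical series with one observation every 3 days.
ts3 = pd.Series([1.0, 2.0, 3.0],
                index=pd.date_range("2019-01-01", periods=3, freq="3D"))

# asfreq() can forward-fill every gap ...
filled_all = ts3.asfreq("D", method="ffill")

# ... or fill the new rows with a constant, but it has no `limit` argument.
filled_const = ts3.asfreq("D", fill_value=0.0)

# resample() exposes ffill(limit=...), so only the first missing day
# after each observation is filled and the second stays NaN.
filled_limited = ts3.resample("D").ffill(limit=1)

print(filled_all.tolist())      # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0]
print(filled_const.tolist())    # [1.0, 0.0, 0.0, 2.0, 0.0, 0.0, 3.0]
print(filled_limited)
```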

  • 2020-12-24 06:07

    resample is more general than asfreq. For example, using resample I can pass an arbitrary function to perform binning over a Series or DataFrame object in bins of arbitrary size. asfreq is a concise way of changing the frequency of a DatetimeIndex object. It also provides padding functionality.

    As the pandas documentation says, asfreq() is a thin wrapper around a call to date_range() plus a call to reindex().
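The "date_range + reindex" description can be reproduced by hand. This is only a sketch of the idea, not the actual pandas implementation. The 365-day series and the `QuarterEnd` offset object are assumptions chosen for illustration:

```python
import pandas as pd

# Daily series for 2019 with values 0..364.
ts = pd.Series(range(365),
               index=pd.date_range("2019-01-01", "2019-12-31", freq="D"))

quarter_end = pd.offsets.QuarterEnd(startingMonth=12)

# The one-liner ...
via_asfreq = ts.asfreq(quarter_end)

# ... and the manual version: build the target index, then reindex onto it.
new_index = pd.date_range(ts.index.min(), ts.index.max(), freq=quarter_end)
via_reindex = ts.reindex(new_index)

print(via_asfreq.equals(via_reindex))  # True
```

No aggregation happens in either version: each result holds only the values that already sat on the quarter-end dates.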

    An example of resample that I use in my daily work is computing the number of spikes of a neuron in 1 second bins by resampling a large boolean array where True means "spike" and False means "no spike". I can do that as easily as large_bool.resample('S').sum() (older pandas releases wrote this as resample('S', how='sum'); the how argument has since been removed). Kind of neat!
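A small runnable version of the spike-counting idea, with made-up data: a hypothetical 3-second recording sampled at 1 kHz with five spikes placed by hand. It passes `pd.Timedelta` objects rather than string aliases, since `Timedelta` is accepted regardless of how a given pandas version spells the seconds alias.

```python
import numpy as np
import pandas as pd

# Hypothetical recording: 3000 samples, one per millisecond.
index = pd.date_range("2020-01-01", periods=3000,
                      freq=pd.Timedelta(milliseconds=1))

# True means "spike", False means "no spike".
spikes = np.zeros(3000, dtype=bool)
spikes[[10, 500, 999, 1500, 2999]] = True
large_bool = pd.Series(spikes, index=index)

# Count spikes in 1-second bins: summing booleans counts the True values.
spikes_per_second = large_bool.resample(pd.Timedelta(seconds=1)).sum()

print(spikes_per_second)
```

The first second contains three spikes and the next two seconds contain one each.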

    asfreq can be used when you want to change a DatetimeIndex to have a different frequency while retaining the same values at the current index.

    Here's an example where they are equivalent:

    In [5]: import numpy as np; import pandas as pd

    In [6]: dr = pd.date_range('1/1/2010', periods=3, freq='3B')
    
    In [7]: raw = np.random.randn(3)
    
    In [8]: ts = pd.Series(raw, index=dr)
    
    In [9]: ts
    Out[9]:
    2010-01-01   -1.948
    2010-01-06    0.112
    2010-01-11   -0.117
    Freq: 3B, dtype: float64
    
    In [10]: ts.asfreq('B')
    Out[10]:
    2010-01-01   -1.948
    2010-01-04      NaN
    2010-01-05      NaN
    2010-01-06    0.112
    2010-01-07      NaN
    2010-01-08      NaN
    2010-01-11   -0.117
    Freq: B, dtype: float64
    
    In [11]: ts.resample('B').mean()  # old pandas applied mean by default
    Out[11]:
    2010-01-01   -1.948
    2010-01-04      NaN
    2010-01-05      NaN
    2010-01-06    0.112
    2010-01-07      NaN
    2010-01-08      NaN
    2010-01-11   -0.117
    Freq: B, dtype: float64
    

    As far as when to use either: it depends on the problem you have in mind...care to share?
