Can someone please explain the difference between the asfreq and resample methods in pandas? When should each one be used?
Let me use an example to illustrate:
import pandas as pd

# generate a series of 365 daily values
# index = 2019-01-01, 2019-01-02, ..., 2019-12-31
# values = [0, 1, ..., 364]
ts = pd.Series(range(365), index=pd.date_range(start='20190101',
                                               end='20191231',
                                               freq='D'))
ts.head()
output:
2019-01-01 0
2019-01-02 1
2019-01-03 2
2019-01-04 3
2019-01-05 4
Freq: D, dtype: int64
Now, convert the data to quarterly frequency with asfreq():
ts.asfreq(freq='Q')
output:
2019-03-31 89
2019-06-30 180
2019-09-30 272
2019-12-31 364
Freq: Q-DEC, dtype: int64
asfreq() returns a Series that contains only the value for the last day of each quarter.
ts.resample('Q')
output:
DatetimeIndexResampler [freq=<QuarterEnd: startingMonth=12>, axis=0, closed=right, label=right, convention=start, base=0]
resample() returns a DatetimeIndexResampler, which does not show its contents directly. Think of it like the groupby method: it splits the data into bins (groups):
bins = ts.resample('Q')
bins.groups
output:
{Timestamp('2019-03-31 00:00:00', freq='Q-DEC'): 90,
Timestamp('2019-06-30 00:00:00', freq='Q-DEC'): 181,
Timestamp('2019-09-30 00:00:00', freq='Q-DEC'): 273,
Timestamp('2019-12-31 00:00:00', freq='Q-DEC'): 365}
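To make the bins easier to inspect, here is a minimal sketch that counts the rows in each bin and prints the first and last value of every group. The numbers in the groups output above look like running end positions rather than per-quarter counts, so size() is used here instead. The name quarter_end is introduced only for this example; an offset object is passed in place of the 'Q' string because newer pandas versions rename that alias to 'QE'.

```python
import pandas as pd

# Same daily series as in the example above.
ts = pd.Series(range(365), index=pd.date_range(start='20190101',
                                               end='20191231',
                                               freq='D'))

# Quarter-end frequency expressed as an offset object (illustrative name).
quarter_end = pd.offsets.QuarterEnd(startingMonth=12)
bins = ts.resample(quarter_end)

# Number of daily rows that fall into each quarterly bin.
sizes = bins.size()
print(sizes)

# A resampler can be iterated like a groupby: (bin label, sub-series) pairs.
for label, group in bins:
    print(label.date(), group.iloc[0], group.iloc[-1])
```

Each bin holds all of the daily rows for its quarter, which is what makes per-quarter aggregation possible.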
Nothing seems different so far except for the return type. Let's calculate the average of each quarter:
# (89+180+272+364)/4 = 226.25
ts.asfreq(freq='Q').mean()
output:
226.25
When mean() is applied to the result of asfreq(), it returns a single number: the average of the four quarter-end values. Note that this is not the average of each quarter.
To calculate the average of each quarter:
ts.resample('Q').mean()
output:
2019-03-31 44.5
2019-06-30 135.0
2019-09-30 226.5
2019-12-31 318.5
You can perform more powerful operations with resample() than with asfreq().
Think of resample as groupby + every method that you can call after groupby (e.g. mean, sum, apply, you name it).
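The groupby analogy can be checked directly. The sketch below, which assumes the same ts as above and uses the illustrative names quarter_end, via_resample, via_groupby and summary, compares resample() with a groupby on pd.Grouper and then applies several aggregations at once.

```python
import pandas as pd

ts = pd.Series(range(365), index=pd.date_range(start='20190101',
                                               end='20191231',
                                               freq='D'))

# Offset object used instead of the 'Q' alias (renamed 'QE' in newer pandas).
quarter_end = pd.offsets.QuarterEnd(startingMonth=12)

# Per-quarter mean, computed two ways.
via_resample = ts.resample(quarter_end).mean()
via_groupby = ts.groupby(pd.Grouper(freq=quarter_end)).mean()
print(via_resample.tolist() == via_groupby.tolist())

# Any groupby-style aggregation works on the resampler, including several at once.
summary = ts.resample(quarter_end).agg(['mean', 'sum', 'max'])
print(summary)
```

The max column here matches what asfreq() returned earlier only because the values in this series increase every day; in general the two methods answer different questions.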
Think of asfreq as a filter mechanism with limited fillna() capabilities: fillna() lets you specify a limit, but asfreq() does not.
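A short sketch of both halves of that statement, assuming the same ts as above (all variable names are illustrative). When downsampling, asfreq() simply picks the values at the new timestamps. When upsampling, it can fill gaps with method='ffill' but has no limit argument, whereas the resampler's ffill() accepts one.

```python
import pandas as pd

ts = pd.Series(range(365), index=pd.date_range(start='20190101',
                                               end='20191231',
                                               freq='D'))

# Offset object used instead of the 'Q' alias (renamed 'QE' in newer pandas).
quarter_end = pd.offsets.QuarterEnd(startingMonth=12)

# Downsampling: asfreq keeps only the rows at quarter-end dates,
# which gives the same values as a boolean filter on the index.
picked = ts.asfreq(quarter_end)
filtered = ts[ts.index.is_quarter_end]
same_values = picked.tolist() == filtered.tolist()
print(same_values)

# Upsampling the four quarterly points back to daily frequency.
daily_nan = picked.asfreq('D')                      # new dates become NaN
daily_ffill = picked.asfreq('D', method='ffill')    # fills every gap, no limit option
daily_limited = picked.resample('D').ffill(limit=7) # fills at most 7 days per gap

print(daily_nan.notna().sum(), daily_ffill.notna().sum(), daily_limited.notna().sum())
```

If a capped fill is needed, going through resample() is one way to get it.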