I have a multi-year time series and want the bounds between which 95% of my data lie. I want to look at this by season of the year ('DJF', 'MAM', 'JJA', 'SON').
In case it helps, I would suggest replacing the following list comprehension and dict lookup that you identified as slow:
month_to_season_dct = {
    1: 'DJF', 2: 'DJF',
    3: 'MAM', 4: 'MAM', 5: 'MAM',
    6: 'JJA', 7: 'JJA', 8: 'JJA',
    9: 'SON', 10: 'SON', 11: 'SON',
    12: 'DJF'
}
grp_ary = [month_to_season_dct.get(t_stamp.month) for t_stamp in df.index]
with the following, which uses a numpy array as a lookup table. Indexing the array with df.index.month is a single vectorized operation, so it avoids the Python-level loop over every timestamp:
import numpy as np

# Entry 0 is an unused placeholder so that months 1-12 index directly.
month_to_season_lu = np.array([
    None,
    'DJF', 'DJF',
    'MAM', 'MAM', 'MAM',
    'JJA', 'JJA', 'JJA',
    'SON', 'SON', 'SON',
    'DJF'
])
grp_ary = month_to_season_lu[df.index.month]
Here's a timeit comparison of the two approaches on ~3 years of minutely data:
In [16]: timeit [month_to_season_dct.get(t_stamp.month) for t_stamp in df.index]
1 loops, best of 3: 12.3 s per loop
In [17]: timeit month_to_season_lu[df.index.month]
1 loops, best of 3: 549 ms per loop