How to resample a dataframe with different functions applied to each column?

旧巷少年郎 2020-12-07 16:04

I have a time series with temperature and radiation in a pandas DataFrame. The time resolution is 1 minute, in regular steps.

import datetime
im         


        
4 answers
  •  臣服心动
    2020-12-07 16:51

    You need to use groupby, like this:

    import numpy as np

    # Group rows by the hour of each timestamp in the DatetimeIndex
    grouped = frame.groupby(lambda x: x.hour)
    grouped.agg({'radiation': np.sum, 'tamb': np.mean})
    # Same as: grouped.agg({'radiation': 'sum', 'tamb': 'mean'})
    

    with the output being:

            radiation      tamb
    key_0                      
    8      298.581107  4.883806
    9      311.176148  4.983705
    10     315.531527  5.343057
    11     288.013876  6.022002
    12       5.527616  8.507670
    

    In essence, this splits the frame on the hour value, computes the mean of tamb and the sum of radiation for each group, and returns the result as a DataFrame (a similar approach to R's ddply). For more information, see the documentation page for groupby as well as this blog post.
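    Since the question's data is not shown in full, here is a minimal self-contained sketch of the same split-apply-combine step. The date, the random values, and the three-hour span are assumptions made up for illustration; only the column names radiation and tamb come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: 1-minute samples over 3 hours.
index = pd.date_range("2020-01-05 08:00", periods=180, freq="min")
rng = np.random.default_rng(0)
frame = pd.DataFrame(
    {
        "radiation": rng.uniform(0.0, 10.0, size=len(index)),
        "tamb": rng.uniform(0.0, 10.0, size=len(index)),
    },
    index=index,
)

# Split on the hour of each timestamp, then apply a different
# aggregation to each column and combine the results into one frame.
hourly = frame.groupby(lambda ts: ts.hour).agg({"radiation": "sum", "tamb": "mean"})
print(hourly)
```

    The dictionary passed to agg maps each column name to its own aggregation, so further columns can be added with whatever function suits them.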

    Edit: Grouping on the hour alone merges the same hour from different days. If the data spans several days, you could group on both the day and the hour:

    grouped = frame.groupby(lambda x: (x.day, x.hour))
    grouped.agg({'radiation': 'sum', 'tamb': 'mean'})

    with the output being:

              radiation      tamb
    key_0                        
    (5, 8)   298.581107  4.883806
    (5, 9)   311.176148  4.983705
    (5, 10)  315.531527  5.343057
    (5, 11)  288.013876  6.022002
    (5, 12)    5.527616  8.507670
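    Because the question asks about resampling, a possible alternative is to pass the same per-column dictionary to resample, which keeps a full timestamp for each bin instead of a (day, hour) tuple. This is a sketch under assumptions, not the answer's method: the data below is invented (a constant radiation of 1 and a running counter for tamb) so that the result is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 120 one-minute samples that cross midnight (assumed values).
index = pd.date_range("2020-01-05 23:00", periods=120, freq="min")
frame = pd.DataFrame(
    {"radiation": np.ones(len(index)), "tamb": np.arange(len(index), dtype=float)},
    index=index,
)

# Bin into 60-minute windows; each bin is labelled with its starting timestamp,
# so the same hour on different days is never merged.
hourly = frame.resample("60min").agg({"radiation": "sum", "tamb": "mean"})
print(hourly)
```

    With this input there are two bins, one starting at 23:00 on the 5th and one at 00:00 on the 6th, each summing 60 radiation samples.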
    
