Resampling a pandas dataframe with multi-index containing timeseries

一个人的身影 2020-12-19 08:52

Apologies for creating what appears to be a duplicate of this question. I have a dataframe that is shaped more or less like the one below:

df_length = 240
d         


        
2 answers
  •  鱼传尺愫
    2020-12-19 09:27

    IIUC, you wish to group by job_id and (daily) datetimes, and wish to ignore the first level of the DataFrame index. Therefore, instead of grouping by

    ( [ level_values(i) for i in [0,1] ] + [ pd.Grouper(freq='D', level=2) ] )
    

    you'd want to group by

    [df.index.get_level_values(1), pd.Grouper(freq='D', level=2)]
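
The reason level 0 is skipped may be easier to see on a tiny frame: `set_index(..., append=True)` keeps the original integer index as level 0, so `job_id` and `datetime` end up as levels 1 and 2. The sketch below uses made-up toy data (not the question's frame) and shows that, assuming the levels are named, addressing them by name gives the same groups as addressing them by position.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration: 6 rows spaced 12 hours apart.
df = pd.DataFrame({'a': np.arange(6.0)})
df['job_id'] = ['job1'] * 3 + ['job2'] * 3
df['datetime'] = pd.date_range('2017-06-23', periods=6, freq='12h')

# append=True keeps the original integer index as level 0.
df = df.set_index(['job_id', 'datetime'], append=True)
print(list(df.index.names))

# Group by position (levels 1 and 2), as in the answer.
by_pos = df.groupby([df.index.get_level_values(1),
                     pd.Grouper(freq='D', level=2)]).mean()

# Group by level name; level 0 is simply never mentioned.
by_name = df.groupby([pd.Grouper(level='job_id'),
                      pd.Grouper(freq='D', level='datetime')]).mean()

print(by_name)
```

Referring to levels by name is a little more robust if the index order ever changes, but either form ignores level 0 in the same way.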
    

    import numpy as np
    import pandas as pd
    np.random.seed(2017)  # fixed seed so the random data is reproducible
    
    # 240 hourly rows of random data starting on 23 June 2017
    df_length = 240
    df = pd.DataFrame(np.random.randn(df_length, 2), columns=['a', 'b'])
    df['datetime'] = pd.date_range('2017-06-23', periods=df_length, freq='h')
    
    # three consecutive blocks of 80 rows: job1, then job2, then job3
    unique_jobs = ['job1', 'job2', 'job3']
    df['job_id'] = np.repeat(unique_jobs, df_length // len(unique_jobs))
    
    # index levels become (original integer index, job_id, datetime)
    df = df.set_index(['job_id', 'datetime'], append=True)
    
    # group by job_id (level 1) and calendar day (level 2), skipping level 0
    grouped = df.groupby([df.index.get_level_values(1), pd.Grouper(freq='D', level=2)])
    # daily means, then a 2-row rolling mean over the stacked daily rows
    result = grouped.mean().rolling(window=2).mean()
    
    print(result)
    

    yields

                              a         b
    job_id datetime                      
    job1   2017-06-23       NaN       NaN
           2017-06-24 -0.203083  0.176141
           2017-06-25 -0.077083  0.072510
           2017-06-26 -0.237611 -0.493329
    job2   2017-06-26 -0.297775 -0.370543
           2017-06-27  0.005124  0.052603
           2017-06-28  0.226142 -0.015584
           2017-06-29 -0.065595  0.210628
    job3   2017-06-29 -0.186865  0.347683
           2017-06-30  0.051508  0.029909
           2017-07-01  0.005341  0.075378
           2017-07-02 -0.027131  0.132192
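
One thing worth noticing in this output: only the very first row is NaN. The first row of job2 and of job3 has a value because `rolling(window=2)` runs over the stacked daily rows, so those windows include the last day of the previous job. If that is not what is wanted, a possible variant (an assumption about the intent, not part of the answer above) is to restart the window inside each job:

```python
import numpy as np
import pandas as pd

np.random.seed(2017)
n = 240
df = pd.DataFrame(np.random.randn(n, 2), columns=['a', 'b'])
df['datetime'] = pd.date_range('2017-06-23', periods=n, freq='h')
df['job_id'] = np.repeat(['job1', 'job2', 'job3'], n // 3)
df = df.set_index(['job_id', 'datetime'], append=True)

# Daily mean per job, same grouping as in the answer.
daily = df.groupby([df.index.get_level_values('job_id'),
                    pd.Grouper(freq='D', level='datetime')]).mean()

# Pooled rolling mean, as in the answer: windows cross job boundaries.
pooled = daily.rolling(window=2).mean()

# Variant: apply the 2-day rolling mean separately within each job,
# so the first day of every job is NaN.
per_job = daily.groupby(level='job_id').transform(
    lambda s: s.rolling(window=2).mean()
)

print(per_job)
```

`transform` is used here because it returns a frame with the same `(job_id, datetime)` index as `daily`, which keeps the two results directly comparable.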
    
