Get the mean across multiple Pandas DataFrames

礼貌的吻别 2020-12-04 15:39

I'm generating a number of DataFrames with the same shape, and I want to compare them to one another. I want to be able to get the mean and median across the DataFrames.

5 answers
  •  眼角桃花
    2020-12-04 16:11

    You can assign a label to each frame in a column named group, then concat the frames and groupby that column to aggregate each frame separately:

    In [55]: import numpy as np
    
    In [56]: import pandas as pd
    
    In [57]: df = pd.DataFrame(np.random.randn(10, 4), columns=list('abcd'))
    
    In [58]: df2 = df.copy()
    
    In [59]: dfs = [df, df2]
    
    In [60]: df
    Out[60]:
            a       b       c       d
    0  0.1959  0.1260  0.1464  0.1631
    1  0.9344 -1.8154  1.4529 -0.6334
    2  0.0390  0.4810  1.1779 -1.1799
    3  0.3542  0.3819 -2.0895  0.8877
    4 -2.2898 -1.0585  0.8083 -0.2126
    5  0.3727 -0.6867 -1.3440 -1.4849
    6 -1.1785  0.0885  1.0945 -1.6271
    7 -1.7169  0.3760 -1.4078  0.8994
    8  0.0508  0.4891  0.0274 -0.6369
    9 -0.7019  1.0425 -0.5476 -0.5143
    
    In [61]: for i, d in enumerate(dfs):
       ....:     d['group'] = i
       ....:
    
    In [62]: dfs[0]
    Out[62]:
            a       b       c       d  group
    0  0.1959  0.1260  0.1464  0.1631      0
    1  0.9344 -1.8154  1.4529 -0.6334      0
    2  0.0390  0.4810  1.1779 -1.1799      0
    3  0.3542  0.3819 -2.0895  0.8877      0
    4 -2.2898 -1.0585  0.8083 -0.2126      0
    5  0.3727 -0.6867 -1.3440 -1.4849      0
    6 -1.1785  0.0885  1.0945 -1.6271      0
    7 -1.7169  0.3760 -1.4078  0.8994      0
    8  0.0508  0.4891  0.0274 -0.6369      0
    9 -0.7019  1.0425 -0.5476 -0.5143      0
    
    In [63]: final = pd.concat(dfs, ignore_index=True)
    
    In [64]: final
    Out[64]:
             a       b       c       d  group
    0   0.1959  0.1260  0.1464  0.1631      0
    1   0.9344 -1.8154  1.4529 -0.6334      0
    2   0.0390  0.4810  1.1779 -1.1799      0
    3   0.3542  0.3819 -2.0895  0.8877      0
    4  -2.2898 -1.0585  0.8083 -0.2126      0
    5   0.3727 -0.6867 -1.3440 -1.4849      0
    6  -1.1785  0.0885  1.0945 -1.6271      0
    ..     ...     ...     ...     ...    ...
    13  0.3542  0.3819 -2.0895  0.8877      1
    14 -2.2898 -1.0585  0.8083 -0.2126      1
    15  0.3727 -0.6867 -1.3440 -1.4849      1
    16 -1.1785  0.0885  1.0945 -1.6271      1
    17 -1.7169  0.3760 -1.4078  0.8994      1
    18  0.0508  0.4891  0.0274 -0.6369      1
    19 -0.7019  1.0425 -0.5476 -0.5143      1
    
    [20 rows x 5 columns]
    
    In [65]: final.groupby('group').mean()
    Out[65]:
               a       b       c       d
    group
    0     -0.394 -0.0576 -0.0682 -0.4339
    1     -0.394 -0.0576 -0.0682 -0.4339
    

    Here, each group has the same column means, but that's only because df2 is a copy of df.
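
If the goal is an element-wise mean or median across the frames (one value per cell rather than one row per frame), a possible variation is to group the concatenated frames by row label instead of by a group column. This is a minimal sketch, assuming all frames share the same index and columns; the seed and the names `dfs`, `stacked`, `mean_df`, and `median_df` are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed chosen only for reproducibility

# Three hypothetical frames with the same shape, index, and columns
dfs = [
    pd.DataFrame(rng.standard_normal((10, 4)), columns=list("abcd"))
    for _ in range(3)
]

# Stack the frames vertically; the row labels 0..9 repeat once per frame
stacked = pd.concat(dfs)

# Group by row label (index level 0) so each cell is aggregated across frames
mean_df = stacked.groupby(level=0).mean()
median_df = stacked.groupby(level=0).median()
```

Both results keep the shape of a single input frame (10 rows by 4 columns here), so they can be compared cell by cell with any of the originals.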

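    Alternatively, on older pandas versions you can put the frames into a Panel (Panel was deprecated in pandas 0.20 and removed in 0.25, so this does not run on current releases):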

    In [69]: df = pd.DataFrame(np.random.randn(10, 4), columns=list('abcd'))
    
    In [70]: df2 = pd.DataFrame(np.random.randn(10, 4), columns=list('abcd'))
    
    In [71]: panel = pd.Panel({0: df, 1: df2})
    
    In [72]: panel
    Out[72]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 2 (items) x 10 (major_axis) x 4 (minor_axis)
    Items axis: 0 to 1
    Major_axis axis: 0 to 9
    Minor_axis axis: a to d
    
    In [73]: panel.mean()
    Out[73]:
            0       1
    a  0.3839  0.2956
    b  0.1855 -0.3164
    c -0.1167 -0.0627
    d -0.2338 -0.0450
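
On pandas versions without Panel, a similar result can be obtained by stacking the underlying arrays with NumPy. The following is a sketch under the assumption that the frames have identical indexes and columns; the seed and the names `stacked`, `per_frame`, `mean_df`, and `median_df` are made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)  # seed chosen only for reproducibility
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("abcd"))

# Per-frame column means, laid out like the Panel output: rows a..d, columns 0 and 1
per_frame = pd.DataFrame({0: df.mean(), 1: df2.mean()})

# Stack into a 3-D array of shape (n_frames, n_rows, n_cols)
stacked = np.stack([df.to_numpy(), df2.to_numpy()])

# Reduce along axis 0 to aggregate across frames, then restore the labels
mean_df = pd.DataFrame(stacked.mean(axis=0), index=df.index, columns=df.columns)
median_df = pd.DataFrame(np.median(stacked, axis=0), index=df.index, columns=df.columns)
```

Working on the raw arrays ignores index alignment, so this only makes sense when every frame has its rows and columns in the same order.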
    
