I'm generating a number of dataframes with the same shape, and I want to compare them to one another. I want to be able to get the mean and median across the dataframes.
You can assign a label to each frame in a new column (call it group), then concat the frames and groupby that column:
In [57]: df = pd.DataFrame(np.random.randn(10, 4), columns=list('abcd'))
In [58]: df2 = df.copy()
In [59]: dfs = [df, df2]
In [60]: df
Out[60]:
        a       b       c       d
0  0.1959  0.1260  0.1464  0.1631
1  0.9344 -1.8154  1.4529 -0.6334
2  0.0390  0.4810  1.1779 -1.1799
3  0.3542  0.3819 -2.0895  0.8877
4 -2.2898 -1.0585  0.8083 -0.2126
5  0.3727 -0.6867 -1.3440 -1.4849
6 -1.1785  0.0885  1.0945 -1.6271
7 -1.7169  0.3760 -1.4078  0.8994
8  0.0508  0.4891  0.0274 -0.6369
9 -0.7019  1.0425 -0.5476 -0.5143
In [61]: for i, d in enumerate(dfs):
   ....:     d['group'] = i
   ....:
In [62]: dfs[0]
Out[62]:
        a       b       c       d  group
0  0.1959  0.1260  0.1464  0.1631      0
1  0.9344 -1.8154  1.4529 -0.6334      0
2  0.0390  0.4810  1.1779 -1.1799      0
3  0.3542  0.3819 -2.0895  0.8877      0
4 -2.2898 -1.0585  0.8083 -0.2126      0
5  0.3727 -0.6867 -1.3440 -1.4849      0
6 -1.1785  0.0885  1.0945 -1.6271      0
7 -1.7169  0.3760 -1.4078  0.8994      0
8  0.0508  0.4891  0.0274 -0.6369      0
9 -0.7019  1.0425 -0.5476 -0.5143      0
In [63]: final = pd.concat(dfs, ignore_index=True)
In [64]: final
Out[64]:
         a       b       c       d  group
0   0.1959  0.1260  0.1464  0.1631      0
1   0.9344 -1.8154  1.4529 -0.6334      0
2   0.0390  0.4810  1.1779 -1.1799      0
3   0.3542  0.3819 -2.0895  0.8877      0
4  -2.2898 -1.0585  0.8083 -0.2126      0
5   0.3727 -0.6867 -1.3440 -1.4849      0
6  -1.1785  0.0885  1.0945 -1.6271      0
..     ...     ...     ...     ...    ...
13  0.3542  0.3819 -2.0895  0.8877      1
14 -2.2898 -1.0585  0.8083 -0.2126      1
15  0.3727 -0.6867 -1.3440 -1.4849      1
16 -1.1785  0.0885  1.0945 -1.6271      1
17 -1.7169  0.3760 -1.4078  0.8994      1
18  0.0508  0.4891  0.0274 -0.6369      1
19 -0.7019  1.0425 -0.5476 -0.5143      1

[20 rows x 5 columns]
In [65]: final.groupby('group').mean()
Out[65]:
           a       b       c       d
group
0     -0.394 -0.0576 -0.0682 -0.4339
1     -0.394 -0.0576 -0.0682 -0.4339
Here, the two groups have identical means, but that's only because df2 is a copy of df.
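For reference, the same steps can be written as a standalone script that also computes the median the question asks about. This is only a minimal sketch: the seed and the use of two independently generated frames are assumptions made here so that the groups differ, and `DataFrame.assign` is swapped in for the in-place loop so the original frames are left unmodified.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility (assumption)
dfs = [
    pd.DataFrame(rng.standard_normal((10, 4)), columns=list("abcd"))
    for _ in range(2)
]

# Label each frame with its position in the list, without mutating the inputs.
labelled = [d.assign(group=i) for i, d in enumerate(dfs)]

# Stack the labelled frames into one long frame.
final = pd.concat(labelled, ignore_index=True)

# Per-frame column statistics: one row per group.
group_mean = final.groupby("group").mean()
group_median = final.groupby("group").median()

print(group_mean)
print(group_median)
```

Each result has one row per input frame and one column per original column, so the frames can be compared side by side.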
Alternatively, you can throw the frames into a Panel:
In [69]: df = pd.DataFrame(np.random.randn(10, 4), columns=list('abcd'))
In [70]: df2 = pd.DataFrame(np.random.randn(10, 4), columns=list('abcd'))
In [71]: panel = pd.Panel({0: df, 1: df2})
In [72]: panel
Out[72]:
Dimensions: 2 (items) x 10 (major_axis) x 4 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 9
Minor_axis axis: a to d
In [73]: panel.mean()
Out[73]:
        0       1
a  0.3839  0.2956
b  0.1855 -0.3164
c -0.1167 -0.0627
d -0.2338 -0.0450
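Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel snippet only runs on older versions. On current pandas, a rough equivalent is to pass a dict to `pd.concat`, which builds a MultiIndex with the frame label as the outer level. The sketch below assumes all frames share the same index and columns; the seed is arbitrary. Grouping on the outer level reproduces the per-frame column means, while grouping on the inner level gives element-wise statistics across the frames, which may be closer to what the question is after.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("abcd"))

# Outer index level = frame label, inner level = original row label.
stacked = pd.concat({0: df, 1: df2})

# Column means per frame; transposed so that columns are the frame labels,
# matching the layout of the Panel result.
per_frame_mean = stacked.groupby(level=0).mean().T

# Element-wise statistics across frames: same shape as each input frame.
elementwise_mean = stacked.groupby(level=1).mean()
elementwise_median = stacked.groupby(level=1).median()

print(per_frame_mean)
print(elementwise_mean)
print(elementwise_median)
```

With only two frames the element-wise mean and median coincide; they start to differ once three or more frames are stacked.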