Get mean of multiple selected columns in a pandas dataframe

Posted by 我只是一个虾纸丫 on 2021-01-29 10:33:19

Question


I want to calculate the mean of all the values in selected columns in a dataframe. For example, I have a dataframe with columns A, B, C, D and E and I want the mean of all the values in columns A, C and E.

import pandas as pd

df1 = pd.DataFrame( ( {'A': [1,2,3,4,5],
                      'B': [10,20,30,40,50],
                      'C': [11,21,31,41,51],
                      'D': [12,22,32,42,52],
                      'E': [13,23,33,43,53]} ) )

print( df1 )

print( "Mean of df1:", df1.mean() )

df2 = pd.concat( [df1['A'], df1['C'], df1['E'] ], ignore_index=True )
print( df2 )
print( "Mean of df2:", df2.mean() )

df3 = pd.DataFrame()
df3 = pd.concat( [ df3, df1['A'] ], ignore_index=True )
df3 = pd.concat( [ df3, df1['C'] ], ignore_index=True )
df3 = pd.concat( [ df3, df1['E'] ], ignore_index=True )
print( df3 )
print( "Mean of df3:", df3.mean() )

df2 gets me the right answer, but I need to create a new dataframe to get it.

I thought something like df1[['A', 'C', 'E']].mean() would work, but it returns the mean of each column, not the combined average. Is there a way to do this without creating a new dataframe? I also need other statistics such as .std(), .min() and .max(), so this isn't just a one-off calculation.


Answer 1:


You have two options that I know of:

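For mean(), min() and max() you can either take the mean of the column means, the min of the column mins and the max of the column maxes, or compute the statistic directly over the underlying NumPy array. Both give the mean, min and max of all the elements of A, C and E. (The mean of means matches the overall mean here only because every column holds the same number of values.)

For mean():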

import numpy as np  # needed for the np.* calls in this answer
df1[['A','C','E']].apply(np.mean).mean()
df1[['A','C','E']].values.mean()

Any one of the above should give you the mean of all the elements of columns A, C, E.

For min():

df1[['A','C','E']].apply(np.min).min()
df1[['A','C','E']].values.min()  

For max():

df1[['A','C','E']].apply(np.max).max()
df1[['A','C','E']].values.max() 

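For std():

df1[['A','C','E']].apply(np.std).std()    # runs without error, but the result is not the std of all the elements
df1[['A','C','E']].values.std()    # std of all the elements of columns A, C, E (NumPy default ddof=0)

The std of the per-column stds is not the std of all the elements. Note also that NumPy's std() defaults to ddof=0 (population std), whereas pandas' .std() defaults to ddof=1 (sample std), so pass ddof=1 to the NumPy call if you want it to match pandas.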
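To make the point about std() concrete, here is a minimal sketch that compares the different ways of computing it on the question's data. The variable names are illustrative only, and the std of stds is computed with pandas' own .std() rather than apply(np.std), which is an assumption made for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'C': [11, 21, 31, 41, 51],
                    'E': [13, 23, 33, 43, 53]})

values = df1[['A', 'C', 'E']].to_numpy()

# NumPy over the whole 2D array: population std by default (ddof=0)
pooled_population_std = values.std()

# Same array with ddof=1: sample std, which is what pandas uses by default
pooled_sample_std = values.std(ddof=1)

# pandas on the stacked Series: ddof=1 by default
stacked_std = df1[['A', 'C', 'E']].stack().std()

# std of the three per-column stds: a different quantity altogether
std_of_stds = df1[['A', 'C', 'E']].std().std()

print(pooled_population_std, pooled_sample_std, stacked_std, std_of_stds)
```

The second and third values agree with each other, the first is slightly smaller because of the different ddof, and the last one is unrelated to the spread of the pooled values.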




Answer 2:


You can reshape the DataFrame into a Series with a MultiIndex using DataFrame.stack and then call mean:

df2 = df1[['A', 'C', 'E']].stack()
print (df2)
0  A     1
   C    11
   E    13
1  A     2
   C    21
   E    23
2  A     3
   C    31
   E    33
3  A     4
   C    41
   E    43
4  A     5
   C    51
   E    53
dtype: int64

print( "Mean of df2:", df2.mean() )
Mean of df2: 22.333333333333332

Another idea is to convert the values to a 2D NumPy array and then use np.mean (this requires import numpy as np):

df21 = df1[['A', 'C', 'E']]
print( df21 )
   A   C   E
0  1  11  13
1  2  21  23
2  3  31  33
3  4  41  43
4  5  51  53

print(df21.to_numpy())
[[ 1 11 13]
 [ 2 21 23]
 [ 3 31 33]
 [ 4 41 43]
 [ 5 51 53]]

print( "Mean of df21:", np.mean(df21.to_numpy()) )
Mean of df21: 22.333333333333332
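Since the question also asks for .std(), .min() and .max(), the stacked Series can be summarised in one call. This is a sketch under the assumption that a temporary Series is acceptable; the names selected and summary are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [10, 20, 30, 40, 50],
                    'C': [11, 21, 31, 41, 51],
                    'D': [12, 22, 32, 42, 52],
                    'E': [13, 23, 33, 43, 53]})

# Flatten the three selected columns into one Series of 15 values
selected = df1[['A', 'C', 'E']].stack()

# Series.agg with a list of names returns one value per statistic
summary = selected.agg(['mean', 'std', 'min', 'max'])
print(summary)
```

The std reported here is the pandas sample std (ddof=1), so it will differ slightly from np.std on the same values unless ddof=1 is passed to NumPy.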



Answer 3:


Caveat: this is only reliable if the columns are of the same length, that is, every selected column has a value in every row (no NaN). Otherwise it gives the wrong answer (as the comments pointed out).

mean = df1[['A', 'C', 'E']].mean(axis=1).mean()    
print(mean)
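The caveat can be illustrated with a small made-up frame containing missing values. The data and the name df_nan are hypothetical and not from the question; np.nanmean is used here as one way to get the pooled mean while skipping NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns A and C have missing values
df_nan = pd.DataFrame({'A': [1.0, 2.0, np.nan],
                       'C': [10.0, np.nan, np.nan],
                       'E': [100.0, 200.0, 300.0]})

cols = ['A', 'C', 'E']

# Each row mean skips NaN, so rows with fewer values get too much weight
mean_of_row_means = df_nan[cols].mean(axis=1).mean()

# Pooled mean over every non-missing value in the selected columns
pooled_mean = np.nanmean(df_nan[cols].to_numpy())

print(mean_of_row_means)  # 146.0
print(pooled_mean)        # 613 / 6, roughly 102.17
```

The row means are 37, 101 and 300, which average to 146, while the six actual values sum to 613 and average to about 102.17.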


Source: https://stackoverflow.com/questions/61426161/get-mean-of-multiple-selected-columns-in-a-pandas-dataframe
