Add column to the end of Pandas DataFrame containing average of previous data

爱⌒轻易说出口 提交于 2020-08-22 04:21:05

问题


I have a DataFrame ave_data that contains the following:

ave_data

Time        F7           F8            F9  
00:00:00    43.005593    -56.509746    25.271271  
01:00:00    55.114918    -59.173852    31.849262  
02:00:00    63.990762    -64.699492    52.426017

I want to add another column to this dataframe, containing the average of the values under column F7, F8 and F9 for each row.

The ave_data DataFrame might change size as my code reads from different Excel files later, so the method needs to be generic (i.e add the column containing the average always as the last column in the DataFrame, not in column number 4)

desired output

Time        F7           F8            F9           Average
00:00:00    43.005593    -56.509746    25.271271    4.25  
01:00:00    55.114918    -59.173852    31.849262    9.26
02:00:00    63.990762    -64.699492    52.426017    17.24

回答1:


You can take a copy of your df using copy() and then just call mean and pass params axis=1 and numeric_only=True so that the mean is calculated row-wise and to ignore non-numeric columns, when you do the following the column is always added at the end:

In [68]:

summary_ave_data = df.copy()
summary_ave_data['average'] = summary_ave_data.mean(numeric_only=True, axis=1)
summary_ave_data
Out[68]:
                 Time         F7         F8         F9    average
0 2015-07-29 00:00:00  43.005593 -56.509746  25.271271   3.922373
1 2015-07-29 01:00:00  55.114918 -59.173852  31.849262   9.263443
2 2015-07-29 02:00:00  63.990762 -64.699492  52.426017  17.239096



回答2:


@LaangeHaare or anyone else who is curious, I just tested it and the copy part of the accepted answer seems unnecessary (maybe I am missing something...)

so you could simplify this with:

df['average'] = df.mean(numeric_only=True, axis=1)

I would have simply added this as a comment but don't have the reputation




回答3:


In common case if you would like to use specific columns, you can use:

df['average'] = df[['F7','F8']].mean(axis=1)

where axis=1 stands for rowwise action (using column values for each row to calculate the mean in 'average' column)

Then you may want to sort by this column:

df.sort_values(by='average',ascending=False, inplace=True)

where inplace=True stands for applying action to dataframe instead of calculating on the copy.




回答4:


df.assign is specifically for this purpose. It returns a copy to avoid changing the original dataframe and/or raising SettingWithCopyWarning. It works as follows:

data_with_ave = ave_data.assign(average = ave_data.mean(axis=1, numeric_only=True))

This function can also create multiple columns at the same time:

data_with_ave = ave_data.assign(
                    average = ave_data.mean(axis=1, numeric_only=True),
                    median = ave_data.median(axis=1, numeric_only=True)
)

As of pandas 0.36, you can even reference a column just created to create another:

data_with_ave = ave_data.assign(
                    average = ave_data.mean(axis=1, numeric_only=True),
                    isLarge = lambda df: df['average'] > 10
)


来源:https://stackoverflow.com/questions/31698861/add-column-to-the-end-of-pandas-dataframe-containing-average-of-previous-data

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!