Question
For example, I have a pandas dataframe as follows:
col_1 col_2 col_3 col_4
a X 5 1
a Y 3 2
a Z 6 4
b X 7 8
b Y 4 3
b Z 6 5
And I want to, for each value in col_1, add the values in col_3 and col_4 (and many more columns) that correspond to X and Z from col_2 and create a new row with these values. So the output would be as below:
col_1 col_2 col_3 col_4
a X 5 1
a Y 3 2
a Z 6 4
a NEW 11 5
b X 7 8
b Y 4 3
b Z 6 5
b NEW 13 13
Also, there could be more values in col_1 that will need the same treatment, so I can't explicitly reference 'a' and 'b'. I attempted to use a combination of groupby('col_1') and apply(), but I couldn't get it to work. I'm close enough with the below, but I can't get it to put 'NEW' in col_2 and to keep the original value (a or b, etc.) in col_1.
df.append(df[(df['col_2'] == 'X') | (df['col_2'] == 'Z')].groupby('col_1').mean())
Thanks.
Answer 1:
If you can guarantee that X and Z appear only once in a group, you can use a groupby and pd.concat operation:
new = (
    df[df.col_2.isin(['X', 'Z'])]
    .groupby('col_1', as_index=False).sum()
    .assign(col_2='NEW')
)
# mergesort is stable, so each group's original rows stay ahead of its NEW row.
df = pd.concat([df, new]).sort_values('col_1', kind='mergesort')
df
col_1 col_2 col_3 col_4
0 a X 5 1
1 a Y 3 2
2 a Z 6 4
0 a NEW 11 5
3 b X 7 8
4 b Y 4 3
5 b Z 6 5
1 b NEW 13 13
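The output above keeps the index labels of the two concatenated frames, so 0 and 1 each appear twice. As a minimal sketch of the same groupby-and-concat idea, the version below is self-contained, sums every column other than col_1 and col_2 (the question mentions "many more columns"), and renumbers the index. The helper name add_group_totals and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'X', 5, 1], ['a', 'Y', 3, 2], ['a', 'Z', 6, 4],
     ['b', 'X', 7, 8], ['b', 'Y', 4, 3], ['b', 'Z', 6, 5]],
    columns=['col_1', 'col_2', 'col_3', 'col_4'])


def add_group_totals(frame, labels=('X', 'Z'), new_label='NEW'):
    """Hypothetical helper: add one summed row per col_1 group."""
    # Every column except the key and the label column is summed.
    value_cols = [c for c in frame.columns if c not in ('col_1', 'col_2')]
    totals = (
        frame[frame['col_2'].isin(labels)]
        .groupby('col_1', as_index=False)[value_cols].sum()
        .assign(col_2=new_label)
    )
    # Restore the original column order before stacking the frames.
    totals = totals[list(frame.columns)]
    combined = pd.concat([frame, totals], ignore_index=True)
    # mergesort is stable, so original rows stay ahead of the new row.
    return combined.sort_values('col_1', kind='mergesort').reset_index(drop=True)


result = add_group_totals(df)
print(result)
```

Because the value columns are discovered from the frame, the same call should work unchanged when more numeric columns are present, assuming col_1 and col_2 are the only non-numeric ones.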
Answer 2:
The following code does it:
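import pandas as pd

def sum_group(df):
    # Rows of this group whose col_2 is X or Z.
    dfxz = df[df.col_2.isin(['X', 'Z'])]
    # One-row frame holding the group key, the NEW label, and the sums.
    sum_row = pd.DataFrame(
        [[df.col_1.iloc[0], 'NEW', dfxz.col_3.sum(), dfxz.col_4.sum()]],
        columns=df.columns)
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
    return pd.concat([df, sum_row], ignore_index=True)

df = pd.DataFrame([['a', 'X', 5, 1],
                   ['a', 'Y', 3, 2],
                   ['a', 'Z', 6, 4],
                   ['b', 'X', 7, 8],
                   ['b', 'Y', 4, 3],
                   ['b', 'Z', 6, 5]],
                  columns=['col_1', 'col_2', 'col_3', 'col_4'])
# Assumes a pandas version in which apply passes col_1 to sum_group.
df = df.groupby('col_1').apply(sum_group).reset_index(drop=True)
print(df)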
The apply method of the groupby object calls the function sum_group once per group, and each call returns a dataframe. The returned dataframes are then concatenated into a single dataframe. sum_group itself concatenates the incoming dataframe with an additional row, sum_row, that contains the reduced version of the dataframe according to the criteria you stated.
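The per-group step described above can also be written as an explicit loop over the groupby object, which avoids relying on how apply hands the grouping column to the function. This is only a sketch under that assumption, not the answer's own code: the name with_sum_row and its parameters are made up for illustration, and every column other than col_1 and col_2 is assumed to be numeric.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 'X', 5, 1], ['a', 'Y', 3, 2], ['a', 'Z', 6, 4],
     ['b', 'X', 7, 8], ['b', 'Y', 4, 3], ['b', 'Z', 6, 5]],
    columns=['col_1', 'col_2', 'col_3', 'col_4'])


def with_sum_row(key, group, labels=('X', 'Z')):
    """Hypothetical helper: return the group plus one summed NEW row."""
    value_cols = [c for c in group.columns if c not in ('col_1', 'col_2')]
    # Sum the value columns over the rows labelled X or Z.
    totals = group.loc[group['col_2'].isin(labels), value_cols].sum().to_dict()
    sum_row = pd.DataFrame(
        [{'col_1': key, 'col_2': 'NEW', **totals}],
        columns=group.columns)
    return pd.concat([group, sum_row], ignore_index=True)


# Iterating a groupby yields (key, sub-frame) pairs, one per col_1 value.
pieces = [with_sum_row(key, group) for key, group in df.groupby('col_1')]
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Each piece already ends with its own NEW row, so no sorting is needed after the final concat.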
Source: https://stackoverflow.com/questions/46446863/insert-rows-as-a-result-of-a-groupby-operation-into-the-original-dataframe