For example, I have a pandas dataframe as follows:
col_1 col_2 col_3 col_4
a X 5 1
a Y 3 2
a Z 6 4
If you can guarantee that X
and Z
appear only once in a group, you can use a groupby
and pd.concat
operation:
new = df[df.col_2.isin(['X', 'Z'])]\
.groupby(['col_1'], as_index=False).sum()\
.assign(col_2='NEW')
df = pd.concat([df, new]).sort_values('col_1')
df
col_1 col_2 col_3 col_4
0 a X 5 1
1 a Y 3 2
2 a Z 6 4
0 a NEW 11 5
3 b X 7 8
4 b Y 4 3
5 b Z 6 5
1 b NEW 13 13
The following code does it:
import pandas as pd
def sum_group(df):
dfxz = df[df.col_2.isin(['X','Z'])]
sum_row = pd.Series(
[
df.col_1.iloc[0],
'NEW',
dfxz.col_3.sum(),
dfxz.col_4.sum()
], index = dfxz.columns)
return df.append(sum_row, ignore_index=True)
df = pd.DataFrame([['a', 'X', 5, 1],
['a', 'Y', 3, 2],
['a', 'Z', 6, 4],
['b', 'X', 7, 8],
['b', 'Y', 4, 3],
['b', 'Z', 6, 5]],
columns = ['col_1','col_2','col_3','col_4'])
df = df.groupby('col_1').apply(
sum_group,
).reset_index(drop=True)
print df
The apply
method of the groupby
object calls the function sum_group
that returns a dataframe. The dataframes are then concatenated into a single dataframe. The sum_group
concatenates the incoming dataframe with an additional row sum_row
that contain the reduced version of the dataframe according to the criteria you stated.