Question
Is it possible to mutate a DataFrame in place with a groupby
statement?
import pandas as pd

dt = pd.DataFrame({
    "LETTER": ["a", "b", "c", "a", "b"],
    "VALUE": [10, 12, 13, 0, 15],
})

# Add each row's deviation from the mean VALUE of its group
def __add_new_col(dt_):
    dt_['NEW_COL'] = dt_['VALUE'] - dt_['VALUE'].mean()
    return dt_

dt.groupby("LETTER").apply(__add_new_col)
LETTER VALUE NEW_COL
0 a 10 5.0
1 b 12 -1.5
2 c 13 0.0
3 a 0 -5.0
4 b 15 1.5
dt
LETTER VALUE
0 a 10
1 b 12
2 c 13
3 a 0
4 b 15
In R's data.table this is possible with the :=
operator, e.g. dt[, col := ... , by = 'LETTER']
Answer 1:
I think you can use transform, which returns a Series
with the same length and the same index as the original DataFrame,
and then subtract it from the VALUE column:
print (dt.groupby("LETTER")['VALUE'].transform('mean'))
0 5.0
1 13.5
2 13.0
3 5.0
4 13.5
Name: VALUE, dtype: float64
dt['NEW_COL'] = dt['VALUE'] - dt.groupby("LETTER")['VALUE'].transform('mean')
print (dt)
LETTER VALUE NEW_COL
0 a 10 5.0
1 b 12 -1.5
2 c 13 0.0
3 a 0 -5.0
4 b 15 1.5
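As a variation on the above, the subtraction can also be done inside transform itself. This is only a minimal sketch: the lambda and the deliberately unordered index are my own additions for illustration, not part of the original answer. It is meant to show that the result lines up with the original rows whatever the index looks like.

```python
import pandas as pd

dt = pd.DataFrame(
    {"LETTER": ["a", "b", "c", "a", "b"], "VALUE": [10, 12, 13, 0, 15]},
    index=[40, 10, 30, 0, 20],  # hypothetical, deliberately unordered index
)

# transform returns one value per original row, aligned on dt's index,
# so the result can be assigned straight back as a new column.
dt["NEW_COL"] = dt.groupby("LETTER")["VALUE"].transform(lambda s: s - s.mean())

new_col = dt["NEW_COL"].tolist()
print(dt)
```

Passing the string 'mean' as in the answer is usually the faster route on large frames; the lambda form is just convenient when the per-group expression is more involved.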
Answer 2:
I'm quite sure you can't mutate the DataFrame during a groupby. You can get exactly the same result by mapping every letter to its group mean and then performing the subtraction.
dt['NEW_COL'] = dt['VALUE'] - dt['LETTER'].map(dt.groupby("LETTER")['VALUE'].mean()).values
This will deal with any possible ordering issue, which I wouldn't trust to be guaranteed even if tested. Better safe than sorry :)
Also, I'm using the .values accessor after the map because I'm not sure whether the index of the mapped Series will be the same as that of the 'VALUE' Series; if the two indexes differ, the subtraction aligns on index and can produce NaN.
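For what it's worth, Series.map keeps the index of the Series it is called on, so in this particular case the mapped Series should already line up with 'VALUE' and .values looks optional. A small sketch to check that, assuming the same data as in the question but with a made-up, unordered index:

```python
import pandas as pd

dt = pd.DataFrame(
    {"LETTER": ["a", "b", "c", "a", "b"], "VALUE": [10, 12, 13, 0, 15]},
    index=[40, 10, 30, 0, 20],  # hypothetical, deliberately unordered index
)

group_means = dt.groupby("LETTER")["VALUE"].mean()  # indexed by LETTER
mapped = dt["LETTER"].map(group_means)              # indexed like dt

# The mapped Series carries dt's index, so plain subtraction aligns row by row.
same_index = mapped.index.equals(dt.index)
via_map = dt["VALUE"] - mapped
via_transform = dt["VALUE"] - dt.groupby("LETTER")["VALUE"].transform("mean")
results_match = via_map.tolist() == via_transform.tolist()

print(same_index, results_match)
```

Note that .values drops the index and makes the subtraction purely positional, which is fine here because map preserves row order, but it is worth keeping in mind if the mapped Series is ever reordered first.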
Source: https://stackoverflow.com/questions/42225830/inplace-transformation-pandas-with-groupby