Question
I have the following dataframe
X Y
0 A 10
1 A 9
2 A 8
3 A 5
4 B 100
5 B 90
6 B 80
7 B 50
and two different functions that are very similar
def func1(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 1
    else:
        x['D'] = 0
    return x[['X', 'D']]
def func2(x):
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
Now I can groupby/apply these functions
df.groupby('X').apply(func1)
df.groupby('X').apply(func2)
The first line gives me what I want, i.e.
X D
0 A 1
1 A 1
2 A 1
3 A 1
4 B 0
5 B 0
6 B 0
7 B 0
But the second line returns something quite strange
X D
0 A u
1 A u
2 A u
3 A u
4 A u
5 A u
6 A u
7 A u
So my questions are:
- Can anybody explain why the behavior of groupby/apply is different when the type changes?
- How can I get something similar with func2?
Answer 1:
The problem is simply that a function passed to GroupBy.apply should never modify the dataframe it receives. Whether that dataframe is a copy (which can safely be changed, although the changes will not be seen in the original dataframe) or a view is implementation-dependent. pandas makes that choice internally, and as a user you should simply treat in-place modification of the group as forbidden.
The correct way is to force a copy:
def func2(x):
    x = x.copy()
    if x.iloc[0]['X'] == 'A':
        x['D'] = 'u'
    else:
        x['D'] = 'v'
    return x[['X', 'D']]
After that, df.groupby('X').apply(func2).reset_index(level=0, drop=True)
gives as expected:
X D
0 A u
1 A u
2 A u
3 A u
4 B v
5 B v
6 B v
7 B v
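The same "never modify the group you are given" rule can also be followed without apply. The sketch below is only one possible variant, not the method from the question: it rebuilds the example dataframe, iterates over the groups, and uses DataFrame.assign, which returns a new dataframe instead of changing the group in place. The helper name label_group and the variables pieces and result are made up for this illustration.

```python
import pandas as pd

# Rebuild the dataframe from the question.
df = pd.DataFrame({
    "X": ["A"] * 4 + ["B"] * 4,
    "Y": [10, 9, 8, 5, 100, 90, 80, 50],
})

def label_group(key, group):
    # Hypothetical helper: assign() returns a new DataFrame,
    # so the group passed in is never modified.
    return group.assign(D="u" if key == "A" else "v")[["X", "D"]]

# Iterating over a groupby yields (key, sub-dataframe) pairs.
pieces = [label_group(key, group) for key, group in df.groupby("X")]

# Stitch the labelled pieces back together in the original row order.
result = pd.concat(pieces).sort_index()
print(result)
```

Because nothing is written back into the groups, the outcome does not depend on whether pandas hands out a copy or a view. If the label depends only on the value of X, as it does here, a column-wise expression such as numpy.where would also work without any grouping.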
Source: https://stackoverflow.com/questions/56961451/pandas-groupby-apply-has-different-behaviour-with-int-and-string-types