Question
I'm trying to apply an if condition over a dataframe, but I'm missing something (error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().)
import pandas as pd

raw_data = {'age1': [23, 45, 21], 'age2': [10, 20, 50]}
df = pd.DataFrame(raw_data, columns=['age1', 'age2'])

def my_fun(var1, var2, var3):
    # the error is raised on this line
    if (df[var1]-df[var2])>0:
        df[var3]=df[var1]-df[var2]
    else:
        df[var3]=0
    print(df[var3])

my_fun('age1','age2','diff')
Answer 1:
You can use numpy.where:
import numpy as np

def my_fun(var1, var2, var3):
    # np.where evaluates the condition element-wise, one row at a time
    df[var3] = np.where((df[var1]-df[var2])>0, df[var1]-df[var2], 0)
    return df

df1 = my_fun('age1','age2','diff')
print(df1)
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
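As a minimal sketch of what np.where does above (reusing the question's data; the names d and picked are only for illustration): it selects element by element, broadcasts the scalar 0 to every position, and returns a plain NumPy array rather than a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
d = df['age1'] - df['age2']  # element-wise: 13, 25, -29

# np.where(cond, a, b) takes a[i] where cond[i] is True and b[i] otherwise;
# the scalar 0 is broadcast to every position.
picked = np.where(d > 0, d, 0)
print(type(picked).__name__, picked.tolist())  # ndarray [13, 25, 0]
```

Because the result is an array, assigning it to df[var3] is positional, which is fine here since it was computed from the same rows in the same order.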
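The error is explained in more detail here. In short, if needs a single True or False, but (df[var1]-df[var2])>0 is a boolean Series with one value per row, so pandas cannot decide which value the if should use.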
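A small sketch of the failure, assuming the question's data (cond and error_message are illustrative names): the comparison itself works and produces one boolean per row; it is the if statement that asks pandas to collapse that whole Series into a single bool.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

# The comparison is element-wise: one boolean per row, not one overall answer.
cond = (df['age1'] - df['age2']) > 0
print(cond.tolist())  # [True, True, False]

# `if cond:` calls bool(cond), which pandas refuses for a multi-element Series.
error_message = None
try:
    if cond:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# If a single overall answer is really what you want, reduce explicitly.
print(cond.any())  # True: at least one row has a positive difference
print(cond.all())  # False: not every row does
```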
A slower solution uses apply, where axis=1 is needed to process the data row by row:
def my_fun(x, var1, var2, var3):
    # x is one row (a Series) here, so x[var1]-x[var2] is a single number
    if (x[var1]-x[var2])>0:
        x[var3]=x[var1]-x[var2]
    else:
        x[var3]=0
    return x

print(df.apply(lambda x: my_fun(x, 'age1', 'age2','diff'), axis=1))
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
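If you prefer the row-wise approach, a variant sketch (positive_diff is a hypothetical helper, not part of the answer) returns one scalar per row and assigns the resulting Series as the new column, instead of rebuilding every row. It is still a Python-level loop, so it remains slower than np.where.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

def positive_diff(row, var1, var2):
    # `row` is one row as a Series, so this difference is a single scalar
    # and the plain if/else is unambiguous.
    d = row[var1] - row[var2]
    return d if d > 0 else 0

# axis=1 passes rows to the function; extra positional arguments go via `args`.
df['diff'] = df.apply(positive_diff, axis=1, args=('age1', 'age2'))
print(df)
```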
It is also possible to use loc, but because it assigns into the DataFrame in place, existing data can sometimes be overwritten:
def my_fun(x, var1, var2, var3):
    # x is the whole DataFrame here, and it is modified in place
    mask = (x[var1]-x[var2])>0
    x.loc[mask, var3] = x[var1]-x[var2]
    x.loc[~mask, var3] = 0
    return x

print(my_fun(df, 'age1', 'age2','diff'))
   age1  age2  diff
0    23    10  13.0
1    45    20  25.0
2    21    50   0.0
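One way to sidestep the overwriting concern is to work on a copy. The sketch below (my_fun_copy is a hypothetical name) also creates the column with zeros before the masked assignment. In the loc example above the column is created by the masked assignment itself, so the other rows start out as NaN and the column becomes float, which is why the output shows 13.0 rather than 13.

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})

def my_fun_copy(frame, var1, var2, var3):
    # Hypothetical variant: work on a copy so the caller's DataFrame is untouched.
    out = frame.copy()
    diff = out[var1] - out[var2]
    mask = diff > 0
    out[var3] = 0                      # create the column first (integer zeros)
    out.loc[mask, var3] = diff[mask]   # then overwrite only the positive rows
    return out

result = my_fun_copy(df, 'age1', 'age2', 'diff')
print(result)
print(list(df.columns))  # ['age1', 'age2']: the original is unchanged
```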
Answer 2:
You can use pandas.Series.where:
df.assign(age3=(df.age1 - df.age2).where(df.age1 > df.age2, 0))
   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
You can wrap this in a function:
def my_fun(v1, v2):
    return v1.sub(v2).where(v1 > v2, 0)

df.assign(age3=my_fun(df.age1, df.age2))
   age1  age2  age3
0    23    10    13
1    45    20    25
2    21    50     0
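For this particular condition (keep the difference when it is positive, otherwise 0) there is also Series.clip. It is a different method from where, swapped in here only because it happens to give the same values for this case; where remains the more general tool when the condition is not simply "below zero". A sketch, assuming the same data:

```python
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
diff = df['age1'] - df['age2']

# "keep the difference when it is positive, else 0" is the same as clipping at 0
clipped = diff.clip(lower=0)
via_where = diff.where(df['age1'] > df['age2'], 0)

print(df.assign(age3=clipped))
print(clipped.tolist() == via_where.tolist())  # True
```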
Answer 3:
There is another way without np.where or pd.Series.where. I'm not saying it is better, but after trying to adapt this solution to a challenging problem today, I found the syntax for where not so intuitive. In the end I'm not sure whether it would have been possible with where, but the following method lets you look at the subset before you modify it, and for me it led to a solution more quickly. It works for the OP's case here as well, of course.
You deliberately set values on a slice of a DataFrame, which pandas so often warns you not to do. This answer shows the correct way to do that.
The following gives you a slice:
df.loc[df['age1'] - df['age2'] > 0]
...which looks like:
   age1  age2
0    23    10
1    45    20
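Assigning through that slice directly is exactly what pandas warns about. A minimal sketch of the difference, assuming the same data (mask and diff_added are illustrative names, and the warning is silenced only to keep the output readable): chained assignment writes to a temporary copy, while a single .loc call with both the row mask and the column label writes into df itself.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'age1': [23, 45, 21], 'age2': [10, 20, 50]})
mask = df['age1'] - df['age2'] > 0

# Chained assignment: df[mask] builds a new DataFrame, so this write lands on
# that temporary object, not on df. pandas normally emits a warning here.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')
    df[mask]['diff'] = 1
diff_added = 'diff' in df.columns
print(diff_added)  # False

# One .loc call with the row mask and the column label writes into df.
df.loc[mask, 'diff'] = 1
print(df['diff'].tolist())  # [1.0, 1.0, nan]
```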
First add an extra column to the original DataFrame, filled with the value that the rows outside the slice should keep after the slice is modified:
df['diff'] = 0
Now modify the slice:
df.loc[df['age1'] - df['age2'] > 0, 'diff'] = df['age1'] - df['age2']
...and the result:
   age1  age2  diff
0    23    10    13
1    45    20    25
2    21    50     0
Source: https://stackoverflow.com/questions/43391591/if-else-function-in-pandas-dataframe