问题
What is the rule/process when a function is called with pandas apply()
through lambda vs. not? Examples below. Without lambda apparently, the entire series ( df[column name] ) is passed to the "test" function which throws an error trying to do a boolean operation on a series.
If the same function is called via lambda it works. Iteration over each row with each passed as "x" and the df[ column name ] returns a single value for that column in the current row.
It's like lambda is removing a dimension. Anyone have an explanation or point to the specific doc on this? Thanks.
Example 1 with lambda, works OK
print("probPredDF columns:", probPredDF.columns)
def test( x, y):
if x==y:
r = 'equal'
else:
r = 'not equal'
return r
probPredDF.apply( lambda x: test( x['yTest'], x[ 'yPred']), axis=1 ).head()
Example 1 output
probPredDF columns: Index([0, 1, 'yPred', 'yTest'], dtype='object')
Out[215]:
0 equal
1 equal
2 equal
3 equal
4 equal
dtype: object
Example 2 without lambda, throws boolean operation on series error
print("probPredDF columns:", probPredDF.columns)
def test( x, y):
if x==y:
r = 'equal'
else:
r = 'not equal'
return r
probPredDF.apply( test( probPredDF['yTest'], probPredDF[ 'yPred']), axis=1 ).head()
Example 2 output
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
回答1:
There is nothing magic about a lambda
. They are functions in one parameter, that can be defined inline, and do not have a name. You can use a function where a lambda is expected, but the function will need to also take one parameter. You need to do something like...
Define it as:
def wrapper(x):
return test(x['yTest'], x['yPred'])
Use it as:
probPredDF.apply(wrapper, axis=1)
来源:https://stackoverflow.com/questions/43810094/pandas-apply-with-and-without-lambda