What is the rule when a function is called through pandas apply() with a lambda versus without one? Examples are below. Without the lambda, the entire Series ( df[column name] ) is apparently passed to the "test" function, which throws an error when it tries to do a boolean operation on a Series.
If the same function is called via a lambda, it works: apply() iterates over the rows, passes each one in as "x", and x[column name] returns a single value for that column in the current row.
It's as if the lambda removes a dimension. Does anyone have an explanation, or can anyone point to the specific docs on this? Thanks.
Example 1 with lambda, works OK
    print("probPredDF columns:", probPredDF.columns)

    def test( x, y):
        if x==y:
            r = 'equal'
        else:
            r = 'not equal'
        return r

    probPredDF.apply( lambda x: test( x['yTest'], x[ 'yPred']), axis=1 ).head()

Example 1 output

    probPredDF columns: Index([0, 1, 'yPred', 'yTest'], dtype='object')
    Out[215]:
    0    equal
    1    equal
    2    equal
    3    equal
    4    equal
    dtype: object

Example 2 without lambda, throws boolean operation on series error

    print("probPredDF columns:", probPredDF.columns)

    def test( x, y):
        if x==y:
            r = 'equal'
        else:
            r = 'not equal'
        return r

    probPredDF.apply( test( probPredDF['yTest'], probPredDF[ 'yPred']), axis=1 ).head()

Example 2 output
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().