PySpark: multiple conditions in when clause

一生所求  2020-12-01 00:46

I would like to modify the cell values of a dataframe column (Age) where it is currently blank, and I would only do it if another column (Survived) has the value 0 for the corresponding row where Age is blank.

4 Answers
  •  长情又很酷
    2020-12-01 01:17

    You get a SyntaxError because Python has no && operator. It has and and &, where the latter is the correct choice for building boolean expressions on Column objects (| for logical disjunction and ~ for logical negation).

    The condition you created is also invalid because it does not account for operator precedence: & binds more tightly than == in Python, so each comparison has to be wrapped in parentheses.

    (col("Age") == "") & (col("Survived") == "0")
    ## Column
    

    On a side note, the when function is equivalent to a SQL CASE expression, not a WHEN clause, but the same rules apply. Conjunction:

    df.where((col("foo") > 0) & (col("bar") < 0))
    

    Disjunction:

    df.where((col("foo") > 0) | (col("bar") < 0))
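
    The same pattern works inside when itself. A minimal sketch, assuming a DataFrame df with numeric columns foo and bar (the flag column name is just illustrative):

    from pyspark.sql.functions import col, when

    # Equivalent to: CASE WHEN foo > 0 AND bar < 0 THEN 1 ELSE 0 END
    df.withColumn("flag", when((col("foo") > 0) & (col("bar") < 0), 1).otherwise(0))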
    

    You can of course define conditions separately to avoid brackets:

    cond1 = col("Age") == "" 
    cond2 = col("Survived") == "0"
    
    cond1 & cond2
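
    Putting it together for the original question, here is a minimal end-to-end sketch; it assumes Age and Survived are stored as strings and that "0" is the placeholder you want to write into the blank Age cells:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, when

    spark = SparkSession.builder.getOrCreate()

    # Toy data: Age and Survived as strings, with some blank Age values
    df = spark.createDataFrame(
        [("", "0"), ("22", "1"), ("", "1")],
        ["Age", "Survived"],
    )

    cond1 = col("Age") == ""
    cond2 = col("Survived") == "0"

    # Overwrite Age only where both conditions hold; keep the original value otherwise
    df = df.withColumn("Age", when(cond1 & cond2, "0").otherwise(col("Age")))
    df.show()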
    
