How to filter a column on values in a list in PySpark?

温柔的废话 2021-02-19 09:39

I have a dataframe rawdata to which I need to apply a filter on column X, keeping only the values CB, CI and CR. I used the code below:

df = dfRawData.filter(col("
1 answer
  • 2021-02-19 10:17

    between checks whether a value lies within a range; it takes a lower bound and an upper bound. It cannot be used to check whether a column value is in a list. For that, use isin:

    import pyspark.sql.functions as f
    # Keep only the rows whose X is one of the listed values
    df = dfRawData.where(f.col("X").isin(["CB", "CI", "CR"]))
    