PySpark: TypeError: condition should be string or Column

天命终不由人 2020-12-17 10:58

I am trying to filter a Spark DataFrame (created from a pandas DataFrame) as shown below:

spark_df = sc.createDataFrame(pandas_df)
spark_df.filter(lambda r: str(         


        
3 answers
  •  无人及你
    2020-12-17 11:40

    I ran into this as well and settled on using a UDF:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType
    
    filtered_df = spark_df.filter(udf(lambda target: target.startswith('good'), 
                                      BooleanType())(spark_df.target))
    

    It would be more readable to use a normal function definition instead of the lambda.
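
    As a rough sketch of that suggestion, the same filter could be written with a named function. The names starts_with_good and filter_good_targets are hypothetical, and the None check is an addition that the original lambda does not have; it assumes null values in the target column should simply be dropped.

```python
def starts_with_good(target):
    """Plain Python predicate: True if target is a string starting with 'good'."""
    # The None check is an assumption added here (not in the original lambda);
    # null column values reach a Python UDF as None.
    return target is not None and target.startswith('good')


def filter_good_targets(spark_df):
    """Keep only the rows whose `target` column starts with 'good'."""
    # Imported lazily so the predicate above can be tested without Spark.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import BooleanType

    # Wrap the named function in a UDF that yields a boolean Column,
    # which is a type that DataFrame.filter accepts (unlike a bare lambda).
    starts_with_good_udf = udf(starts_with_good, BooleanType())
    return spark_df.filter(starts_with_good_udf(spark_df.target))
```

    Usage would be filtered_df = filter_good_targets(spark_df). If target is a string column, spark_df.filter(spark_df.target.startswith('good')) should also work without a UDF, since Column.startswith already returns a Column.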
