PySpark: TypeError: condition should be string or Column

天命终不由人 2020-12-17 10:58

I am trying to filter an RDD as shown below:

spark_df = sc.createDataFrame(pandas_df)
spark_df.filter(lambda r: str(         


        
3 Answers
  •  遥遥无期
    2020-12-17 11:28

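    Convert the DataFrame into an RDD. `DataFrame.filter` accepts only a Column expression or a SQL expression string, which is why passing a lambda raises this TypeError; `RDD.filter`, in contrast, accepts a Python function: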

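    # `sc` must expose createDataFrame (e.g. a SparkSession or SQLContext)
    spark_df = sc.createDataFrame(pandas_df)
    # RDD.filter takes a Python predicate and returns a NEW RDD, so keep the result
    filtered_rdd = spark_df.rdd.filter(lambda r: str(r['target']).startswith('good'))
    filtered_rdd.take(5)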
    

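    This should work, as long as you call take() on the filtered RDD rather than on the original DataFrame, because filter returns a new RDD instead of modifying spark_df in place.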
