Filtering a PySpark DataFrame with a SQL-like IN clause

清酒与你 2020-11-27 02:54

I want to filter a PySpark DataFrame with a SQL-like IN clause, as in:

sc = SparkContext()
sqlc = SQLContext(sc)
df = sqlc.sql('SELECT * from my

5 Answers
  •  伪装坚强ぢ
    2020-11-27 03:52

    Just a little addition/update:

    choice_list = ["foo", "bar", "jack", "joan"]
    

    To keep only the rows of a DataFrame "df" whose column "v" takes one of the values in choice_list, use isin:

    from pyspark.sql.functions import col

    df_filtered = df.where(col("v").isin(choice_list))
    
