I want to filter a PySpark DataFrame with a SQL-like IN clause, as in:
    sc = SparkContext()
    sqlc = SQLContext(sc)
    df = sqlc.sql('SELECT * from my_df WHERE field1 IN a')
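For reference, one way to get a literal SQL IN clause is to register the DataFrame as a temp view and build the value list into the query string. Below is a minimal sketch, assuming Spark 2.x+ (SparkSession as the entry point) and purely illustrative names and values (my_df, field1, (1, 2, 3)):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("in-clause-sketch").getOrCreate()

    # Illustrative toy DataFrame with a column named field1.
    df = spark.createDataFrame(
        [(1, "foo"), (2, "bar"), (3, "baz"), (4, "qux")],
        ["field1", "other"],
    )
    df.createOrReplaceTempView("my_df")

    # Build the IN (...) list into the SQL string; fine for small, trusted value lists.
    values = (1, 2, 3)
    in_list = ", ".join(str(v) for v in values)
    result = spark.sql("SELECT * FROM my_df WHERE field1 IN ({})".format(in_list))
    result.show()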
Just a little addition/update:
    choice_list = ["foo", "bar", "jack", "joan"]
If you want to filter your dataframe "df" so that it keeps only the rows whose column "v" takes one of the values in choice_list, then:
    from pyspark.sql.functions import col

    df_filtered = df.where(col("v").isin(choice_list))
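For completeness, here is a self-contained sketch of the same isin filter, assuming a SparkSession and a toy DataFrame with an illustrative column "v"; the ~ operator negates the condition if you want to drop the listed values instead:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("isin-sketch").getOrCreate()

    # Illustrative toy DataFrame with a string column "v".
    df = spark.createDataFrame(
        [("foo", 1), ("bar", 2), ("baz", 3), ("joan", 4)],
        ["v", "n"],
    )

    choice_list = ["foo", "bar", "jack", "joan"]

    # Keep only rows whose "v" is in choice_list.
    df_filtered = df.where(col("v").isin(choice_list))
    df_filtered.show()

    # Keep only rows whose "v" is NOT in choice_list.
    df_excluded = df.where(~col("v").isin(choice_list))
    df_excluded.show()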