I want to filter a PySpark DataFrame with a SQL-like IN clause, as in

sc = SparkContext()
sqlc = SQLContext(sc)
df = sqlc.sql("SELECT * FROM my_table WHERE field1 IN (...)")
You can also do this for integer columns:
df_filtered = df.filter("field1 in (1,2,3)")
or this for string columns:
df_filtered = df.filter("field1 in ('a','b','c')")