I want to filter a PySpark DataFrame with a SQL-like IN clause, as in
    sc = SparkContext()
    sqlc = SQLContext(sc)
    df = sqlc.sql('SELECT * from my_df WHERE field1 IN a')
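For reference, one way to make a query like this run is to splice the values into the SQL text yourself. Below is a minimal sketch, assuming a modern SparkSession (SQLContext works similarly); the table name my_df, the column field1, and the demo data are all made up for illustration:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical demo data; the table and column names are assumptions.
    df = spark.createDataFrame([(1, "x"), (2, "y"), (4, "z")], ["field1", "field2"])
    df.createOrReplaceTempView("my_df")

    a = (1, 2, 3)
    # Splice the literal values into the query text. This is fine for trusted
    # literals, but it is open to SQL injection with untrusted input, and a
    # one-element tuple renders as "(1,)", which is not valid SQL.
    spark.sql("SELECT * FROM my_df WHERE field1 IN {}".format(tuple(a))).show()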
Reiterating what @zero323 mentioned above: we can do the same thing using a list as well (not only a set), as shown below.
    from pyspark.sql.functions import col
    df.where(col("v").isin(["foo", "bar"])).count()
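A self-contained sketch to try the variants side by side; the DataFrame and its column v are hypothetical, created only for the demo. Note that isin also accepts the values unpacked as positional arguments:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.getOrCreate()

    # Hypothetical demo data with a single column named "v".
    df = spark.createDataFrame([("foo",), ("bar",), ("baz",)], ["v"])

    # isin accepts a list, a set, or unpacked positional values alike.
    print(df.where(col("v").isin(["foo", "bar"])).count())  # 2
    print(df.where(col("v").isin({"foo", "bar"})).count())  # 2
    print(df.where(col("v").isin("foo", "bar")).count())    # 2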