Please suggest a PySpark DataFrame alternative for Pandas df['col'].unique().
df['col'].unique()
I want to list all the unique values in a PySpark DataFrame column.
collect_set can help get the unique values from a given column of a pyspark.sql.DataFrame. Here F is the usual alias for pyspark.sql.functions (from pyspark.sql import functions as F), and "column" stands for your column name:
df.select(F.collect_set("column").alias("column")).first()["column"]