Please suggest a PySpark DataFrame alternative for the Pandas df['col'].unique().
df['col'].unique()
I want to list all the unique values in a PySpark DataFrame column.
You can use df.dropDuplicates(['col1','col2']) to keep only the rows that are distinct with respect to the columns named in the list.
df.dropDuplicates(['col1','col2'])
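A minimal runnable sketch of how this could be used, assuming a local SparkSession and a small example DataFrame (the column names col1 and col2 are just the placeholders from the answer):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("unique-values").getOrCreate()

# Example data with a duplicated (col1, col2) pair
df = spark.createDataFrame(
    [(1, "a"), (1, "a"), (2, "b")],
    ["col1", "col2"],
)

# Keep only rows that are distinct with respect to col1 and col2
deduped = df.dropDuplicates(["col1", "col2"])
deduped.show()

# For the original question (unique values of a single column),
# selecting the column and then deduplicating also works;
# collect() is fine here only because the result is small.
unique_vals = [row["col1"] for row in df.select("col1").distinct().collect()]
print(unique_vals)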