Please suggest a PySpark DataFrame alternative to Pandas' df['col'].unique().
df['col'].unique()
I want to list all the unique values in a PySpark DataFrame column.
You could do:

distinct_column = 'somecol'
distinct_rows = df.select(distinct_column).distinct().collect()
distinct_column_vals = [row[distinct_column] for row in distinct_rows]