Show distinct column values in a PySpark DataFrame (Python)

忘了有多久 2020-12-23 10:55

Please suggest a PySpark DataFrame alternative to pandas df['col'].unique().

I want to list out all the unique values in a pyspark dataframe column.

9 answers
  •  天命终不由人
    2020-12-23 11:42

    You could do:

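    # Name of the column whose unique values you want
    distinct_column = 'somecol'

    # Deduplicate in Spark, then bring the resulting Row objects to the driver
    distinct_column_vals = df.select(distinct_column).distinct().collect()
    # Unwrap each Row into its plain value
    distinct_column_vals = [v[distinct_column] for v in distinct_column_vals]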
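
For reference, the same select / distinct / collect steps can be wrapped in a small helper. This is only a sketch: the names `distinct_values` and `demo`, the sample rows, and the local Spark session settings are illustrative assumptions, not part of the original answer.

```python
def distinct_values(df, column):
    """Return the distinct values of `column` as a plain Python list.

    Hypothetical helper: `df` is expected to be a PySpark DataFrame.
    """
    # Deduplicate in Spark first, so only the unique rows reach the driver.
    rows = df.select(column).distinct().collect()
    # Each collected item is a Row; index it by column name to get the value.
    return [row[column] for row in rows]


def demo():
    """Build a tiny DataFrame and list the unique values of one column."""
    # Imported here so the helper above can be defined without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")          # assumption: a local single-core session
        .appName("distinct-demo")    # assumption: arbitrary app name
        .getOrCreate()
    )
    # Sample data is made up for illustration.
    df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["somecol", "n"])
    values = distinct_values(df, "somecol")
    spark.stop()
    # Spark does not guarantee the order of distinct rows, so sort for display.
    return sorted(values)
```

Calling `demo()` requires a working PySpark installation. Note that `collect()` pulls every distinct value into driver memory, so this approach suits columns with a modest number of unique values.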
    
