Show distinct column values in a PySpark DataFrame (Python)

忘了有多久 2020-12-23 10:55

Please suggest a PySpark DataFrame alternative to Pandas df['col'].unique().

I want to list all the unique values in a PySpark DataFrame column.

9 Answers
  •  粉色の甜心
    2020-12-23 11:27

    If you want to see the distinct values of a specific column in your DataFrame, you can write:

        df.select('colname').distinct().show(100,False)
    

    This shows up to 100 distinct values of the colname column in the df DataFrame. Passing False as the second argument turns off truncation, so long values are printed in full.

    If you want to do something more with the distinct values, you can keep them in a variable:

        a = df.select('colname').distinct()
    

    Here, a is a DataFrame (not a Python list) that holds all the distinct values of the colname column.
