Show distinct column values in a PySpark DataFrame (Python)

忘了有多久 2020-12-23 10:55

Please suggest a PySpark DataFrame alternative to pandas df['col'].unique().

I want to list out all the unique values in a pyspark dataframe column.

9 answers

粉色の甜心 (original poster)

2020-12-23 11:27
If you want to see the distinct values of a specific column in your DataFrame, you just need to write:
```
    df.select('colname').distinct().show(100,False)
```
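This shows up to 100 distinct values of the colname column in the df DataFrame (fewer if the column has fewer distinct values). The second argument, False, turns off truncation, so long values are printed in full.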

If you want to do something more with the distinct values, you can keep them in a variable:
```
    a = df.select('colname').distinct()
```
Here, a is a DataFrame holding all the distinct values of the colname column. It is not a Python list yet.