Spark DataFrame: count distinct values of every column

野的像风 2020-11-27 04:48

The question is pretty much in the title: Is there an efficient way to count the distinct values in every column in a DataFrame?

The describe method provides only the count (along with mean, stddev, min, and max), not the distinct count, and I am wondering whether there is a way to get the distinct count for all (or selected) columns.

5 Answers
  •  渐次进展
    2020-11-27 05:50

    You can use SQL's count(DISTINCT column_name), which the DataFrame API exposes as the countDistinct function.
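
    A minimal Scala sketch of this approach, assuming a local SparkSession and a small hypothetical DataFrame standing in for the asker's data; applying countDistinct to every column in a single select lets Spark compute all the exact distinct counts in one job:

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.countDistinct

    val spark = SparkSession.builder()
      .appName("distinct-counts")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the asker's DataFrame
    val df = Seq(("a", 1), ("a", 2), ("b", 2)).toDF("key", "value")

    // One countDistinct aggregate per column, evaluated in a single pass
    val exactCounts = df.select(df.columns.map(c => countDistinct(c).alias(c)): _*)
    exactCounts.show()
    // +---+-----+
    // |key|value|
    // +---+-----+
    // |  2|    2|
    // +---+-----+
    ```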

    Alternatively, if you are doing exploratory analysis and a rough estimate is enough rather than an exact count for each column, you can use the approx_count_distinct function: approx_count_distinct(expr[, relativeSD]).
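
    A sketch of the approximate variant, reusing the df from the snippet above; approx_count_distinct is backed by a HyperLogLog++ estimate, and the optional second argument bounds the relative standard deviation of the estimate (the 0.05 here is just an illustrative value, not something from the original answer):

    ```scala
    import org.apache.spark.sql.functions.approx_count_distinct

    // Approximate distinct count per column; much cheaper than countDistinct
    // on wide or high-cardinality data, at the cost of a bounded error
    val approxCounts = df.select(
      df.columns.map(c => approx_count_distinct(c, 0.05).alias(c)): _*
    )
    approxCounts.show()
    ```

    The approximate form is usually the better choice when you only need distinct counts for profiling, since it avoids the shuffle-heavy exact aggregation over every column.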
