Spark DataFrame: count distinct values of every column

野的像风 2020-11-27 04:48

The question is pretty much in the title: Is there an efficient way to count the distinct values in every column in a DataFrame?

The describe method provides only the count, not the distinct count, and I wonder if there is a way to get the distinct count for all (or some selected) columns.

5 Answers
  •  没有蜡笔的小新
    2020-11-27 05:33

    If you just want the distinct count for a particular column, the following could help. Although it's a late answer, it might help someone. (Tested on PySpark 2.2.0.)

    from pyspark.sql.functions import col, countDistinct
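    # distinct count of the 'colName' column, returned under the alias 'count'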
    df.agg(countDistinct(col("colName")).alias("count")).show()
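
    The question itself asks for every column, not just one. Below is a minimal sketch extending the same countDistinct idiom over df.columns in a single aggregation pass (assuming an existing DataFrame df; approx_count_distinct is the standard cheaper alternative when an approximation suffices):

    from pyspark.sql.functions import col, countDistinct, approx_count_distinct

    # Exact distinct count for every column, computed in one pass.
    df.agg(*(countDistinct(col(c)).alias(c) for c in df.columns)).show()

    # On large data, approx_count_distinct (HyperLogLog-based) is much
    # cheaper and usually accurate enough; rsd bounds the relative error.
    df.agg(*(approx_count_distinct(col(c), rsd=0.05).alias(c)
             for c in df.columns)).show()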
    
