The question is pretty much in the title: Is there an efficient way to count the distinct values in every column in a DataFrame?
The describe method provides only the count, not the distinct count, and I wonder whether there is a way to get the distinct count for all (or some selected) columns.
Although this is a late answer, it might help someone: if you just want the distinct count for a particular column, the following works (tested on PySpark 2.2.0).
from pyspark.sql.functions import col, countDistinct

df.agg(countDistinct(col("colName")).alias("count")).show()
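To address the original question of counting distinct values in every column, you can build one aggregation over df.columns. Below is a minimal sketch, assuming df is the DataFrame in question; it also shows approx_count_distinct, which trades a small error for much better performance if efficiency is the main concern.

from pyspark.sql.functions import approx_count_distinct, countDistinct

# Exact distinct count for every column (assumes an existing DataFrame `df`).
# Each countDistinct requires deduplication, so this can be costly on wide tables.
df.agg(*(countDistinct(c).alias(c) for c in df.columns)).show()

# Cheaper alternative: approx_count_distinct uses HyperLogLog++ and accepts
# a relative error bound (default ~5%) in exchange for far less shuffling.
df.agg(*(approx_count_distinct(c).alias(c) for c in df.columns)).show()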