Store aggregate value of a PySpark dataframe column into a variable

Front-end · Open · 6 answers · 877 views
故里飘歌 2021-01-13 09:37

I am working with PySpark dataframes here. "test1" is my PySpark dataframe and event_date is a TimestampType. So when I try to get a distinct count of event_date, the result comes back as a dataframe rather than a plain value I can store in a variable.

6 Answers
  •  天命终不由人
    2021-01-13 09:59

    Using collect() — agg() produces a one-row dataframe; collect() returns its rows as a list of Row objects, so [0][0] pulls out the plain value:

    import pyspark.sql.functions as sf
    
    
    distinct_count = df.agg(sf.countDistinct('column_name')).collect()[0][0]
    

    Using first() — first() returns only the first Row (or None for an empty dataframe), so [0] indexes its single column:

    import pyspark.sql.functions as sf
    
    
    distinct_count = df.agg(sf.countDistinct('column_name')).first()[0]
    
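    A minimal end-to-end sketch of both approaches, using a hypothetical three-row dataframe with a timestamp column named event_date (standing in for the asker's "test1"):

    ```python
    from datetime import datetime
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as sf

    # Local session and tiny sample data; two distinct event_date values.
    spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()
    df = spark.createDataFrame(
        [(datetime(2021, 1, 1),), (datetime(2021, 1, 1),), (datetime(2021, 1, 2),)],
        ["event_date"],
    )

    # Both approaches collapse the one-row aggregate dataframe to a plain int.
    n_collect = df.agg(sf.countDistinct("event_date")).collect()[0][0]
    n_first = df.agg(sf.countDistinct("event_date")).first()[0]
    print(n_collect, n_first)  # both are 2

    spark.stop()
    ```

    Either variable now holds an ordinary Python int that can be compared, logged, or passed to other functions without touching the dataframe API again.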
