I am working with PySpark dataframes here. "test1" is my PySpark dataframe and event_date is a TimestampType. So when I try to get a distinct count of event_date, the resu
Using collect()
import pyspark.sql.functions as sf
distinct_count = df.agg(sf.countDistinct('column_name')).collect()[0][0]
Using first()
import pyspark.sql.functions as sf
distinct_count = df.agg(sf.countDistinct('column_name')).first()[0]