Store aggregate value of a PySpark dataframe column into a variable

Front-end · Open · 6 answers · 877 views
故里飘歌 2021-01-13 09:37

I am working with PySpark dataframes here. "test1" is my PySpark dataframe and event_date is a TimestampType. So when I try to get a distinct count of event_date, the result comes back as a dataframe rather than a plain value I can store in a variable.

6 Answers
  •  天命终不由人
    2021-01-13 09:59

    Using collect() — agg() produces a one-row dataframe; collect() returns its rows as a list of Row objects, so [0][0] pulls out the plain value:

    import pyspark.sql.functions as sf
    
    
    distinct_count = df.agg(sf.countDistinct('column_name')).collect()[0][0]
    

    Using first() — first() returns only the first Row (or None for an empty dataframe), so [0] indexes its single column:

    import pyspark.sql.functions as sf
    
    
    distinct_count = df.agg(sf.countDistinct('column_name')).first()[0]
    
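    A minimal end-to-end sketch of both approaches, using a hypothetical three-row dataframe with a timestamp column named event_date (standing in for the asker's "test1"):

    ```python
    from datetime import datetime
    from pyspark.sql import SparkSession
    import pyspark.sql.functions as sf

    # Local session and tiny sample data; two distinct event_date values.
    spark = SparkSession.builder.master("local[1]").appName("demo").getOrCreate()
    df = spark.createDataFrame(
        [(datetime(2021, 1, 1),), (datetime(2021, 1, 1),), (datetime(2021, 1, 2),)],
        ["event_date"],
    )

    # Both approaches collapse the one-row aggregate dataframe to a plain int.
    n_collect = df.agg(sf.countDistinct("event_date")).collect()[0][0]
    n_first = df.agg(sf.countDistinct("event_date")).first()[0]
    print(n_collect, n_first)  # both are 2

    spark.stop()
    ```

    Either variable now holds an ordinary Python int that can be compared, logged, or passed to other functions without touching the dataframe API again.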
