How to find count of Null and Nan values for each column in a PySpark dataframe efficiently?

广开言路 2020-11-28 21:35
import numpy as np

df = spark.createDataFrame(
    [(1, 1, None), (1, 2, float(5)), (1, 3, np.nan), (1, 4, None),
     (1, 5, float(10)), (1, 6, float('nan')), (1, 6, float('nan'))],
    ('session', 'timestamp1', 'id2'))
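
Note that Python None is stored as a SQL null, while np.nan / float('nan') stays a floating-point NaN, so the two must be counted separately. As a minimal check on a single column (using the id2 column name from the reconstructed snippet above, and assuming pyspark.sql.functions imported as F), something like this distinguishes them:

from pyspark.sql import functions as F

df.select(
    F.count(F.when(F.col("id2").isNull(), "id2")).alias("id2_nulls"),  # rows where None became null
    F.count(F.when(F.isnan("id2"), "id2")).alias("id2_nans"),          # rows holding a NaN value
).show()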


        
5 Answers
  •  一整个雨季
    2020-11-28 21:55

    To count the null values in each column of a PySpark dataframe:

    Dict_Null = {col:df.filter(df[col].isNull()).count() for col in df.columns}
    Dict_Null
    
    # Output is a dict mapping each column name to its null count
    # (shown here for a different example dataframe):
    
    {'#': 0,
     'Name': 0,
     'Type 1': 0,
     'Type 2': 386,
     'Total': 0,
     'HP': 0,
     'Attack': 0,
     'Defense': 0,
     'Sp_Atk': 0,
     'Sp_Def': 0,
     'Speed': 0,
     'Generation': 0,
     'Legendary': 0}
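
    The dict comprehension above counts only nulls and launches one Spark job per column. A single select over all columns can count nulls and NaNs together in one pass; the sketch below assumes every column is numeric, since isnan is only defined for float/double (and implicitly castable numeric) types:

    from pyspark.sql import functions as F

    # One pass over the data: for each column, count rows where the value is NaN or null.
    # count() ignores the nulls produced by when() whenever the condition is false.
    null_nan_counts = df.select(
        [F.count(F.when(F.isnan(c) | F.col(c).isNull(), c)).alias(c) for c in df.columns]
    )
    null_nan_counts.show()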
    
