How to find the count of Null and NaN values for each column in a PySpark DataFrame efficiently?

广开言路 2020-11-28 21:35
import numpy as np

df = spark.createDataFrame(
    [(1, 1, None), (1, 2, float(5)), (1, 3, np.nan), (1, 4, None),
     (1, 5, float(10)), (1, 6, float('nan')), (1, 6, float('nan'))],
    ('session', 'timestamp1', 'id2'))
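
For reference, the per-column counting the question asks about is commonly done in a single pass with a conditional count. A minimal sketch of that approach, assuming the df built above and an active SparkSession:

import pyspark.sql.functions as F

# F.count() ignores NULLs, so wrapping the condition in F.when() means
# only rows that are NaN or NULL contribute to each column's count.
# Note that F.isnan() only works on numeric (float/double) columns.
df.select([
    F.count(F.when(F.isnan(c) | F.col(c).isNull(), c)).alias(c)
    for c in df.columns
]).show()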


        
5 Answers
  •  春和景丽
    2020-11-28 21:57

    An alternative to the already provided ways is to simply filter on the column, like so:

    import pyspark.sql.functions as F
    df = df.where(F.col('columnNameHere').isNull())
    

    This has the added benefit that you don't have to add another column to do the filtering, and it's quick on larger datasets.
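
    Since NaN is not the same as NULL in Spark, the filter can be broadened to catch both, and .count() then gives the number the question asks for. A minimal sketch, assuming the column is numeric:

    import pyspark.sql.functions as F

    # isNull() does not match NaN, so OR the two checks together,
    # then count the rows that remain after filtering.
    c = 'columnNameHere'  # placeholder column name, as above
    missing = df.where(F.isnan(F.col(c)) | F.col(c).isNull()).count()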
