Spark Scala: remove columns containing only null values

南笙 2021-01-12 13:47

Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)

At the moment I am doing this:



        
3 answers
  •  暗喜 (OP)
    2021-01-12 14:43

    Here's a Scala example that removes all-null columns while querying the data only once (faster than running a separate query per column):

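    import org.apache.spark.sql.DataFrame

    // Drops every column of `df` that contains only null values.
    // count(col) counts non-null values only, so a count of 0 means the column is all null.
    // All counts come from a single aggregation, so the data is scanned once.
    def removeNullColumns(df: DataFrame): DataFrame = {
        // One "count" aggregate per column, e.g. Map("a" -> "count", "b" -> "count")
        val exprs = df.columns.map(_ -> "count").toMap
        val cnts = df.agg(exprs).first
        // Read each column's result back by its generated name, count(<name>)
        val nullColumns = df.columns.filter(c => cnts.getAs[Long]("count(" + c + ")") == 0)
        // DataFrame.drop(String) removes one column at a time (available in Spark 1.6)
        nullColumns.foldLeft(df)((acc, c) => acc.drop(c))
    }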
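    The answer relies on one fact: count(column) counts only non-null values, so a column whose count is 0 holds nothing but nulls, and a single agg call can compute that count for every column in one pass. The sketch below models the same idea in plain Scala with no Spark dependency, so it can be run anywhere. It is an illustration under assumptions, not Spark code: the `Table` case class and the representation of rows as `Vector[Any]` with `null` for missing values are hypothetical stand-ins for a DataFrame.

```scala
// Plain-Scala model (no Spark) of the single-pass idea behind removeNullColumns.
// Hypothetical representation: a table is a header plus rows; null marks a missing value.
object NullColumnSketch {
  final case class Table(columns: Vector[String], rows: Vector[Vector[Any]])

  // One pass over the rows: for each column, count the non-null values,
  // mirroring what count(col) computes in Spark.
  def nonNullCounts(t: Table): Vector[Long] =
    t.rows.foldLeft(Vector.fill(t.columns.size)(0L)) { (acc, row) =>
      acc.zip(row).map { case (n, v) => if (v == null) n else n + 1 }
    }

  // Keep only the columns whose non-null count is greater than zero.
  def removeNullColumns(t: Table): Table = {
    val keep = nonNullCounts(t).zipWithIndex.collect { case (n, i) if n > 0 => i }
    Table(keep.map(i => t.columns(i)), t.rows.map(r => keep.map(i => r(i))))
  }

  def main(args: Array[String]): Unit = {
    val t = Table(
      Vector("id", "name", "unused"),
      Vector(Vector(1, "a", null), Vector(2, null, null))
    )
    val cleaned = removeNullColumns(t)
    println(cleaned.columns.mkString(",")) // prints id,name
  }
}
```

    Note that "name" survives even though one of its values is null: only columns with no non-null values at all are dropped, which is the same rule the count == 0 check applies in the Spark version.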
    
