Is there a way to remove the columns of a spark dataFrame that contain only null values ? (I am using scala and Spark 1.6.2)
At the moment I am doing this:
Here's a scala example to remove null columns that only queries that data once (faster):
def removeNullColumns(df:DataFrame): DataFrame = {
var dfNoNulls = df
val exprs = df.columns.map((_ -> "count")).toMap
val cnts = df.agg(exprs).first
for(c <- df.columns) {
val uses = cnts.getAs[Long]("count("+c+")")
if ( uses == 0 ) {
dfNoNulls = dfNoNulls.drop(c)
}
}
return dfNoNulls
}