Spark Scala: remove columns containing only null values

南笙 2021-01-12 13:47

Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)

At the moment I am doing this:



        
3 answers
  •  旧时难觅i
    2021-01-12 14:48

    I had the same problem and came up with a similar solution in Java. In my opinion, there is currently no other way of doing it.

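    // df is an org.apache.spark.sql.DataFrame (Dataset<Row> on Spark 2.x and later).
    // Drop every column whose only distinct value is null.
    for (String column : df.columns()) {
        long count = df.select(column).distinct().count();

        if (count == 1 && df.select(column).first().isNullAt(0)) {
            df = df.drop(column);
        }
    }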
    

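    I drop every column that contains exactly one distinct value and whose first value is null. Because such a column holds only one distinct value, checking the first row is enough to tell whether that value is null. This way I can be sure that I don't drop columns where all values are the same but not null.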
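Since the question asks about Scala, here is a rough Scala sketch of the same per-column check (one distinct value, and that value is null). It assumes the Spark 1.6 API the question mentions (`SparkContext`, `SQLContext`, `DataFrame`); the object name `DropNullColumns`, the helpers `dropAllNullColumns` and `demo`, and the sample data are hypothetical and only there for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}

object DropNullColumns {
  // Hypothetical helper: Scala version of the Java loop. A column is dropped
  // when it has exactly one distinct value and that value is null.
  // Each column costs one or two Spark jobs, so this is slow on wide DataFrames.
  def dropAllNullColumns(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, column) =>
      val distinctCount = acc.select(column).distinct().count()
      if (distinctCount == 1 && acc.select(column).first().isNullAt(0)) acc.drop(column)
      else acc
    }

  // Hypothetical demo: builds a small local DataFrame, cleans it and
  // returns the names of the columns that survive.
  def demo(): Seq[String] = {
    val sc = new SparkContext(
      new SparkConf().setAppName("drop-null-columns").setMaster("local[*]"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._

      // "empty" holds only nulls; "constant" holds the same non-null value in every row.
      val df = sc.parallelize(Seq(
        (1, Option.empty[String], "x"),
        (2, Option.empty[String], "x")
      )).toDF("id", "empty", "constant")

      dropAllNullColumns(df).columns.toSeq
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(demo().mkString(",")) // prints: id,constant
}
```

Folding over `df.columns` threads the shrinking DataFrame through each step instead of reassigning `df` inside the loop as the Java version does; the behavior is the same, and the "constant" column shows that a single non-null value is kept.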
