
Spark scala remove columns containing only null values


南笙 2021-01-12 13:47

Is there a way to remove the columns of a Spark DataFrame that contain only null values? (I am using Scala and Spark 1.6.2.)

At the moment I am doing this:

3 answers

旧时难觅i (OP)

2021-01-12 14:48
I had the same problem and came up with a similar solution in Java. As far as I can tell, there is currently no other way to do it.
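```java
// df is a DataFrame (Dataset<Row> in Spark 2.x); it is reassigned as columns are dropped
for (String column : df.columns()) {
    // Number of distinct values in this column (all nulls collapse into one value)
    long count = df.select(column).distinct().count();

    // Exactly one distinct value, and it is null: the column is entirely null
    if (count == 1 && df.select(column).first().isNullAt(0)) {
        df = df.drop(column);
    }
}
```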
I'm dropping every column that contains exactly one distinct value, where that value is null. This way I can be sure that I don't drop columns in which all values are the same but not null.
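The loop above runs a separate Spark job for every column. An alternative idea, offered here as a sketch rather than as the answerer's method, is to count the non-null values of all columns in a single aggregation: in Spark, `org.apache.spark.sql.functions.count(col)` skips nulls, so one `select` over every column yields a non-null count per column, and any column whose count is 0 is entirely null. Because running that requires Spark on the classpath, the self-contained Java below only models the decision logic on in-memory rows; the class `NullColumnFilter`, the method `nonNullColumns`, and the `Object[]`-per-row representation are hypothetical stand-ins for a DataFrame.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullColumnFilter {

    // Returns the names of the columns that hold at least one non-null value.
    // Each row is an Object[] aligned with columnNames (a stand-in for a DataFrame).
    public static List<String> nonNullColumns(String[] columnNames, List<Object[]> rows) {
        long[] nonNullCounts = new long[columnNames.length];
        // Single pass over the data: count non-null values per column,
        // analogous to count(col) ignoring nulls in a Spark aggregation.
        for (Object[] row : rows) {
            for (int i = 0; i < columnNames.length; i++) {
                if (row[i] != null) {
                    nonNullCounts[i]++;
                }
            }
        }
        List<String> keep = new ArrayList<>();
        for (int i = 0; i < columnNames.length; i++) {
            // A non-null count of 0 means the column holds only nulls, so it is dropped.
            if (nonNullCounts[i] > 0) {
                keep.add(columnNames[i]);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        String[] columns = {"id", "empty", "constant"};
        List<Object[]> rows = Arrays.asList(
                new Object[]{1, null, "x"},
                new Object[]{2, null, "x"},
                new Object[]{3, null, "x"});
        System.out.println(nonNullColumns(columns, rows)); // prints [id, constant]
    }
}
```

A column holding the same non-null value everywhere (like `constant` above) is kept, matching the answer's intent. Unlike the distinct-count check, this version also treats every column of an empty table as all-null and drops it.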