How to delete columns that contain ONLY NAs?

五迷三道 提交于 2019-11-27 06:35:35
MadScone

One way of doing it:

df[, colSums(is.na(df)) != nrow(df)]

If the count of NAs in a column is equal to the number of rows, it must be entirely NA.

Or similarly

df[colSums(!is.na(df)) > 0]

Here is a dplyr solution:

df %>% select_if(~sum(!is.na(.)) > 0)

It seeems like you want to remove ONLY columns with ALL NAs, leaving columns with some rows that do have NAs. I would do this (but I am sure there is an efficient vectorised soution:

#set seed for reproducibility
set.seed <- 103
df <- data.frame( id = 1:10 , nas = rep( NA , 10 ) , vals = sample( c( 1:3 , NA ) , 10 , repl = TRUE ) )
df
#      id nas vals
#   1   1  NA   NA
#   2   2  NA    2
#   3   3  NA    1
#   4   4  NA    2
#   5   5  NA    2
#   6   6  NA    3
#   7   7  NA    2
#   8   8  NA    3
#   9   9  NA    3
#   10 10  NA    2

#Use this command to remove columns that are entirely NA values, it will elave columns where only some vlaues are NA
df[ , ! apply( df , 2 , function(x) all(is.na(x)) ) ]
#      id vals
#   1   1   NA
#   2   2    2
#   3   3    1
#   4   4    2
#   5   5    2
#   6   6    3
#   7   7    2
#   8   8    3
#   9   9    3
#   10 10    2

If you find yourself in the situation where you want to remove columns that have any NA values you can simply change the all command above to any.

Another option is the janitor package:

df <- remove_empty_cols(df)

https://github.com/sfirke/janitor

An intuitive script: dplyr::select_if(~!all(is.na(.))). It literally keeps only not-all-elements-missing columns. (to delete all-element-missing columns).

> df <- data.frame( id = 1:10 , nas = rep( NA , 10 ) , vals = sample( c( 1:3 , NA ) , 10 , repl = TRUE ) )

> df %>% glimpse()
Observations: 10
Variables: 3
$ id   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
$ nas  <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
$ vals <int> NA, 1, 1, NA, 1, 1, 1, 2, 3, NA

> df %>% select_if(~!all(is.na(.))) 
   id vals
1   1   NA
2   2    1
3   3    1
4   4   NA
5   5    1
6   6    1
7   7    1
8   8    2
9   9    3
10 10   NA

Another option with Filter

Filter(function(x) !all(is.na(x)), df)

NOTE: Data from @Simon O'Hanlon's post.

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!