How to delete rows from a dataframe that contain n*NA

旧巷老猫, submitted on 2019-11-26 13:49:16

Use rowSums. To remove rows from a data frame (df) that contain precisely n NA values:

df <- df[rowSums(is.na(df)) != n, ]

or to remove rows that contain n or more NA values:

df <- df[rowSums(is.na(df)) < n, ]

In both cases, replace n with the required number.
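To see the difference between the two filters, here is a minimal sketch on a made-up data frame (the values, the choice n = 2, and the names na_count, exactly_n_removed and n_or_more_removed are assumptions for illustration, not part of the original answer):

```r
# Hypothetical example data: the rows contain 0, 1, 2 and 3 NAs respectively
df <- data.frame(
  a = c(1, NA, NA, NA),
  b = c(2, 5, NA, NA),
  c = c(3, 6, 9, NA)
)
n <- 2

# Number of NA values in each row
na_count <- rowSums(is.na(df))

# Drop only the rows with exactly n NAs (the 3-NA row is kept)
exactly_n_removed <- df[na_count != n, ]

# Drop every row with n or more NAs
n_or_more_removed <- df[na_count < n, ]
```

With this data the first filter keeps three rows and the second keeps two, so it is worth checking which of the two behaviours you actually need.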

If dat is the name of your data.frame, the following keeps only the rows with fewer than two NA values:

keep <- rowSums(is.na(dat)) < 2
dat <- dat[keep, ] 

What this is doing:

is.na(dat) 
# returns a matrix of T/F
# note that when adding logicals 
# T == 1, and F == 0

rowSums(.)
# quickly computes the total per row 
# since your task is to identify the
# rows with a certain number of NA's 

rowSums(.) < 2 
# for each row, determine if the sum 
# (which is the number of NAs) is less
# than 2 or not.  Returns T/F accordingly 

We use the output of this last statement to identify which rows to keep. Note that it is not necessary to actually store this last logical.
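The steps above can be traced on a small example. This is only a sketch: the contents of dat and the names na_matrix, na_per_row and dat_kept are invented for illustration.

```r
# Hypothetical data with mixed column types; the rows hold 0, 3 and 1 NAs
dat <- data.frame(
  x = c(1, NA, NA),
  y = c("a", NA, "c"),
  z = c(TRUE, NA, FALSE)
)

# Step 1: logical matrix, TRUE wherever a value is missing
na_matrix <- is.na(dat)

# Step 2: TRUE counts as 1 and FALSE as 0, so this is the NA count per row
na_per_row <- rowSums(na_matrix)

# Step 3: one logical per row, TRUE if the row has fewer than 2 NAs
keep <- na_per_row < 2

# Subset using the stored logical ...
dat_kept <- dat[keep, ]

# ... or do the same thing without storing it
dat_kept_inline <- dat[rowSums(is.na(dat)) < 2, ]
```

Storing keep separately is mainly useful when you want to inspect or reuse the selection, for example sum(!keep) gives the number of rows that will be dropped.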

If d is your data frame, try this:

d <- d[rowSums(is.na(d)) < 2,]

This will return a dataset where at most two values per row are missing:

dfrm[ apply(dfrm, 1, function(r) sum(is.na(r)) <= 2 ) , ]
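As a rough check, an apply-based filter of this kind should select the same rows as the rowSums approach when both use the same threshold. The sketch below assumes a made-up dfrm and a threshold of two; keep_apply and keep_rowsums are illustrative names.

```r
# Hypothetical example data: the rows contain 0, 1, 2 and 3 NAs respectively
dfrm <- data.frame(
  a = c(1, NA, NA, NA),
  b = c(2, 5, NA, NA),
  c = c(3, 6, 9, NA)
)

# Row-wise count via apply: the function argument r is one row at a time
keep_apply <- apply(dfrm, 1, function(r) sum(is.na(r)) <= 2)

# Same condition computed in one vectorised call
keep_rowsums <- rowSums(is.na(dfrm)) <= 2

# Both selections agree, so either can be used to subset
same_selection <- all(keep_apply == keep_rowsums)
result <- dfrm[keep_rowsums, ]
```

Note that apply converts the data frame to a matrix before looping over its rows, so on large data the rowSums version is usually the quicker of the two.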