问题
Possible Duplicate:
R - remove rows with NAs in data.frame
I have a dataframe named sub.new with multiple columns in it. And I'm trying to exclude any cell containing NA
or a blank space
"".
I tried to use subset()
, but it's targeting specific column conditional. Is there anyway to scan through the whole dataframe and create a subset that no cell is either NA
or blank space
?
In the example below, only the first line should be kept:
# ID SNP ILMN_Strand Customer_Strand
ID1234 [A/G] TOP BOT
Non-Specific NSB (Bgnd) Green
Non-Polymorphic NP (A) Red
Non-Polymorphic NP (T) Purple
Non-Polymorphic NP (C) Green
Non-Polymorphic NP (G) Blue
Restoration Restore Green
Any suggestions? Thanks
回答1:
A good idea is to set all of the "" (blank cells) to NA before any further analysis.
If you are reading your input from a file, it is a good choice to cast all "" to NAs:
foo <- read.table(file="Your_file.txt", na.strings=c("", "NA"), sep="\t") # if your file is tab delimited
If you have already your table loaded, you can act as follows:
foo[foo==""] <- NA
Then to keep only rows with no NA you may just use na.omit()
:
foo <- na.omit(foo)
Or to keep columns with no NA:
foo <- foo[, colSums(is.na(foo)) == 0]
回答2:
Don't know exactly what kind of dataset you have, so I provide general answer.
x <- c(1,2,NA,3,4,5)
y <- c(1,2,3,NA,6,8)
my.data <- data.frame(x, y)
> my.data
x y
1 1 1
2 2 2
3 NA 3
4 3 NA
5 4 6
6 5 8
# Exclude rows with NA values
my.data[complete.cases(my.data),]
x y
1 1 1
2 2 2
5 4 6
6 5 8
来源:https://stackoverflow.com/questions/12763890/exclude-blank-and-na-in-r