Question
The whole vector is fine and has no NAs:
> summary(data$marks)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 6.00 6.00 6.02 7.00 7.00
> length(data$marks)
[1] 2528
However, when I summarize a subset selected by a criterion, I get lots of NAs:
> summary(data[data$student=="John",]$marks)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.000 6.000 6.000 6.169 7.000 7.000 464
> length(data[data$student=="John",]$marks)
[1] 523
Answer 1:
I think the problem is that you have missing values for student. When you subset with data$student=="John", every NA in student makes the comparison NA rather than TRUE or FALSE, and indexing rows with NA returns a row of NAs, so marks is NA for each of those rows. Wrap the subsetting condition in which() to avoid this problem. Here are a few examples that will hopefully clarify what's happening:
# Fake data
set.seed(103)
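dat = data.frame(group=rep(LETTERS[1:3], each=3),
                 value=rnorm(9))
dat$group[1] = NA
dat$value                            # all nine values, no NAs
dat[dat$group=="B", "value"]         # an extra NA appears for the row where group is NA
dat[which(dat$group=="B"), "value"]  # only the three values for group B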
# Simpler example
x = c(10,20,30,40, NA)
x>20            # FALSE FALSE TRUE TRUE NA
x[x>20]         # 30 40 NA
which(x>20)     # 3 4
x[which(x>20)]  # 30 40
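To check whether this is what is happening in the question, it can help to count the missing student values directly. The following is a minimal sketch, assuming a hypothetical data frame `df` that mimics the question's `student` and `marks` columns (the values are made up for illustration):

```r
# Hypothetical data mimicking the question's columns; values are made up
df <- data.frame(
  student = c("John", NA, "Mary", "John", NA),
  marks   = c(6, 7, 5, 7, 6),
  stringsAsFactors = FALSE
)

# Number of rows with a missing student
n_missing <- sum(is.na(df$student))

# Subsetting with == keeps the "John" rows plus one NA per missing student
naive <- df[df$student == "John", "marks"]

# which() drops the NA positions, leaving only the real matches
safe <- df[which(df$student == "John"), "marks"]

n_missing          # 2
sum(is.na(naive))  # 2, same as n_missing
length(naive)      # 4
safe               # 6 7
```

If the same holds for the question's data, the 464 NA's in the subset would correspond to 464 rows with a missing student, leaving 523 - 464 = 59 rows that actually belong to John.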
Answer 2:
First, note that NA=="foo" results in NA. When you subset a vector with an NA index, the corresponding element of the result is NA.
t = c(1,2,3)
t[c(1,NA)]
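The same thing happens with a logical index, which is the case in the question: an NA in the condition yields an NA element in the result. A small sketch (the names `v` and `keep` are illustrative, not from the question):

```r
NA == "foo"             # NA: a comparison with NA is itself NA

v <- c(1, 2, 3)
keep <- c(TRUE, NA, FALSE)

v[keep]                 # 1 NA: the NA in the index becomes an NA element
v[which(keep)]          # 1: which() returns only the positions that are TRUE
v[!is.na(keep) & keep]  # 1: same result with an explicit NA check
```

Either `which()` or an explicit `!is.na()` check removes the NA positions before indexing.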
Answer 3:
A tidyverse solution. I find these easier to read than base R.
library(tidyverse)
data %>%
  filter(student == "John") %>%  # filter() drops rows where the condition is NA
  pull(marks) %>%                # extract the marks column as a vector
  summary()
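Like `dplyr::filter()`, base R's `subset()` keeps only rows where the condition is TRUE, so rows with a missing student are dropped. A minimal sketch for comparison, again assuming a hypothetical data frame `df` with made-up values:

```r
# Hypothetical data; column names follow the question, values are made up
df <- data.frame(
  student = c("John", NA, "Mary", "John", NA),
  marks   = c(6, 7, 5, 7, 6)
)

# subset() treats NA in the condition as FALSE
john_marks <- subset(df, student == "John")$marks
john_marks           # 6 7
summary(john_marks)  # no NA's reported
```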
Source: https://stackoverflow.com/questions/34055552/na-when-trying-to-summarize-a-subset-of-data-r