I have this dataframe:
set.seed(50)
data <- data.frame(age=c(rep("juv", 10), rep("ad", 10)),
sex=c(rep("m", 10), rep("f", 10)),
Use aggregate:
# count the NAs in `var` within each level of `factor`
nacheck <- function(var, factor)
  aggregate(var, list(factor), function(x) sum(is.na(x)))
nacheck(data$length, data$age)
nacheck(data$length, data$sex)
nacheck(data$length, data$size)
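Because the question's data frame is cut off above, here is a minimal self-contained sketch of nacheck on a small hypothetical data frame `toy` (its values are invented for illustration, not taken from the question). aggregate() labels the unnamed grouping column Group.1 and the counted vector x.

```r
# Hypothetical toy data (invented for illustration; the question's data
# frame is truncated). Two groups with a known number of NAs per column.
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  size   = c("large", "large", "large", "small", "small", "small"),
  length = c(1.2, NA, 3.1, NA, NA, 2.0),
  width  = c(NA, 0.5, 0.7, 0.9, NA, 1.1),
  height = c(2.2, 2.4, NA, 2.8, 3.0, 3.2),
  stringsAsFactors = FALSE
)

# count the NAs in `var` within each level of `factor`
nacheck <- function(var, factor)
  aggregate(var, list(factor), function(x) sum(is.na(x)))

res <- nacheck(toy$length, toy$age)
print(res)
#   Group.1 x
# 1      ad 2
# 2     juv 1
```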
You could also apply this over the columns of your data frame, once per factor, to get NA counts for every dimension measure (length, width and height) within each level of that factor.
apply(data[,c("length","width","height")], 2, nacheck, factor=data$age)
apply(data[,c("length","width","height")], 2, nacheck, factor=data$sex)
apply(data[,c("length","width","height")], 2, nacheck, factor=data$size)
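As a possible simplification (a sketch, again on the hypothetical `toy` data), aggregate() called on a data frame already applies FUN to each column separately within every group, so a single call per factor returns one tidy table instead of the list of small data frames that apply() produces.

```r
# Hypothetical toy data (invented for illustration)
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  size   = c("large", "large", "large", "small", "small", "small"),
  length = c(1.2, NA, 3.1, NA, NA, 2.0),
  width  = c(NA, 0.5, 0.7, 0.9, NA, 1.1),
  height = c(2.2, 2.4, NA, 2.8, 3.0, 3.2),
  stringsAsFactors = FALSE
)

# aggregate() on a data frame runs FUN on each column within each level
# of the grouping variable; naming the `by` list names the output column
by_age <- aggregate(toy[, c("length", "width", "height")],
                    by = list(age = toy$age),
                    FUN = function(x) sum(is.na(x)))
print(by_age)
#   age length width height
# 1  ad      2     1      0
# 2 juv      1     1      1
```

Swap toy$age for toy$sex or toy$size to get the other two tables.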
To do this all in one function, nest nacheck inside a wrapper and lapply over the factors:
exploreNA <- function(df, factors){
  # helper: NA count of one column within each level of one factor
  nacheck <- function(var, factor)
    aggregate(var, list(factor), function(x) sum(is.na(x)))
  # for each factor, count the NAs in every column of df
  lapply(factors, function(x) apply(df, 2, nacheck, factor=x))
}
exploreNA(data[,c("length","width","height")], list(data$age, data$sex, data$size))
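One small usability tweak, sketched here on the hypothetical `toy` data: lapply() keeps the names of the list it iterates over, so passing a named list of factors labels each element of exploreNA's result, and apply() names the inner elements after the columns.

```r
# Hypothetical toy data (invented for illustration)
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  size   = c("large", "large", "large", "small", "small", "small"),
  length = c(1.2, NA, 3.1, NA, NA, 2.0),
  width  = c(NA, 0.5, 0.7, 0.9, NA, 1.1),
  height = c(2.2, 2.4, NA, 2.8, 3.0, 3.2),
  stringsAsFactors = FALSE
)

exploreNA <- function(df, factors) {
  nacheck <- function(var, factor)
    aggregate(var, list(factor), function(x) sum(is.na(x)))
  lapply(factors, function(x) apply(df, 2, nacheck, factor = x))
}

# a *named* list of factors gives a result you can index by name
out <- exploreNA(toy[, c("length", "width", "height")],
                 list(age = toy$age, sex = toy$sex, size = toy$size))
names(out)         # "age" "sex" "size"
out$sex$length     # NA counts of length for each level of sex
```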
Are you looking for something like this?
library(doBy)
summaryBy(length+width+height~age+sex+size,
data=data,
FUN=function(x) sum(is.na(x)),
keep.names=TRUE)
  age sex  size length width height
1  ad   f small      3     4      4
2 juv   m large      5     4      4
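If you would rather not load doBy, base R's formula method of aggregate() can produce the same grouped table. This is a sketch on the hypothetical `toy` data; the key detail is na.action = na.pass, because the default na.action (na.omit) would drop every row containing an NA before the counting function ever sees it.

```r
# Hypothetical toy data (invented for illustration)
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  size   = c("large", "large", "large", "small", "small", "small"),
  length = c(1.2, NA, 3.1, NA, NA, 2.0),
  width  = c(NA, 0.5, 0.7, 0.9, NA, 1.1),
  height = c(2.2, 2.4, NA, 2.8, 3.0, 3.2),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side aggregates several columns at once;
# na.pass keeps the NA rows so they can be counted
agg <- aggregate(cbind(length, width, height) ~ age + sex + size,
                 data = toy,
                 FUN = function(x) sum(is.na(x)),
                 na.action = na.pass)
print(agg)
```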
A data.table approach:
library(data.table)
DT <- data.table(data)
DT[, lapply(.SD, function(x) sum(is.na(x))), by = list(age, sex, size)]
##    age sex  size length width height
## 1: juv   m large      5     4      4
## 2:  ad   f small      3     4      4
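For comparison, a package-free sketch on the hypothetical `toy` data: is.na() on a data frame returns a logical matrix, and base R's rowsum() sums its rows within each group in one vectorised call. The group key here is simply the three factors pasted together, an illustrative choice rather than anything data.table does internally.

```r
# Hypothetical toy data (invented for illustration)
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  size   = c("large", "large", "large", "small", "small", "small"),
  length = c(1.2, NA, 3.1, NA, NA, 2.0),
  width  = c(NA, 0.5, 0.7, 0.9, NA, 1.1),
  height = c(2.2, 2.4, NA, 2.8, 3.0, 3.2),
  stringsAsFactors = FALSE
)

cols <- c("length", "width", "height")
# one string key per row, e.g. "juv.m.large"
grp <- paste(toy$age, toy$sex, toy$size, sep = ".")
# unary + turns the logical NA matrix into 0/1 integers;
# rowsum() then adds those up within each group
counts <- rowsum(+is.na(toy[cols]), group = grp)
print(counts)
```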
And the plyr equivalent, using colwise and ddply:
library(plyr)
ddply(data, .(age,sex,size), colwise(.fun = function(x) sum(is.na(x))))
##   age sex  size length width height
## 1  ad   f small      3     4      4
## 2 juv   m large      5     4      4
You can also supply the grouping columns (the by argument) as a character vector of column names:
by.cols <- c('age', 'sex' ,'size')
# then both of the following work:
DT[, lapply(.SD, function(x) sum(is.na(x))), by = by.cols]
ddply(data, by.cols, colwise(.fun = function(x) sum(is.na(x))))
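The same character-vector trick works in base R, sketched here on the hypothetical `toy` data: a data frame is a list, so toy[by.cols] can be passed straight to aggregate()'s by argument, and the grouping columns keep their names in the result. The name measure.cols is illustrative only.

```r
# Hypothetical toy data (invented for illustration)
toy <- data.frame(
  age    = c("juv", "juv", "juv", "ad", "ad", "ad"),
  sex    = c("m", "m", "m", "f", "f", "f"),
  size   = c("large", "large", "large", "small", "small", "small"),
  length = c(1.2, NA, 3.1, NA, NA, 2.0),
  width  = c(NA, 0.5, 0.7, 0.9, NA, 1.1),
  height = c(2.2, 2.4, NA, 2.8, 3.0, 3.2),
  stringsAsFactors = FALSE
)

by.cols <- c("age", "sex", "size")
# every column that is not a grouping column gets an NA count
measure.cols <- setdiff(names(toy), by.cols)

res <- aggregate(toy[measure.cols],
                 by = toy[by.cols],   # a data frame is a valid `by` list
                 FUN = function(x) sum(is.na(x)))
print(res)
```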