I have a big dataset that contains a lot of NAs and some non-NA values. At the moment I count the non-NA values for each column like this:
Try this:
nonNA_counts <- sapply(df, function(x) sum(!is.na(x)))
You can also call is.na() on the entire data frame (implicitly coercing it to a logical matrix) and call colSums() on the inverted result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
With dplyr, that would be:
library(dplyr)
df %>%
    summarise(across(everything(), ~ sum(!is.na(.x))))
# (older dplyr versions used summarise_all(funs(...)), which is now deprecated)
The advantage of that approach is that you can use group_by() beforehand, and that you don't need to worry about column names (it just summarizes all of them).
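For example, a sketch of the grouped variant, assuming a hypothetical grouping column g added to the sample data from above:

```r
library(dplyr)

# recreate the sample data from above
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100 * 5, TRUE), 100))

# hypothetical grouping column, purely for illustration
df$g <- rep(c("a", "b"), each = 50)

# non-NA counts per column, computed within each group
res <- df %>%
    group_by(g) %>%
    summarise(across(everything(), ~ sum(!is.na(.x))))

res
```

This returns one row per group, with each V column holding that group's non-NA count; the per-group counts for each column sum to the overall colSums(!is.na(df)) result.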