I have a big dataset that contains a lot of NAs and some non-NA values. At the moment I count the non-NA values for each column like this:
Try this:
nonNA_counts <- sapply(df, function(x) sum(!is.na(x)))
You can also call is.na() on the entire data frame (implicitly coercing it to a logical matrix) and call colSums() on the inverted result:
# make sample data
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
str(df)
#> 'data.frame': 100 obs. of 5 variables:
#> $ V1: int NA 1 NA NA 1 NA 1 1 1 NA ...
#> $ V2: int NA NA NA 1 NA 1 0 1 0 NA ...
#> $ V3: int 1 1 0 1 1 NA NA 1 NA NA ...
#> $ V4: int NA 0 NA 0 0 NA 1 1 NA NA ...
#> $ V5: int NA NA NA 0 0 0 0 0 NA NA ...
colSums(!is.na(df))
#> V1 V2 V3 V4 V5
#> 69 55 62 60 70
With dplyr, that would be:
library(dplyr)
df %>%
    summarise(across(everything(), ~ sum(!is.na(.x))))
# (older dplyr versions used summarise_all(funs(...)), which is now deprecated)
The advantage of that approach is that you can use group_by() beforehand, and that you don't need to worry about column names (it just summarizes all of them).
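For example, a sketch of the grouped variant, assuming a hypothetical grouping column g added to the sample data from above:

```r
library(dplyr)

# recreate the sample data from above
set.seed(47)
df <- as.data.frame(matrix(sample(c(0:1, NA), 100 * 5, TRUE), 100))

# hypothetical grouping column, purely for illustration
df$g <- rep(c("a", "b"), each = 50)

# non-NA counts per column, computed within each group
res <- df %>%
    group_by(g) %>%
    summarise(across(everything(), ~ sum(!is.na(.x))))

res
```

This returns one row per group, with each V column holding that group's non-NA count; the per-group counts for each column sum to the overall colSums(!is.na(df)) result.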