Count number of non-NA values for every column in a dataframe

刺人心 2020-12-22 09:46

I have a large dataset that contains many NAs and some non-NA values. At the moment I count the non-NA values for each column like this:



        
3 Answers
  • 2020-12-22 09:47

    Try this:

    nonNA_counts <- sapply(df, function(x) sum(!is.na(x)))
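
    As a minimal sketch of what that call returns, here it is applied to a tiny data frame. The data frame `toy` and its columns `a` and `b` are made up for illustration and are not from the question:

    ```r
    # Hypothetical sample data: two columns with some missing values
    toy <- data.frame(
      a = c(1, NA, 3),
      b = c(NA, NA, "x")
    )

    # sapply() visits each column; !is.na(x) marks the non-missing entries,
    # and sum() counts the TRUE values
    nonNA_counts <- sapply(toy, function(x) sum(!is.na(x)))
    nonNA_counts
    #> a b 
    #> 2 1
    ```

    The result is a named integer vector with one entry per column. If a guaranteed return type matters, vapply(toy, function(x) sum(!is.na(x)), integer(1)) should give the same counts.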
    
  • 2020-12-22 09:56

    You can also call is.na on the entire data frame, which returns a logical matrix, and then call colSums on the negated result:

    # make sample data
    set.seed(47)
    df <- as.data.frame(matrix(sample(c(0:1, NA), 100*5, TRUE), 100))
    
    str(df)
    #> 'data.frame':    100 obs. of  5 variables:
    #>  $ V1: int  NA 1 NA NA 1 NA 1 1 1 NA ...
    #>  $ V2: int  NA NA NA 1 NA 1 0 1 0 NA ...
    #>  $ V3: int  1 1 0 1 1 NA NA 1 NA NA ...
    #>  $ V4: int  NA 0 NA 0 0 NA 1 1 NA NA ...
    #>  $ V5: int  NA NA NA 0 0 0 0 0 NA NA ...
    
    colSums(!is.na(df))
    #> V1 V2 V3 V4 V5 
    #> 69 55 62 60 70
    
  • 2020-12-22 09:57

    With dplyr, that would be:

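    library(dplyr)
    
    # count the non-NA values in every column
    # (funs() is deprecated since dplyr 0.8.0, so a formula lambda is used)
    df %>%
      summarise_all(~ sum(!is.na(.)))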
    

    The advantage of this approach is that you can call group_by first to get the counts per group, and that you don't need to spell out column names, since every column is summarised.
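
    The grouped variant could look roughly like this. It is only a sketch: the data frame `toy_grouped` and its grouping column `grp` are hypothetical, and the formula lambda assumes dplyr 0.8.0 or later:

    ```r
    library(dplyr)

    # Hypothetical sample data: `grp` is an illustrative grouping column
    toy_grouped <- data.frame(
      grp = c("a", "a", "b", "b", "b"),
      x   = c(1, NA, 3, NA, NA),
      y   = c(NA, "u", "v", "w", NA)
    )

    # Count the non-NA values of every non-grouping column within each group
    counts <- toy_grouped %>%
      group_by(grp) %>%
      summarise_all(~ sum(!is.na(.)))

    counts
    # group "a": x = 1, y = 1; group "b": x = 1, y = 2
    ```

    The grouping column itself is not summarised; it appears in the result as the key for each row of counts.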
