Delete columns/rows with more than x% missing

孤街浪徒 2020-11-27 05:36

I want to delete all columns or rows with more than 50% NAs in a data frame.

This is my solution:

# delete columns with more than 50% mi         


        
2 Answers
  • 2020-11-27 05:45

    A tidyverse solution that removes columns with at least x% NAs (here, x = 50):

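    library(purrr)  # provides discard() and re-exports %>%

    test_data <- data.frame(A = c(rep(NA, 12), 520, 233, 522),
                            B = c(rep(10, 12), 520, 233, 522))

    # Drop every column whose share of NAs is >= 50%.
    # Use > 50 instead to drop only columns with strictly more than 50% NAs.
    test_data %>%
      purrr::discard(~ sum(is.na(.x)) / length(.x) * 100 >= 50)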
    

    Result:

         B
    1   10
    2   10
    3   10
    4   10
    5   10
    6   10
    7   10
    8   10
    9   10
    10  10
    11  10
    12  10
    13 520
    14 233
    15 522
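
    If purrr is not installed, base R's Filter() can play the same role. The following is only a minimal sketch of that idea, not part of the original answer; max_na_pct is a hypothetical name for the cutoff in percent.

    ```r
    test_data <- data.frame(A = c(rep(NA, 12), 520, 233, 522),
                            B = c(rep(10, 12), 520, 233, 522))

    # max_na_pct is a hypothetical name for the cutoff, in percent
    max_na_pct <- 50

    # Filter() keeps the columns for which the predicate is TRUE,
    # i.e. the columns whose share of NAs is below the cutoff
    kept <- Filter(function(col) mean(is.na(col)) * 100 < max_na_pct, test_data)
    names(kept)
    ```

    Because Filter() keeps what the predicate accepts while discard() drops it, the comparison is inverted (< instead of >=).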
    
  • 2020-11-27 06:06

    To remove columns with more than a given share of NAs, you can use colMeans(is.na(...)):

    ## Some sample data
    set.seed(0)
    dat <- matrix(1:100, 10, 10)
    dat[sample(1:100, 50)] <- NA
    dat <- data.frame(dat)
    
    ## Remove columns with more than 50% NA
    ## (>= keeps columns that are exactly 50% NA; only columns above 50% are dropped)
    dat[, which(colMeans(!is.na(dat)) >= 0.5)]
    
    ## Remove rows with more than 50% NA
    dat[which(rowMeans(!is.na(dat)) >= 0.5), ]
    
    ## Remove columns and rows with more than 50% NA
    dat[which(rowMeans(!is.na(dat)) >= 0.5), which(colMeans(!is.na(dat)) >= 0.5)]
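
    For an arbitrary x%, the same indexing can be wrapped in a small helper. This is a sketch under stated assumptions rather than an established function: drop_sparse() and its max_na argument are hypothetical names, and the cutoff is given as a fraction between 0 and 1.

    ```r
    # drop_sparse() and max_na are hypothetical names, not part of any package
    drop_sparse <- function(df, max_na = 0.5) {
      # Share of NAs per row and per column, measured on the original data
      keep_rows <- rowMeans(is.na(df)) <= max_na
      keep_cols <- colMeans(is.na(df)) <= max_na
      # drop = FALSE keeps a data frame even if only one column survives
      df[keep_rows, keep_cols, drop = FALSE]
    }

    example <- data.frame(a = c(1, NA, NA, NA),
                          b = c(1, 2, NA, 4),
                          c = c(1, 2, 3, 4))
    cleaned <- drop_sparse(example)
    ```

    Both masks are computed on the original data, as in the last line of the code above, so a row or column of the result may still exceed the cutoff when measured against the reduced data frame.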
    