Randomly insert NAs into dataframe proportionally

無奈伤痛 2020-11-29 12:07

I have a complete dataframe. I want 20% of the values in the dataframe to be replaced by NAs to simulate randomly missing data.

A <- c(1:10)
B <- c(1 …
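One way to make the proportion exact rather than approximate is to pick a fixed number of cells at random and blank out only those. This is a minimal sketch that assumes a small all-integer data frame. The column values are hypothetical because the vectors above are cut off.

```r
# Hypothetical example data; the question's own vectors A and B are truncated.
df <- data.frame(A = 1:10, B = 11:20, C = 21:30)

prop_na <- 0.2                          # target share of missing cells
n_cells <- nrow(df) * ncol(df)          # 30 cells in this example
n_na    <- round(prop_na * n_cells)     # exactly 6 cells will become NA

# Logical mask with the same shape as df; TRUE marks a cell to blank out.
mask <- matrix(FALSE, nrow = nrow(df), ncol = ncol(df))
mask[sample(n_cells, n_na)] <- TRUE     # n_na distinct positions, no repeats

# Assigning through a logical matrix updates each column separately,
# so every column keeps its original type.
df[mask] <- NA

sum(is.na(df))
# [1] 6
```

`sample(n_cells, n_na)` draws positions without replacement, so the NA count is always the same and only the affected cells change between runs.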
6 Answers
  •  执笔经年
    2020-11-29 12:39

    If you are in the mood to use purrr instead of lapply, you can also do it like this:

    > library(purrr)
    > df <- data.frame(A = 1:10, B = 11:20, C = 21:30)
    > df
        A  B  C
    1   1 11 21
    2   2 12 22
    3   3 13 23
    4   4 14 24
    5   5 15 25
    6   6 16 26
    7   7 17 27
    8   8 18 28
    9   9 19 29
    10 10 20 30
    > map_df(df, function(x) {x[sample(c(TRUE, NA), prob = c(0.8, 0.2), size = length(x), replace = TRUE)]})
    # A tibble: 10 x 3
           A     B     C
       <int> <int> <int>
    1      1    11    21
    2      2    12    22
    3     NA    13    NA
    4      4    14    NA
    5      5    15    25
    6      6    16    26
    7      7    17    27
    8      8    NA    28
    9      9    19    29
    10    10    20    30
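The answer contrasts purrr with `lapply`. A base-R version of the same per-column idea might look like the sketch below, which uses the same example data. `blank_column()` and its `prop_na` argument are hypothetical names introduced only for this illustration. Each cell is blanked independently with probability 0.2, so the share of NAs is 20% only on average. The tibble above, for instance, ended up with 4 of 30 cells missing.

```r
df <- data.frame(A = 1:10, B = 11:20, C = 21:30)

# Hypothetical helper: draw one flag per element, TRUE with probability
# 1 - prop_na and NA with probability prop_na. Indexing a vector with a
# logical NA returns NA at that position, so the result keeps the column's
# length and type.
blank_column <- function(x, prop_na = 0.2) {
  keep <- sample(c(TRUE, NA), size = length(x), replace = TRUE,
                 prob = c(1 - prop_na, prop_na))
  x[keep]
}

# Assigning into df[] (rather than df) keeps the result a data.frame.
df[] <- lapply(df, blank_column)

mean(is.na(df))   # around 0.2 on average, varies from run to run
```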
    
