Randomly insert NAs into dataframe proportionally

無奈伤痛 2020-11-29 12:07

I have a complete dataframe. I want 20% of the values in the dataframe to be replaced with NAs to simulate randomly missing data.

A <- c(1:10)
B <- c(1         


        
6 Answers
  •  离开以前
    2020-11-29 12:56

    df <- data.frame(A = 1:10, B = 11:20, c = 21:30)
    head(df)
    ##   A  B  c
    ## 1 1 11 21
    ## 2 2 12 22
    ## 3 3 13 23
    ## 4 4 14 24
    ## 5 5 15 25
    ## 6 6 16 26
    
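    # For each column, index with a random TRUE/NA vector: TRUE keeps the value, NA turns it into NA (probability 0.15 per value).
    as.data.frame(lapply(df, function(cc) cc[ sample(c(TRUE, NA), prob = c(0.85, 0.15), size = length(cc), replace = TRUE) ]))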
    ##     A  B  c
    ## 1   1 11 21
    ## 2   2 12 22
    ## 3   3 13 23
    ## 4   4 14 24
    ## 5   5 NA 25
    ## 6   6 16 26
    ## 7  NA 17 27
    ## 8   8 18 28
    ## 9   9 19 29
    ## 10 10 20 30
    

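    Each value is replaced independently with probability 0.15, so the share of NAs is 15% only on average and will vary from run to run. For the 20% asked for in the question, use prob = c(0.8, 0.2).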
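
    If the proportion has to be exact rather than correct on average, one option is to draw a fixed number of cell positions without replacement and blank only those. The sketch below is not part of the answer above; the helper name insert_na_exact and its prop argument are hypothetical, and it assumes the fraction applies to the data frame as a whole, not to each column separately.

```r
# Hypothetical helper: set an exact fraction of all cells to NA.
# Assumes `prop` is the share of cells across the whole data frame.
insert_na_exact <- function(df, prop = 0.2) {
  n_cells <- nrow(df) * ncol(df)
  n_na <- round(prop * n_cells)               # number of cells to blank out

  # Logical mask with the same shape as df; TRUE marks cells that become NA.
  mask <- matrix(FALSE, nrow = nrow(df), ncol = ncol(df))
  mask[sample.int(n_cells, n_na)] <- TRUE     # sampled without replacement

  df[mask] <- NA
  df
}

set.seed(42)                                  # only to make the draw reproducible
df <- data.frame(A = 1:10, B = 11:20, c = 21:30)
df_na <- insert_na_exact(df, prop = 0.2)
sum(is.na(df_na))                             # 6 of the 30 cells
```

    Because the positions are drawn across all cells at once, individual columns can still end up with more or fewer than 20% NAs; only the overall count is fixed.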
