Stratified sampling or proportional sampling in R

甜味超标 2021-01-07 07:28

I have a data set generated as follows:

N <- 50  # number of rows; N is not defined in the question, 50 matches the answer below
myData <- data.frame(a=1:N,b=round(rnorm(N),2),group=round(rnorm(N,4),0))

The data looks like this:

1 Answer
  •  春和景丽
    2021-01-07 08:01

    You can use my stratified function. Passing a value less than 1 as the size samples that proportion of the rows in each group, like this:

    ## Sample data. Seed for reproducibility 
    set.seed(1)
    N <- 50
    myData <- data.frame(a=1:N,b=round(rnorm(N),2),group=round(rnorm(N,4),0))
    
    ## Taking the sample
    out <- stratified(myData, "group", .3)
    out
    #     a     b group
    # 17 17 -0.02     2
    # 8   8  0.74     3
    # 25 25  0.62     3
    # 49 49 -0.11     3
    # 4   4  1.60     3
    # 26 26 -0.06     4
    # 27 27 -0.16     4
    # 7   7  0.49     4
    # 12 12  0.39     4
    # 40 40  0.76     4
    # 32 32 -0.10     4
    # 9   9  0.58     5
    # 42 42 -0.25     5
    # 43 43  0.70     5
    # 37 37 -0.39     5
    # 11 11  1.51     6
    

    Compare the per-group counts in the sample with the counts we would expect (30% of each group, rounded):

    round(table(myData$group) * .3)
    # 
    # 2 3 4 5 6 
    # 1 4 6 4 1 
    table(out$group)
    # 
    # 2 3 4 5 6 
    # 1 4 6 4 1 
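    For comparison, here is a minimal base R sketch of the same idea that does not depend on stratified(): split the rows by group with split(), draw round(n_g * p) rows from each group with sample.int(), and bind the pieces back together. It assumes the per-group sizes are computed with round(), matching the expected counts above; whether stratified() rounds exactly this way is an assumption, and the names p, out_base, expected and observed are illustrative. The rows it picks will generally differ from the output above; only the per-group counts should match.

```r
## Base R sketch of proportional stratified sampling (not the stratified() function itself)
set.seed(1)
N <- 50
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

p <- 0.3  # proportion to draw from each group

## Split the rows by group, then draw round(n_g * p) row indices from each group
sampled <- lapply(split(myData, myData$group), function(d) {
  k <- round(nrow(d) * p)
  d[sample.int(nrow(d), k), , drop = FALSE]
})
out_base <- do.call(rbind, sampled)

## Per-group counts: expected vs. actually drawn
expected <- round(table(myData$group) * p)
observed <- table(factor(out_base$group, levels = names(expected)))
rbind(expected, observed)
```

    The sketch samples row indices with sample.int() rather than calling sample() on the rows directly, because sample() treats a single number x >= 1 as 1:x, which would misbehave for a group with only one row.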
    

    You can also take a fixed number of rows from each group by passing a whole number as the size, like this:

    stratified(myData, "group", 2)
    #     a     b group
    # 34 34 -0.05     2
    # 17 17 -0.02     2
    # 49 49 -0.11     3
    # 22 22  0.78     3
    # 12 12  0.39     4
    # 7   7  0.49     4
    # 18 18  0.94     5
    # 33 33  0.39     5
    # 45 45 -0.69     6
    # 11 11  1.51     6
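    A base R sketch of the fixed-size case, again independent of stratified(): it draws min(size, n_g) rows from each group, so a group with fewer rows than requested contributes all of its rows instead of raising an error. Capping at the group size is a design choice of this sketch; the answer does not say how stratified() handles groups smaller than the requested size. The names size and out_fixed are illustrative.

```r
## Base R sketch: a fixed number of rows per group
set.seed(1)
N <- 50
myData <- data.frame(a = 1:N, b = round(rnorm(N), 2), group = round(rnorm(N, 4), 0))

size <- 2  # rows to draw from each group

## min() caps the draw at the group size, so small groups do not cause an error
out_fixed <- do.call(rbind, lapply(split(myData, myData$group), function(d) {
  d[sample.int(nrow(d), min(size, nrow(d))), , drop = FALSE]
}))
table(out_fixed$group)
```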
    
