stratified sampling with group size below sample size in R

风格不统一 submitted on 2019-12-13 04:26:21

Question


I have response data by market in the format:

head(df)
    ID  market  q1  q2
    470 France  1   3
    625 Germany 0   2
    155 Italy   1   6
    648 Spain   0   5
    862 France  1   7
    699 Germany 0   8
    460 Italy   1   6
    333 Spain   1   5
    776 Spain   1   4

and the following frequencies:

 table(df$market)
    France  140
    Germany 300
    Italy   50
    Spain   75

I need to create a data frame that contains a random sample of 100 responses per market, drawn without replacement; for markets with fewer than 100 responses, it should contain all of them.

The expected result is:

table(df_new$market)
    France  100
    Germany 100
    Italy   50
    Spain   75

Thanks in advance!
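A minimal base-R sketch of the requirement, assuming `df` has a `market` column as shown. The helper name `sample_per_group` and the simulated data (built only to reproduce the market frequencies from the question) are hypothetical. The helper splits the row indices by group and draws `min(n, group size)` of them without replacement, so small groups come back whole.

```r
# Hypothetical helper: draw up to n rows per level of `group`, without replacement.
# Groups with fewer than n rows are returned whole.
sample_per_group <- function(data, group, n) {
  # row indices of `data`, split by the grouping column
  rows <- split(seq_len(nrow(data)), data[[group]])
  picked <- lapply(rows, function(idx) {
    # index into idx instead of calling sample(idx, ...), which misbehaves
    # when idx has length 1
    idx[sample.int(length(idx), min(n, length(idx)))]
  })
  data[unlist(picked, use.names = FALSE), , drop = FALSE]
}

# Simulated data with the market frequencies from the question (responses are made up)
set.seed(1)
counts <- c(France = 140, Germany = 300, Italy = 50, Spain = 75)
df <- data.frame(
  ID     = seq_len(sum(counts)),
  market = rep(names(counts), times = counts),
  q1     = sample(0:1, sum(counts), replace = TRUE),
  q2     = sample(1:8, sum(counts), replace = TRUE)
)

df_new <- sample_per_group(df, "market", 100)
table(df_new$market)   # France 100, Germany 100, Italy 50, Spain 75
```

Because each group is sampled independently, the cap of 100 applies per market rather than to the total.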


Answer 1:


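The following works: cap each group's sample size at the group's frequency (shown here on toy data with a cap of 5), then sample that many rows from each group without replacement: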

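set.seed(10)
# exact draws depend on the R version: R 3.6.0 changed sample()'s default
# algorithm, so the printed output below may not reproduce in newer R
# toy data: 25 rows, grouping column c1 with groups A-D
DF = data.frame(c1 = sample(LETTERS[1:4], 25, replace = TRUE), c2 = runif(25))
# group frequencies, and the per-group sample size capped at 5
freqs = as.data.frame(table(DF$c1))
freqs$ss = pmin(freqs$Freq, 5)
#> freqs
#  Var1 Freq ss
#1    A    4  4
#2    B   11  5
#3    C    7  5
#4    D    3  3
# for each group, draw ss of its row indices without replacement;
# idx[sample.int(...)] avoids sample()'s 1:n behaviour when idx has length 1
res = mapply(function(x, y) {
               idx = which(DF$c1 %in% x)   # rows belonging to group x
               DF[idx[sample.int(length(idx), y)], ]
             },
             x = freqs$Var1, y = freqs$ss, SIMPLIFY = FALSE)
# stack the per-group samples into one data frame
do.call(rbind, res)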
#   c1        c2
#5   A 0.3558977
#17  A 0.2289039
#6   A 0.5355970
#13  A 0.9546536
#3   B 0.2395891
#25  B 0.8015470
#10  B 0.4226376
#15  B 0.5005032
#19  B 0.7289646
#11  C 0.7477465
#9   C 0.8998325
#12  C 0.8226526
#1   C 0.7066469
#4   C 0.7707715
#23  D 0.4861003
#20  D 0.2498805
#21  D 0.1611833
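One caveat with the `sample(which(...), y)` pattern: when its first argument is a single number n >= 1, `sample()` samples from `1:n`. A group containing exactly one row (say row 17) would then yield a random index between 1 and 17 instead of 17. A small demonstration, with a safer alternative in the spirit of the `resample()` helper shown in the examples of `?sample`; the name `safe_sample` is hypothetical.

```r
# Documented sample() behaviour: a length-1 numeric x >= 1 is treated as 1:x
idx <- 17L                      # e.g. the only row belonging to some group

set.seed(3)
draws_unsafe <- replicate(20, sample(idx, 1))
unique(draws_unsafe)            # values drawn from 1:17, not just 17

# Hypothetical helper: always samples from the elements of x
safe_sample <- function(x, size) x[sample.int(length(x), size)]

draws_safe <- replicate(20, safe_sample(idx, 1))
unique(draws_safe)              # always 17
```

For groups with two or more rows the two forms draw the same indices, so the fix only changes behaviour in the single-row case.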


Source: https://stackoverflow.com/questions/22819432/stratified-sampling-with-group-size-below-sample-size-in-r
