Stratified sampling - not enough observations

Anonymous (unverified), submitted on 2019-12-03 02:28:01

Question:

What I would like to achieve is to draw a 10% sample from each group, where a group is a combination of two factors: the recency category and the frequency category. So far I have looked at the sampling package and its strata() function, which looks promising, but I am getting the following error. I find the error message hard to understand, and I cannot tell what is wrong or how to get around it.

Here is my code:

    > d[1:10,]
            date id_email_op recency frequecy r_cat f_cat
    1  29.8.2011       19393     294        1     A     G
    2  29.8.2011       19394     230        4     A     D
    3  29.8.2011       19395     238       12     A     B
    4  29.8.2011       19396     294        1     A     G
    5  29.8.2011       19397     223        9     A     C
    6  29.8.2011       19398     185        7     A     C
    7  29.8.2011       19399     273        2     A     F
    8  29.8.2011       19400      16        4     C     D
    9  29.8.2011       19401     294        1     A     G
    10 29.8.2011       19402       3        5     F     C
    > table(d$f_cat,d$r_cat)
             A      B      C      D      E      F
      A    176    203    289    228    335    983
      B   1044    966   1072    633    742   1398
      C   6623   3606   3020   1339   1534   2509
      D   4316   1790   1239    529    586    880
      E   8431   2798   2005    767    817   1151
      F  22140   5432   3937   1415   1361   1868
      G 100373  18316  11872   3760   3453   4778
    > as.vector(table(d$f_cat,d$r_cat))
     [1]    176   1044   6623   4316   8431  22140 100373    203    966   3606   1790   2798   5432
    [14]  18316    289   1072   3020   1239   2005   3937  11872    228    633   1339    529    767
    [27]   1415   3760    335    742   1534    586    817   1361   3453    983   1398   2509    880
    [40]   1151   1868   4778
    > s <- strata(d,c("f_cat","r_cat"),size=as.vector(ceiling(0.1 * table(d$f_cat,d$r_cat))), method="srswor")
    Error in strata(d, c("f_cat", "r_cat"), size = as.vector(table(d$f_cat,  :
      not enough obervations for the stratum 6

I can't really see which stratum is stratum 6. What condition does the function check in the background? I am not sure that I have the size parameter set up correctly. And yes, I have checked the documentation of the sampling package :)

Thanks, everyone.
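One possible way to see which cell "stratum 6" refers to, sketched under an assumption: the sampling documentation describes size as being given in the order in which the strata appear in the input data set, so strata() is assumed here to number strata by first appearance in d rather than in the column-major order of table(). The sketch uses only base R and the ten rows printed in the question; d10, key, by_appearance and n_by_appearance are illustrative names, not part of the original code.

```r
# First ten rows of d from the question (only the two category columns).
d10 <- data.frame(
  r_cat = c("A", "A", "A", "A", "A", "A", "A", "C", "A", "F"),
  f_cat = c("G", "D", "B", "G", "C", "C", "F", "D", "G", "C"),
  stringsAsFactors = FALSE
)

# Strata in order of first appearance in the data. ASSUMPTION: this is the
# numbering strata() uses; it differs from the cell order of table().
by_appearance <- unique(d10[, c("f_cat", "r_cat")])
rownames(by_appearance) <- NULL
print(by_appearance[6, ])   # the combination assumed to be "stratum 6"

# Stratum counts arranged in that same first-appearance order.
key <- paste(d10$f_cat, d10$r_cat)
n_by_appearance <- as.vector(table(factor(key, levels = unique(key))))

# A 10% sample size per stratum, aligned with the first-appearance order.
size <- ceiling(0.1 * n_by_appearance)
```

Under that assumption, stratum 6 is f_cat D with r_cat C, which has 1239 rows in the table above, while the sixth entry of the size vector in the question comes from the cell f_cat F, r_cat A (22140 rows, so a requested size of 2214). Asking for 2214 rows without replacement from a stratum of 1239 would explain the error. Building size in first-appearance order on the full data, as in the last lines of the sketch, may avoid the mismatch.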

Answer 1:

You could always do it yourself:

    stratified <- NULL
    for (r in LETTERS[1:6]) {      # r_cat levels A to F
      for (f in LETTERS[1:7]) {    # f_cat levels A to G
        # row names of the observations in this (r_cat, f_cat) stratum
        rows <- rownames(d)[d$r_cat == r & d$f_cat == f]
        # draw 10% of the stratum (rounded up) without replacement
        stratified <- c(stratified, sample(rows, ceiling(0.1 * length(rows))))
      }
    }

And then...

d[stratified,] would be your stratified sample.
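The same do-it-yourself idea can also be written without explicit loops over the category letters. The following is a minimal base R sketch, not the answerer's code: d_demo is a hypothetical stand-in for d with randomly generated categories, and groups, picked and stratified_demo are illustrative names.

```r
set.seed(1)  # only to make the illustration reproducible

# Hypothetical stand-in for d with the same two category columns.
d_demo <- data.frame(
  id    = 1:600,
  r_cat = sample(LETTERS[1:6], 600, replace = TRUE),
  f_cat = sample(LETTERS[1:7], 600, replace = TRUE),
  stringsAsFactors = FALSE
)

# Row indices grouped by stratum; drop = TRUE skips empty combinations.
groups <- split(seq_len(nrow(d_demo)),
                list(d_demo$r_cat, d_demo$f_cat), drop = TRUE)

# Draw ceiling(10%) of the rows of each stratum without replacement.
# Indexing idx[...] avoids sample()'s special case for a single number.
picked <- unlist(lapply(groups, function(idx) {
  idx[sample.int(length(idx), ceiling(0.1 * length(idx)))]
}), use.names = FALSE)

stratified_demo <- d_demo[picked, ]

# Compare sampled counts with the full counts per stratum.
print(table(stratified_demo$f_cat, stratified_demo$r_cat))
```

Because the sample size is computed from each stratum's own row count, it can never exceed the number of rows available in that stratum.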


