What I would like to achieve is to draw a 10% sample from each group, where a group is a combination of two factors: recency category and frequency category. So far I have looked at the sampling package and its strata() function. It looks promising, but I am getting the following error, and I find it really hard to understand what the message means, what is wrong, or how to get around it.
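For comparison, the goal itself (10% of each f_cat × r_cat group, drawn without replacement) can be sketched in base R without the sampling package. This is a swapped-in technique, not strata(). The helper `sample_by_group` and the toy data frame are hypothetical, and rounding up with ceiling() mirrors how the sizes are computed in the code below.

```r
# Hypothetical helper (not part of the sampling package): draws
# ceiling(frac * n) rows without replacement from every non-empty
# combination of the grouping columns, then stacks the results.
sample_by_group <- function(data, groups, frac = 0.1) {
  parts <- split(data, data[groups], drop = TRUE)
  picked <- lapply(parts, function(g) {
    n_take <- ceiling(frac * nrow(g))
    # sample.int() avoids sample()'s surprising behaviour when nrow(g) == 1
    g[sample.int(nrow(g), n_take), , drop = FALSE]
  })
  do.call(rbind, picked)
}

# Toy stand-in for d: 8 groups of 25 rows each
set.seed(1)
toy <- data.frame(
  id    = 1:200,
  r_cat = factor(rep(c("A", "B"), each = 100)),
  f_cat = factor(rep(c("C", "D", "E", "G"), times = 50))
)
s <- sample_by_group(toy, c("f_cat", "r_cat"), frac = 0.1)
table(s$f_cat, s$r_cat)  # 3 rows per group, since ceiling(0.1 * 25) = 3
```

On the real data this would be `sample_by_group(d, c("f_cat", "r_cat"))`. Because it splits by group first, the sizes can never be matched to the wrong stratum, and rounding up keeps at least one row from every non-empty group.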
Here is my code:
    > d[1:10,]
            date id_email_op recency frequecy r_cat f_cat
    1  29.8.2011       19393     294        1     A     G
    2  29.8.2011       19394     230        4     A     D
    3  29.8.2011       19395     238       12     A     B
    4  29.8.2011       19396     294        1     A     G
    5  29.8.2011       19397     223        9     A     C
    6  29.8.2011       19398     185        7     A     C
    7  29.8.2011       19399     273        2     A     F
    8  29.8.2011       19400      16        4     C     D
    9  29.8.2011       19401     294        1     A     G
    10 29.8.2011       19402       3        5     F     C
    > table(d$f_cat,d$r_cat)
             A      B      C      D      E      F
      A    176    203    289    228    335    983
      B   1044    966   1072    633    742   1398
      C   6623   3606   3020   1339   1534   2509
      D   4316   1790   1239    529    586    880
      E   8431   2798   2005    767    817   1151
      F  22140   5432   3937   1415   1361   1868
      G 100373  18316  11872   3760   3453   4778
    > as.vector(table(d$f_cat,d$r_cat))
     [1]    176   1044   6623   4316   8431  22140 100373    203    966   3606   1790   2798   5432
    [14]  18316    289   1072   3020   1239   2005   3937  11872    228    633   1339    529    767
    [27]   1415   3760    335    742   1534    586    817   1361   3453    983   1398   2509    880
    [40]   1151   1868   4778
    > s <- strata(d,c("f_cat","r_cat"),size=as.vector(ceiling(0.1 * table(d$f_cat,d$r_cat))), method="srswor")
    Error in strata(d, c("f_cat", "r_cat"), size = as.vector(table(d$f_cat,  : 
      not enough obervations for the stratum 6
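One likely cause, assuming strata() numbers strata in the order in which they first appear in the data (the sampling documentation describes `size` as following the order of the strata in the input data set): `as.vector(table(d$f_cat, d$r_cat))` lists counts with f_cat varying fastest within r_cat, but the rows of d are not in that order. A sketch of a fix is to sort the data by r_cat and then f_cat before computing `size`. The toy data and the helper `counts_in_data_order` are hypothetical, and the strata() call only runs if sampling is installed.

```r
# Hypothetical toy data whose rows are NOT sorted by the stratification variables
toy <- data.frame(
  id    = 1:12,
  r_cat = factor(c("B", "A", "B", "A", "A", "B", "A", "A", "B", "A", "B", "A")),
  f_cat = factor(c("G", "F", "F", "G", "F", "G", "F", "F", "G", "G", "F", "F"))
)

# Hypothetical helper: row count of each stratum, listed in order of
# first appearance in the data (the order strata() is assumed to use)
counts_in_data_order <- function(data, vars) {
  key <- do.call(paste, c(data[vars], sep = "/"))
  as.vector(table(key)[unique(key)])
}

table_order <- as.vector(table(toy$f_cat, toy$r_cat))
table_order                                     # 5 2 2 3
counts_in_data_order(toy, c("f_cat", "r_cat"))  # 3 5 2 2: orders disagree

# Sort so f_cat varies fastest within r_cat, like as.vector(table(f_cat, r_cat))
toy_sorted <- toy[order(toy$r_cat, toy$f_cat), ]
counts_in_data_order(toy_sorted, c("f_cat", "r_cat"))  # 5 2 2 3: orders agree

size <- as.vector(ceiling(0.1 * table(toy_sorted$f_cat, toy_sorted$r_cat)))
if (requireNamespace("sampling", quietly = TRUE)) {
  s <- sampling::strata(toy_sorted, c("f_cat", "r_cat"),
                        size = size, method = "srswor")
}
```

For the question's data, the corresponding step would be `d <- d[order(d$r_cat, d$f_cat), ]` before building `size` and calling strata().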
I can't really see what stratum 6 is. What condition does the function check in the background? I am not sure that I have set up the size parameter correctly. And yes, I have checked the documentation of the sampling package :)
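A sketch of how to find stratum 6, assuming strata() numbers strata by first appearance in the data and, for "srswor", stops when a requested size exceeds the number of rows in that stratum (which is what the error message suggests). It uses only the ten rows and the table printed above; `d10` and `counts` are hypothetical reconstructions of that output, not the full data.

```r
# The first ten rows shown above (stratification columns only)
d10 <- data.frame(
  r_cat = c("A", "A", "A", "A", "A", "A", "A", "C", "A", "F"),
  f_cat = c("G", "D", "B", "G", "C", "C", "F", "D", "G", "C")
)

# Distinct (f_cat, r_cat) pairs in order of first appearance
first_seen <- unique(d10[c("f_cat", "r_cat")])
first_seen[6, ]  # f_cat "D", r_cat "C" (row 8 of the data)

# table(d$f_cat, d$r_cat) from the question, filled column by column
counts <- matrix(
  c(176, 1044, 6623, 4316, 8431, 22140, 100373,
    203, 966, 3606, 1790, 2798, 5432, 18316,
    289, 1072, 3020, 1239, 2005, 3937, 11872,
    228, 633, 1339, 529, 767, 1415, 3760,
    335, 742, 1534, 586, 817, 1361, 3453,
    983, 1398, 2509, 880, 1151, 1868, 4778),
  nrow = 7, dimnames = list(f_cat = LETTERS[1:7], r_cat = LETTERS[1:6])
)
size <- as.vector(ceiling(0.1 * counts))

size[6]                      # 2214, taken from the f_cat F / r_cat A cell
counts["D", "C"]             # 1239 rows actually available in stratum 6
size[6] > counts["D", "C"]   # TRUE, so the sample cannot be drawn
```

In other words, the sixth entry of `size` was computed for a different cell of the table than the sixth stratum strata() sees in the data.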
Thanks, everyone.