Using a sample list as a template for sampling from a larger list without wraparound

Question

If I have a vector of letters:

> all <- letters
> all
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"

and then I define a reference sample from letters as follows:

> refSample <- c("j","l","m","s")

in which the spacing between elements is 2 (1st to 2nd), 1 (2nd to 3rd) and 6 (3rd to 4th), how can I then select n samples from all whose elements have the same spacing as refSample, without wrapping around the end of the vector? For example, "a","c","d","j" and "q","s","t","z" would be valid samples, but "a","c","d","k" and "r","t","u","a" would not: the former has an index difference of 7 (rather than 6) between the 3rd and last elements, whereas the latter has the correct spacing but wraps around.

Second, how can I parameterise this, so that whatever refSample is used, I can use the spacing of that as a template?
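To make the spacing rule concrete, here is a minimal sketch of a validity check. The helper name has_same_spacing is hypothetical (it is not part of the original question), and it assumes the elements of the full vector are unique so that match() identifies each position unambiguously.

```r
# Hypothetical helper: TRUE if `candidate` has the same index spacing
# within `full` as `ref`, with no wrap-around.
has_same_spacing <- function(candidate, ref, full) {
  icand <- match(candidate, full)
  iref  <- match(ref, full)
  # An element missing from `full` can never form a valid sample
  if (anyNA(icand) || anyNA(iref)) return(FALSE)
  # A wrapped sample produces a negative difference, so it fails here too
  length(icand) == length(iref) && all(diff(icand) == diff(iref))
}

refSample <- c("j", "l", "m", "s")
diff(match(refSample, letters))                              # 2 1 6
has_same_spacing(c("a", "c", "d", "j"), refSample, letters)  # TRUE
has_same_spacing(c("a", "c", "d", "k"), refSample, letters)  # FALSE (last gap is 7)
has_same_spacing(c("r", "t", "u", "a"), refSample, letters)  # FALSE (wraps around)
```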

Answer 1:

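Here's a simple way: find the index of each reference element in the full vector, take the differences between consecutive indices as the spacing template, and then draw random start positions that leave enough room for the whole template.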

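all <- letters
refSample <- c("j","l","m","s")

pick_matches <- function(n, ref, full) {
  iref <- match(ref, full)               # positions of the reference elements in full
  if (anyNA(iref)) stop("every element of ref must occur in full")
  spaces <- diff(iref)                   # spacing template, here 2 1 6
  tot_space <- sum(spaces)               # distance from first to last element
  max_start <- length(full) - tot_space  # last start index that does not run off the end
  if (max_start < 1) stop("ref does not fit inside full")
  # draw n start positions; the same start may be drawn more than once
  starts <- sample.int(max_start, n, replace = TRUE)
  # each start s yields the elements at s, s + 2, s + 3, s + 9 for this template
  return( sapply( starts, function(s) full[ cumsum(c(s, spaces)) ] ) )
}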

> set.seed(1)
> pick_matches(5, refSample, all) # each COLUMN is a desired sample vector
      [,1] [,2] [,3] [,4] [,5]
 [1,] "e"  "g"  "j"  "p"  "d"
 [2,] "g"  "i"  "l"  "r"  "f"
 [3,] "h"  "j"  "m"  "s"  "g"
 [4,] "n"  "p"  "s"  "y"  "m"
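Because the starts above are drawn with replacement, the same sample can appear more than once. If distinct samples are wanted, one option is to enumerate every valid placement first and then pick columns without replacement. The sketch below is only an illustration: all_matches is a hypothetical helper, and it assumes every element of ref occurs in full and that ref is given in the same order as full.

```r
# Hypothetical helper: every valid placement of the template, one per column.
all_matches <- function(ref, full) {
  offsets <- match(ref, full)
  offsets <- offsets - offsets[1]          # 0-based offsets from the start element
  n_starts <- length(full) - max(offsets)  # number of starts that still fit
  if (n_starts < 1) stop("template does not fit in `full`")
  sapply(seq_len(n_starts), function(s) full[s + offsets])
}

m <- all_matches(c("j", "l", "m", "s"), letters)
dim(m)    # 4 17
m[, 1]    # "a" "c" "d" "j"
m[, 17]   # "q" "s" "t" "z"

# three distinct samples: choose columns without replacement
set.seed(1)
m[, sample.int(ncol(m), 3), drop = FALSE]
```

For a 26-letter vector the full table is tiny, so enumeration costs nothing; for very long vectors, sampling start positions without replacement would avoid building the whole matrix.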

Source: https://stackoverflow.com/questions/10438705/using-a-sample-list-as-a-template-for-sampling-from-a-larger-list-without-wrapar

Tags

sampling
