Permuting elements of a vector 10,000 times - efficiently? (R)

Posted by ☆樱花仙子☆ on 2019-12-06 16:15:57

I don't think the computations need to be as expensive as you are making them. For small "x" vectors, generate somewhat more permutations than you need (here I've overdone it a bit), then check for duplicate rows with duplicated. If fewer than the desired 10,000 unique rows remain after dropping the duplicates, repeat the process to make up the shortfall, using rbind to append the new rows you want to keep to the matrix you got from replicate. This could be implemented in a while loop.

x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
set.seed(1)
N <- t(replicate(15000, sample(x)))
sum(duplicated(N))
# [1] 1389
out <- N[!(duplicated(N)), ][1:10000, ]
head(out)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,] "B"  "E"  "C"  "D"  "B"  "E"  "A"  "C"  "C"  "A"  
# [2,] "B"  "B"  "C"  "C"  "C"  "D"  "E"  "E"  "A"  "A"  
# [3,] "C"  "B"  "C"  "A"  "A"  "E"  "D"  "C"  "B"  "E"  
# [4,] "C"  "C"  "E"  "B"  "C"  "E"  "A"  "A"  "D"  "B"  
# [5,] "A"  "C"  "D"  "E"  "E"  "C"  "A"  "B"  "B"  "C"  
# [6,] "C"  "E"  "E"  "B"  "A"  "C"  "D"  "A"  "B"  "C"
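
The while loop mentioned above might look roughly like the following. This is only a sketch: the helper name unique_perms, its batch argument, and the up-front check on the number of distinct permutations are my own assumptions, not part of the original answer. It also assumes length(x) > 1 so that replicate returns a matrix.

```r
# Hypothetical helper (not from the original answer): draw random permutations
# in batches until `n` distinct rows have been collected.
unique_perms <- function(x, n, batch = max(1L, n %/% 2L)) {
  # log of the number of distinct permutations of a multiset:
  # log(length(x)!) - sum(log(count_i!)); logs avoid overflow for long x.
  log_total <- lgamma(length(x) + 1) - sum(lgamma(as.vector(table(x)) + 1))
  if (log(n) > log_total + 1e-8) {
    stop("x has fewer than n distinct permutations")
  }
  # Overshoot on the first draw, then drop duplicated rows.
  out <- t(replicate(n + batch, sample(x)))
  out <- out[!duplicated(out), , drop = FALSE]
  # Top up until enough unique rows are available.
  while (nrow(out) < n) {
    out <- rbind(out, t(replicate(batch, sample(x))))
    out <- out[!duplicated(out), , drop = FALSE]
  }
  out[seq_len(n), , drop = FALSE]
}

x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
set.seed(1)
res <- unique_perms(x, 10000)
dim(res)
```

The check at the top is there because the loop could never finish if n exceeded the number of distinct permutations of x. When n is close to that total, the loop still terminates but can take many passes.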

The duplicated step is actually the most expensive, as far as I can see:

y <- sample(500, 1000, TRUE)
system.time(N <- t(replicate(12000, sample(y))))
# user  system elapsed 
# 2.35    0.08    2.43 
system.time(D <- sum(duplicated(N)))
#  user  system elapsed 
# 14.82    0.01   14.84 
D
# [1] 0

In that run, there are no duplicates among the 12,000 samples.
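
If the duplicate check dominates the run time, one thing that may be worth trying is collapsing each row into a single string key and calling duplicated on the resulting character vector. The sketch below is an assumption on my part, not something from the original answer: row_keys is a hypothetical helper, it assumes no element contains the separator character, and whether it is actually faster depends on your R version and data, so time it yourself.

```r
# Hypothetical helper: build one string key per row by pasting the columns
# together. Assumes no element of `m` contains the separator `sep`.
row_keys <- function(m, sep = "\r") {
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  do.call(paste, c(cols, sep = sep))
}

set.seed(1)
y <- sample(500, 1000, TRUE)
N <- t(replicate(200, sample(y)))
N <- rbind(N, N[1:3, ])            # append three known duplicate rows

dup_keys <- duplicated(row_keys(N))
dup_rows <- as.vector(duplicated(N))
all(dup_keys == dup_rows)          # both approaches flag the same rows
```

The matrix here is deliberately smaller than the 12,000-row example so that the comparison runs quickly.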

If you are only interested in the first 10,000 permutations (in dictionary order), you can use the iterpc package.

library(iterpc)
x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
I <- iterpc(table(x), ordered=TRUE)
# first 10000 permutations
result <- getnext(I, d=10000)

Getting them is very fast:

> system.time(getnext(I, d=10000))
   user  system elapsed 
  0.004   0.000   0.005 

Here's an idea. This is not necessarily an answer, but it's too big for a comment.

Generate the permutations in an orderly way and add them to a collection. For example, if the elements are A, B, C, and D:

A B C D
A B D C
A D B C
... and so on

Once you have the required number of permutations (10,000 in your case), shuffle that collection once.

If the cost of randomization is the bottleneck, this approach should solve it.
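
A rough base R sketch of this idea follows. It swaps in the standard next-lexicographic-permutation step as the "orderly way", which is my assumption and not something the idea above specifies; next_perm and first_perms are hypothetical helper names.

```r
# Hypothetical helper: next permutation of integer codes `p` in
# lexicographic order, or NULL if `p` is already the last one.
next_perm <- function(p) {
  n <- length(p)
  i <- n - 1L
  while (i >= 1L && p[i] >= p[i + 1L]) i <- i - 1L
  if (i < 1L) return(NULL)
  j <- n
  while (p[j] <= p[i]) j <- j - 1L
  p[c(i, j)] <- p[c(j, i)]                 # swap pivot with its successor
  p[(i + 1L):n] <- rev(p[(i + 1L):n])      # reverse the tail
  p
}

# Hypothetical helper: first `k` distinct permutations of `x`, one per row
# (fewer rows if `x` has fewer than `k` distinct permutations).
first_perms <- function(x, k) {
  lev <- sort(unique(x))
  p <- sort(match(x, lev))                 # integer codes, smallest first
  codes <- matrix(NA_integer_, nrow = k, ncol = length(x))
  i <- 0L
  while (!is.null(p) && i < k) {
    i <- i + 1L
    codes[i, ] <- p
    p <- next_perm(p)
  }
  codes <- codes[seq_len(i), , drop = FALSE]
  matrix(lev[as.vector(codes)], nrow = nrow(codes))
}

x <- c("A", "B", "B", "E", "C", "C", "D", "E", "A", "C")
perms <- first_perms(x, 10000)
set.seed(1)
shuffled <- perms[sample(nrow(perms)), ]   # permute the collection once
```

Note that shuffling only changes the row order. The rows are still the first 10,000 permutations in dictionary order, so for this x every row starts with "A", and the result is not a random sample from all permutations.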
