Permute a vector such that an element can't be in the same place

隐身守侯 提交于 2021-02-04 14:36:19

问题


I want to permute a vector so that an element can't be in the same place after permutation, as it was in the original. Let's say I have a list of elements like this: AABBCCADEF

A valid shuffle would be: BBAADEFCCA

But these would be invalid: BAACFEDCAB or BCABFEDCAB

The closest answer I could find was this: python shuffle such that position will never repeat. But that's not quite what I want, because there are no repeated elements in that example.

I want a fast algorithm that generalizes that answer in the case of repetitions.

MWE:

library(microbenchmark)

set.seed(1)
x <- sample(letters, size=295, replace=T)

terrible_implementation <- function(x) {
  xnew <- sample(x)
  while(any(x == xnew)) {
    xnew <- sample(x)
  }
  return(xnew)
}

microbenchmark(terrible_implementation(x), times=10)


Unit: milliseconds
                       expr      min       lq    mean  median       uq      max neval
 terrible_implementation(x) 479.5338 2346.002 4738.49 2993.29 4858.254 17005.05    10

Also, how do I determine if a sequence can be permuted in such a way?

EDIT: To make it perfectly clear what I want, the new vector should satisfy the following conditions:

1) all(table(newx) == table(x)) 2) all(x != newx)

E.g.:

newx <- terrible_implementation(x)
all(table(newx) == table(x))
[1] TRUE
all(x != newx)
[1] TRUE

回答1:


I think this satisfies all your conditions. The idea is to order by the frequency, start with the most common element and shift the value to the next value in the frequency table by the number of times the most common element appears. This will guarantee all elements will be missed.

I've written in data.table, as it helped me during debugging, without losing too much performance. It's a modest improvement performance-wise.

library(data.table)
library(magrittr)
library(microbenchmark)


permute_avoid_same_position <- function(y) {
  DT <- data.table(orig = y)
  DT[, orig_order := .I]

  count_by_letter <- 
    DT[, .N, keyby = orig] %>%
    .[order(N)] %>%
    .[, stable_order := .I] %>%
    .[order(-stable_order)] %>%
    .[]

  out <- copy(DT)[count_by_letter, .(orig, orig_order, N), on = "orig"]
  # Dummy element
  out[, new := first(y)]
  origs <- out[["orig"]]
  nrow_out <- nrow(out)
  maxN <- count_by_letter[["N"]][1]

  out[seq_len(nrow_out) > maxN, new := head(origs, nrow_out - maxN)]
  out[seq_len(nrow_out) <= maxN, new := tail(origs, maxN)]

  DT[out, j = .(orig_order, orig, new), on = "orig_order"] %>%
    .[order(orig_order)] %>%
    .[["new"]]
}

set.seed(1)
x <- sample(letters, size=295, replace=T)
testthat::expect_true(all(table(permute_avoid_same_position(x)) == table(x)))
testthat::expect_true(all(x != permute_avoid_same_position(x)))
microbenchmark(permute_avoid_same_position(x), times = 5)

# Unit: milliseconds
#                           expr      min       lq     mean   median       uq      max
# permute_avoid_same_position(x) 5.650378 5.771753 5.875116 5.788618 5.938604 6.226228

x <- sample(1:1000, replace = TRUE, size = 1e6)
testthat::expect_true(all(table(permute_avoid_same_position(x)) == table(x)))
testthat::expect_true(all(x != permute_avoid_same_position(x)))

microbenchmark(permute_avoid_same_position(x), times = 5)
# Unit: milliseconds
#                           expr      min       lq    mean   median       uq      max
# permute_avoid_same_position(x) 239.7744 385.4686 401.521 438.2999 440.9746 503.0875



回答2:


#DATA
set.seed(1)
x <- sample(letters, size=295, replace=T)

foo = function(S){
    if(max(table(S)) > length(S)/2){
        stop("NOT POSSIBLE")
    }
    U = unique(S)
    done_chrs = character(0)
    inds = integer(0)
    ans = character(0)
    while(!identical(sort(done_chrs), sort(U))){
        my_chrs = U[!U %in% done_chrs]
        next_chr = my_chrs[which.min(sapply(my_chrs, function(x) length(setdiff(which(!S %in% x), inds))))]
        x_inds = which(S %in% next_chr)
        candidates = setdiff(seq_along(S), union(x_inds, inds))
        if (length(candidates) == 1){
            new_inds = candidates
        }else{
            new_inds = sample(candidates, length(x_inds))
        }
        inds = c(inds, new_inds)
        ans[new_inds] = next_chr
        done_chrs = c(done_chrs, next_chr)
    }
    return(ans)
}

ans_foo = foo(x)

identical(sort(ans_foo), sort(x)) & !any(ans_foo == x)
#[1] TRUE

library(microbenchmark)
microbenchmark(foo(x))
#Unit: milliseconds
#   expr      min       lq     mean   median       uq      max neval
# foo(x) 19.49833 22.32517 25.65675 24.85059 27.96838 48.61194   100



回答3:


We could extract substrings by the boundary of the repeating elements, sample and replicate

library(stringr)
sapply(replicate(10, sample(str_extract_all(str1, "([[:alpha:]])\\1*")[[1]]),
                simplify = FALSE), paste, collapse="")
#[1] "BBAAEFDCCA" "AAAFBBEDCC" "BBAAAEFCCD" "DFACCBBAAE" "AAFCCBBEAD" 
#[6] "DAAAECCBBF" "AAFCCDBBEA" "CCEFADBBAA" "BBAAEADCCF" "AACCBBDFAE"

data

str1 <- "AABBCCADEF"


来源:https://stackoverflow.com/questions/47191948/permute-a-vector-such-that-an-element-cant-be-in-the-same-place

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!