Split a vector into chunks

时光说笑 2020-11-22 01:10

I have to split a vector into n chunks of equal size in R. I couldn't find any base function to do that. Also, Google didn't get me anywhere. Here is what I came up with so

20 Answers
  •  眼角桃花
    2020-11-22 02:05

    This splits the vector differently from your approach, but I think the resulting list structure is still quite nice:

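    # Split x into chunks. With force.number.of.groups = TRUE, return exactly n
    # groups and fold any remainder into the last one; otherwise return groups
    # of size n plus a trailing "overflow" group for the remainder.
    chunk.2 <- function(x, n, force.number.of.groups = TRUE, len = length(x), groups = trunc(len/n), overflow = len%%n) {
      if(force.number.of.groups) {
        f1 <- as.character(sort(rep(1:n, groups)))
        f <- as.character(c(f1, rep(n, overflow)))
      } else {
        f1 <- as.character(sort(rep(1:groups, n)))
        f <- as.character(c(f1, rep("overflow", overflow)))
      }
      
      g <- split(x, f)
      
      # split() orders character labels as strings ("1", "10", "2", ...),
      # so restore numeric order before returning.
      if(force.number.of.groups) {
        g.names <- names(g)
        g.names.ordered <- as.character(sort(as.numeric(g.names)))
      } else {
        # Exclude "overflow" by name; it only exists when len %% n > 0.
        g.names <- setdiff(names(g), "overflow")
        g.names.ordered <- as.character(sort(as.numeric(g.names)))
        if(overflow > 0) g.names.ordered <- c(g.names.ordered, "overflow")
      }
      
      return(g[g.names.ordered])
    }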
    

    This gives you the following, depending on how you want it formatted:

    > x <- 1:10; n <- 3
    > chunk.2(x, n, force.number.of.groups = FALSE)
    $`1`
    [1] 1 2 3
    
    $`2`
    [1] 4 5 6
    
    $`3`
    [1] 7 8 9
    
    $overflow
    [1] 10
    
    > chunk.2(x, n, force.number.of.groups = TRUE)
    $`1`
    [1] 1 2 3
    
    $`2`
    [1] 4 5 6
    
    $`3`
    [1]  7  8  9 10
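
    To see where this output comes from, here is a minimal sketch of the label vector that gets passed to split() for the force.number.of.groups = TRUE case. It is an illustration only: the variable name `labels` and the 12-element example are my own assumptions, not part of the function above.

```r
# Illustration only: how a vector of labels drives split().
x <- 1:10
n <- 3

# One label per element of x: three 1s, three 2s, three 3s, and the
# leftover element assigned to the last group.
f <- as.character(c(sort(rep(1:n, length(x) %/% n)), rep(n, length(x) %% n)))
print(f)

# split() returns one list element per distinct label.
g <- split(x, f)
print(g)

# With ten or more groups, character labels sort as strings rather than
# numbers, which is why the names are reordered numerically before returning.
labels <- as.character(1:12)  # hypothetical example labels
print(names(split(1:12, labels)))
print(as.character(sort(as.numeric(labels))))
```

    The last two lines print the names as split() returns them and then in numeric order, so the two orderings can be compared directly.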
    

    Running a couple of timings using these settings:

    set.seed(42)
    x <- rnorm(1e7)
    n <- 3
    

    Then we have the following results:

    > system.time(chunk(x, n)) # your function 
       user  system elapsed 
     29.500   0.620  30.125 
    
    > system.time(chunk.2(x, n, force.number.of.groups = TRUE))
       user  system elapsed 
      5.360   0.300   5.663 
    

    Note: Changing as.factor() to as.character() made my function twice as fast.
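
    If you want to check that difference on your own machine, a rough sketch like the one below should do. The vector size and the names `ids`, `t.factor` and `t.character` are assumptions for illustration, and the timings will vary with hardware and R version, so treat it as a way to measure rather than as a confirmed result.

```r
# Rough timing sketch; sizes are arbitrary and results depend on the machine.
set.seed(42)
x <- rnorm(1e6)
n <- 3
groups <- length(x) %/% n
overflow <- length(x) %% n

# The same group ids, converted two different ways before calling split().
ids <- c(sort(rep(1:n, groups)), rep(n, overflow))

t.factor <- system.time(g.factor <- split(x, as.factor(ids)))
t.character <- system.time(g.character <- split(x, as.character(ids)))

print(t.factor)
print(t.character)

# Both labelings should produce the same chunks.
stopifnot(identical(g.factor, g.character))
```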
