Python's xrange alternative for R, or how to loop over a large dataset lazily?

心在旅途 2020-11-28 14:41

The following example is based on a discussion about using expand.grid with large data. As you can see, it ends with an error. I guess this is due to possible combinat

2 answers
  •  臣服心动
    2020-11-28 15:26

    Another approach that seems to work:

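    exp_gr = function(..., index)
    {
        args = list(...)
        ns = lengths(args)            # number of values in each argument
        offs = cumprod(c(1L, ns))     # stride of each argument in the full grid
        n = offs[length(offs)]        # total number of combinations
    
        # the row number must lie within the grid
        stopifnot(index >= 1, index <= n)
    
        # zero-based position of each argument for the requested row;
        # only the first element of index is used
        i = (index[[1L]] - 1L) %% offs[-1L] %/% offs[-length(offs)] 
    
        # pick one value per argument and return them as a one-row data.frame
        return(do.call(data.frame, 
               setNames(Map("[[", args, i + 1L), 
                        paste("Var", seq_along(args), sep = ""))))
    }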
    

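    In the above function, ... are the arguments you would pass to expand.grid, and index is the 1-based row number of the wanted combination, counted in the order in which expand.grid generates its rows (the first argument varies fastest). E.g.: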
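
    The index arithmetic can be read as a mixed-radix decomposition of the zero-based row number. Below is a minimal sketch of that step alone, worked by hand for row 22 of the small four-argument grid used in the example that follows. The variable names k and pos are chosen here for illustration and are not part of exp_gr.

    ```r
    # Sizes of the four arguments 1:3, 10:12, 21:24, letters[2:5]
    ns   <- c(3, 3, 4, 4)
    offs <- cumprod(c(1, ns))     # 1 3 9 36 144: one stride per argument, then the row count
    k    <- 22 - 1                # zero-based row number

    # For each argument: drop completed cycles (%%), then count whole strides (%/%)
    pos  <- k %% offs[-1] %/% offs[-length(offs)]
    pos + 1                       # 1 2 3 1, i.e. the values 1, 11, 23, "b"
    ```

    The result agrees with row 22 of the expand.grid output shown in the example.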

    expand.grid(1:3, 10:12, 21:24, letters[2:5])[c(5, 22, 24, 35, 51, 120, 144), ]
    #    Var1 Var2 Var3 Var4
    #5      2   11   21    b
    #22     1   11   23    b
    #24     3   11   23    b
    #35     2   12   24    b
    #51     3   11   22    c
    #120    3   10   22    e
    #144    3   12   24    e
    do.call(rbind, lapply(c(5, 22, 24, 35, 51, 120, 144), 
                          function(i) exp_gr(1:3, 10:12, 21:24, letters[2:5], index = i)))
    #  Var1 Var2 Var3 Var4
    #1    2   11   21    b
    #2    1   11   23    b
    #3    3   11   23    b
    #4    2   12   24    b
    #5    3   11   22    c
    #6    3   10   22    e
    #7    3   12   24    e
    

    And on large structures:

    expand.grid(1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2)
    #Error in rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) : 
    #  invalid 'times' value
    #In addition: Warning message:
    #In rep.int(rep.int(seq_len(nx), rep.int(rep.fac, nx)), orep) :
    #  NAs introduced by coercion to integer range
    exp_gr(1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, index = 1)
    #  Var1 Var2 Var3 Var4 Var5 Var6
    #1    1    1    1    1    1    1
    exp_gr(1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, index = 1e3 + 487)
    #  Var1 Var2 Var3 Var4 Var5 Var6
    #1   87   15    1    1    1    1
    exp_gr(1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, index = 1e2 ^ 6)
    #  Var1 Var2 Var3 Var4 Var5 Var6
    #1  100  100  100  100  100  100
    exp_gr(1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, index = 1e11 + 154)
    #  Var1 Var2 Var3 Var4 Var5 Var6
    #1   54    2    1    1    1   11
    

    A similar approach would be to construct a "class" that stores the ... arguments that would otherwise go to expand.grid, and to define a [ method that computes the requested combinations from their index only when needed. Using %% and %/% seems valid, though I guess that iterating with these operators will be slower than it needs to be.
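
    A rough sketch of that "class" idea, using S3, might look like the following. The names lazy_grid and its fields args, offs and n are hypothetical, chosen only for this illustration. The [ method is vectorised over the index, so several rows can be computed in one call; this is one possible design, not a tuned implementation.

    ```r
    # Hypothetical constructor: stores the arguments instead of expanding them
    lazy_grid <- function(...) {
        args <- list(...)
        names(args) <- paste0("Var", seq_along(args))
        offs <- cumprod(c(1, lengths(args)))      # strides, kept as doubles
        structure(list(args = args, offs = offs, n = offs[length(offs)]),
                  class = "lazy_grid")
    }

    # Rows are computed on demand from their 1-based row numbers
    `[.lazy_grid` <- function(x, i) {
        stopifnot(all(i >= 1), all(i <= x$n))
        k  <- i - 1                               # zero-based row numbers
        lo <- x$offs[-length(x$offs)]             # stride of each argument
        hi <- x$offs[-1]                          # cycle length of each argument
        cols <- Map(function(vals, lo, hi) vals[k %% hi %/% lo + 1],
                    x$args, lo, hi)
        do.call(data.frame, cols)
    }

    g <- lazy_grid(1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2, 1:1e2)
    g[c(1, 1e3 + 487, 1e11 + 154)]                # three rows, nothing else is built
    ```

    Because the index is handled as a vector, a long loop could request rows in chunks (say, a few thousand row numbers per call) rather than one at a time, which should reduce the per-row overhead of %% and %/%.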
