Speed up the loop operation in R

说谎 2020-11-22 00:04

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and a

10 answers
  •  深忆病人
    2020-11-22 00:30

    As Ari mentioned at the end of his answer, the Rcpp and inline packages make it incredibly easy to make things fast. As an example, try this inline code (warning: not tested):

    library(inline)  # provides cxxfunction() and getPlugin(); Rcpp must be installed too

    # C++ body: columns are 0-based here, so R column 10 is column 9
    body <- 'Rcpp::NumericMatrix nm(temp);
             int nrtemp = Rcpp::as<int>(nrt);
             for (int i = 0; i < nrtemp; ++i) {
                 nm(i, 9) = i;
                 if (i > 0) {  // 0-based: row 0 has no previous row
                     if (nm(i, 5) == nm(i - 1, 5) && nm(i, 2) == nm(i - 1, 2)) {
                         nm(i, 9) = nm(i, 8) + nm(i - 1, 9);
                     } else {
                         nm(i, 9) = nm(i, 8);
                     }
                 } else {
                     nm(i, 9) = nm(i, 8);
                 }
             }
             return Rcpp::wrap(nm);
            '
    
    settings <- getPlugin("Rcpp")
    # settings$env$PKG_CXXFLAGS <- paste("-I", getwd(), sep="") if you want to inc files in wd
    dayloop <- cxxfunction(signature(nrt="numeric", temp="numeric"), body=body,
        plugin="Rcpp", settings=settings)
    
    dayloop2 <- function(temp) {
        # extract a numeric matrix from temp, put it in tmp
        # (assumes temp has 9 numeric columns; the appended column becomes column 10)
        tmp <- cbind(as.matrix(temp), V10 = 0)
        nm <- dayloop(nrow(tmp), tmp)  # pass the number of rows, not columns
        temp$V10 <- nm[, 10]           # copy the computed column back into the data.frame
        names(temp)[names(temp) == "V10"] <- "Kumm."
        return(temp)
    }
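
Since the inline code above is flagged as untested, it may help to check the loop logic outside R first. Below is a minimal standalone C++ sketch of the same idea: a running sum of column 8 written to column 9, restarting whenever columns 2 or 5 differ from the previous row. The names Table, run_day_loop and make_row are hypothetical and not part of Rcpp or inline. The sketch treats row 0 as the first row, so the comparison with the previous row starts at i > 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rows of numeric columns, 0-based (hypothetical stand-in for the R matrix).
using Table = std::vector<std::vector<double>>;

// Writes to column 9 a running sum of column 8 that restarts
// whenever key column 2 or key column 5 differs from the previous row.
void run_day_loop(Table& rows) {
    for (std::size_t i = 0; i < rows.size(); ++i) {
        assert(rows[i].size() >= 10);  // reads columns 2, 5, 8; writes column 9
        const bool same_group = i > 0
            && rows[i][5] == rows[i - 1][5]
            && rows[i][2] == rows[i - 1][2];
        rows[i][9] = same_group ? rows[i][8] + rows[i - 1][9] : rows[i][8];
    }
}

// Hypothetical helper: builds a 10-column row, setting only the columns the loop reads.
std::vector<double> make_row(double key2, double key5, double value8) {
    std::vector<double> r(10, 0.0);
    r[2] = key2;
    r[5] = key5;
    r[8] = value8;
    return r;
}
```

Feeding it a few hand-built rows from make_row and comparing column 9 against sums worked out by hand is a quick way to catch off-by-one mistakes before compiling through cxxfunction.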
    

    There's a similar procedure for #include-ing headers, where you just pass a parameter

    inc <- '#include 
    

    to cxxfunction, as includes=inc. What's really cool about this is that it does all of the linking and compilation for you, so prototyping is really fast.

    Disclaimer: I'm not totally sure that the class of tmp should be numeric and not numeric matrix or something else. But I'm mostly sure.

    Edit: if you still need more speed after this, OpenMP is a parallelization facility that works well with C++. I haven't tried using it from inline, but it should work. The idea would be, with n cores, to have loop iteration k carried out by thread k % n. A suitable introduction is found in Matloff's The Art of R Programming, available here, in chapter 16, Resorting to C.
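
To make the k % n idea concrete, here is a minimal C++ sketch, not tested from inline. It uses OpenMP's schedule(static, 1) clause, which deals iterations out to the threads round-robin, so with n threads iteration k lands on thread k % n. The function name square_into is hypothetical. The pattern only applies when iterations are independent of each other. A running sum in which each row reads the previous row's result, as in the loop above, cannot be split across threads this way without restructuring.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Squares every element of x into out. Each iteration touches only out[k],
// so iterations are independent and safe to run in parallel.
void square_into(const std::vector<double>& x, std::vector<double>& out) {
    assert(out.size() == x.size());
    const long n = static_cast<long>(x.size());
// The pragma is guarded so the code still compiles (serially) without OpenMP.
#ifdef _OPENMP
#pragma omp parallel for schedule(static, 1)
#endif
    for (long k = 0; k < n; ++k) {
        // With n_threads threads, schedule(static, 1) runs iteration k
        // on thread k % n_threads.
        out[k] = x[k] * x[k];
    }
}
```

With GCC or Clang, OpenMP is typically enabled with the -fopenmp compiler flag. Without it the loop simply runs on one thread and produces the same result.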
