Speed up the loop operation in R

前端未结

关注

 10  2242

说谎 2020-11-22 00:04

I have a big performance problem in R. I wrote a function that iterates over a data.frame object. It simply adds a new column to a data.frame and a

10条回答

深忆病人 (楼主)

2020-11-22 00:30

As Ari mentioned at the end of his answer, the Rcpp and inline packages make it incredibly easy to make things fast. As an example, try this inline code (warning: not tested):

body <- 'Rcpp::NumericMatrix nm(temp);
         int nrtemp = Rccp::as(nrt);
         for (int i = 0; i < nrtemp; ++i) {
             temp(i, 9) = i
             if (i > 1) {
                 if ((temp(i, 5) == temp(i - 1, 5) && temp(i, 2) == temp(i - 1, 2) {
                     temp(i, 9) = temp(i, 8) + temp(i - 1, 9)
                 } else {
                     temp(i, 9) = temp(i, 8)
                 }
             } else {
                 temp(i, 9) = temp(i, 8)
             }
         return Rcpp::wrap(nm);
        '

settings <- getPlugin("Rcpp")
# settings$env$PKG_CXXFLAGS <- paste("-I", getwd(), sep="") if you want to inc files in wd
dayloop <- cxxfunction(signature(nrt="numeric", temp="numeric"), body-body,
    plugin="Rcpp", settings=settings, cppargs="-I/usr/include")

dayloop2 <- function(temp) {
    # extract a numeric matrix from temp, put it in tmp
    nc <- ncol(temp)
    nm <- dayloop(nc, temp)
    names(temp)[names(temp) == "V10"] <- "Kumm."
    return(temp)
}

There's a similar procedure for #includeing things, where you just pass a parameter

inc <- '#include

to cxxfunction, as include=inc. What's really cool about this is that it does all of the linking and compilation for you, so prototyping is really fast.

Disclaimer: I'm not totally sure that the class of tmp should be numeric and not numeric matrix or something else. But I'm mostly sure.

Edit: if you still need more speed after this, OpenMP is a parallelization facility good for C++. I haven't tried using it from inline, but it should work. The idea would be to, in the case of n cores, have loop iteration k be carried out by k % n. A suitable introduction is found in Matloff's The Art of R Programming, available here, in chapter 16, Resorting to C.

0 讨论(0)

查看其它10个回答