equivalent of 'which' function in Rcpp

走远了吗. Submitted on 2019-12-10 21:59:51

Question


I'm new to C++ and Rcpp. Suppose I have a vector

t1<-c(1,2,NA,NA,3,4,1,NA,5)

and I want to get the indices of the elements of t1 that are NA. I can write:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
NumericVector retIdxNA(NumericVector x) {

    // Step 1: get the positions of NA in the vector
    LogicalVector y=is_na(x);

    // Step 2: count the number of NA
    int Cnt=0;
    for (int i=0;i<x.size();i++) {
       if (y[i]) {
         Cnt++;
       }
    }

    // Step 3: create an output vector whose length equals the number of NAs
    // and return the answer
    NumericVector retIdx(Cnt);
    int Cnt1=0;
    for (int i=0;i<x.size();i++) {
       if (y[i]) {
          retIdx[Cnt1]=i+1;
          Cnt1++;
       }
    }
    return retIdx;
}

then I get

retIdxNA(t1)
[1] 3 4 8

I was wondering:

(i) Is there an equivalent of R's which in Rcpp?

(ii) Is there a way to make the above function shorter/crisper? In particular, is there an easy way to combine Steps 1, 2, and 3 above?


Answer 1:


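Recent versions of RcppArmadillo have functions to identify the indices of finite and non-finite values. Note that non-finite covers NaN and ±Inf as well as NA, so the result matches which(is.na(x)) only when x contains no infinite values.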

So this code

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::uvec whichNA(arma::vec x) {
  return arma::find_nonfinite(x);
}

/*** R
t1 <- c(1,2,NA,NA,3,4,1,NA,5)
whichNA(t1)
*/

yields your desired answer (modulo the off-by-one, since C/C++ indices are zero-based):

R> sourceCpp("/tmp/uday.cpp")

R> t1 <- c(1,2,NA,NA,3,4,1,NA,5)

R> whichNA(t1)
     [,1]
[1,]    2
[2,]    3
[3,]    7
R> 

Rcpp can do it too if you first create the sequence to subset into:

// [[Rcpp::export]]
Rcpp::IntegerVector which2(Rcpp::NumericVector x) {
  Rcpp::IntegerVector v = Rcpp::seq(0, x.size()-1);
  return v[Rcpp::is_na(x)];
}

Added to the code above, it yields:

R> which2(t1)
[1] 2 3 7
R> 

Logical subsetting is also a fairly recent addition to Rcpp.
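To make the idea behind which2 concrete, here is a minimal standalone C++ sketch of "build a sequence, then subset it with a logical mask". It does not use Rcpp; the names subset_by_mask and which_true are hypothetical, and std::vector<bool> merely stands in for the LogicalVector returned by is_na. It also shows where the off-by-one comes from: the result is zero-based or one-based depending only on where the sequence starts, so starting the sequence in which2 at 1 rather than 0 should give R-style positions.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Keep the elements of v whose mask entry is true
// (conceptually what v[mask] does with a logical mask).
std::vector<int> subset_by_mask(const std::vector<int>& v,
                                const std::vector<bool>& mask) {
    assert(v.size() == mask.size());  // mask must line up with v
    std::vector<int> out;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (mask[i]) out.push_back(v[i]);
    }
    return out;
}

// Positions of the true entries in mask, counted from `first`:
// pass 0 for C++-style positions, 1 for R-style positions.
std::vector<int> which_true(const std::vector<bool>& mask, int first) {
    std::vector<int> positions(mask.size());
    std::iota(positions.begin(), positions.end(), first);  // first, first+1, ...
    return subset_by_mask(positions, mask);
}
```

With the mask corresponding to t1 (true at the third, fourth and eighth elements), which_true(mask, 0) gives 2 3 7 and which_true(mask, 1) gives 3 4 8.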




Answer 2:


Try this:

#include <Rcpp.h> 
using namespace Rcpp; 

// [[Rcpp::export]]
IntegerVector which4( NumericVector x) {

    int nx = x.size();
    std::vector<int> y;
    y.reserve(nx);

    for(int i = 0; i < nx; i++) {
        if (R_IsNA(x[i])) y.push_back(i+1);
    }

    return wrap(y);
}

which we can run like this in R:

> which4(t1)
[1] 3 4 8

Performance

Note that we have changed the above solution to reserve space for the output vector. This replaces which3, which is:

// [[Rcpp::export]]
IntegerVector which3( NumericVector x) {
    int nx = x.size();
    IntegerVector y;
    for(int i = 0; i < nx; i++) {
        // if (internal::Rcpp_IsNA(x[i])) y.push_back(i+1);
        if (R_IsNA(x[i])) y.push_back(i+1);
    }
    return y;
}

On a vector 9 elements long, the performance is as follows, with which4 the fastest:

> library(rbenchmark)
> benchmark(retIdxNA(t1), whichNA(t1), which2(t1), which3(t1), which4(t1), 
+    replications = 10000, order = "relative")[1:4]
          test replications elapsed relative
5   which4(t1)        10000    0.14    1.000
4   which3(t1)        10000    0.16    1.143
1 retIdxNA(t1)        10000    0.17    1.214
2  whichNA(t1)        10000    0.17    1.214
3   which2(t1)        10000    0.25    1.786

Repeating this for a vector 9000 elements long, the Armadillo solution comes in quite a bit faster than the others. Here which3 (which is the same as which4 except that it does not reserve space for the output vector) comes in worst, while which4 comes second.

> tt <- rep(t1, 1000)
> benchmark(retIdxNA(tt), whichNA(tt), which2(tt), which3(tt), which4(tt), 
+   replications = 1000, order = "relative")[1:4]
          test replications elapsed relative
2  whichNA(tt)         1000    0.09    1.000
5   which4(tt)         1000    0.79    8.778
3   which2(tt)         1000    1.03   11.444
1 retIdxNA(tt)         1000    1.19   13.222
4   which3(tt)         1000   23.58  262.000
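One plausible reading of the which3 result is that each push_back on an Rcpp vector allocates a new vector and copies the existing contents, so the total copying grows quadratically with the number of hits. That is an assumption about Rcpp's behaviour, not something the benchmark itself proves. The standalone C++ sketch below only models that cost: CopyOnPush, copies_for_growth and reserve_avoids_reallocation are hypothetical names, and nothing here calls Rcpp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of a container whose push_back allocates a buffer of
// size n + 1 and copies every existing element into it (the assumed cost
// pattern behind which3; this is not Rcpp code).
struct CopyOnPush {
    std::vector<int> data;
    std::size_t copies = 0;  // total number of old elements copied so far

    void push_back(int value) {
        std::vector<int> bigger;
        bigger.reserve(data.size() + 1);
        for (int old : data) {
            bigger.push_back(old);
            ++copies;
        }
        bigger.push_back(value);
        data.swap(bigger);
    }
};

// Number of element copies needed to append n values one at a time.
std::size_t copies_for_growth(std::size_t n) {
    CopyOnPush c;
    for (std::size_t i = 0; i < n; ++i) c.push_back(static_cast<int>(i));
    return c.copies;  // 0 + 1 + ... + (n - 1) = n * (n - 1) / 2
}

// With reserve(n) up front (as in which4), appending n values to a
// std::vector never moves the buffer, so nothing is copied during growth.
bool reserve_avoids_reallocation(std::size_t n) {
    std::vector<int> y;
    y.reserve(n);
    const int* before = y.data();
    for (std::size_t i = 0; i < n; ++i) y.push_back(static_cast<int>(i));
    return y.data() == before;
}
```

Under this model the 3000 hits in tt cost 3000 * 2999 / 2 = 4498500 element copies per call, while the reserved std::vector performs none, which is at least consistent with the gap between which3 and which4 in the table above.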



Answer 3:


All of the solutions above are serial. Although not trivial, it is quite possible to take advantage of threading to implement which. See this write-up for more details. For such small sizes, though, it would do more harm than good. It is like taking a plane for a short distance: you lose too much time at airport security.
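As a rough illustration of what a threaded which could look like, here is a standalone C++ sketch using std::thread. It is not taken from the write-up mentioned above and makes several assumptions: the function name which_nan_parallel is hypothetical, std::isnan stands in for R's NA test (plain C++ has no NA), and the scheme is a simple two-phase one. Each worker first counts the hits in its own chunk, a prefix sum turns the counts into output offsets, and each worker then writes its one-based indices into its own disjoint slice of the result.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

// One-based indices of the NaN entries of x, computed by n_threads workers.
std::vector<int> which_nan_parallel(const std::vector<double>& x,
                                    std::size_t n_threads) {
    const std::size_t n = x.size();
    // Use at least one worker and never more workers than elements.
    n_threads = std::max<std::size_t>(1, std::min(n_threads, n));
    const std::size_t chunk = (n + n_threads - 1) / n_threads;

    // Half-open range [begin, end) handled by worker t.
    auto begin_of = [&](std::size_t t) { return std::min(t * chunk, n); };
    auto end_of = [&](std::size_t t) { return std::min(begin_of(t) + chunk, n); };

    // Phase 1: every worker counts the hits in its own chunk.
    std::vector<std::size_t> counts(n_threads, 0);
    {
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < n_threads; ++t) {
            workers.emplace_back([&, t] {
                std::size_t c = 0;
                for (std::size_t i = begin_of(t); i < end_of(t); ++i) {
                    if (std::isnan(x[i])) ++c;
                }
                counts[t] = c;  // each worker writes only its own slot
            });
        }
        for (auto& w : workers) w.join();
    }

    // Serial prefix sum: offset[t] is where worker t starts writing.
    std::vector<std::size_t> offset(n_threads + 1, 0);
    for (std::size_t t = 0; t < n_threads; ++t) {
        offset[t + 1] = offset[t] + counts[t];
    }

    // Phase 2: every worker fills its own disjoint slice of the output.
    std::vector<int> out(offset[n_threads]);
    {
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < n_threads; ++t) {
            workers.emplace_back([&, t] {
                std::size_t k = offset[t];
                for (std::size_t i = begin_of(t); i < end_of(t); ++i) {
                    if (std::isnan(x[i])) out[k++] = static_cast<int>(i) + 1;
                }
            });
        }
        for (auto& w : workers) w.join();
    }
    return out;
}
```

Creating and joining threads has a fixed cost, which is the "airport security" in the analogy: for a 9-element vector that overhead is likely to dominate the actual scan.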

R implements which by allocating a buffer as large as the input, making a single pass to store the indices in that buffer, and finally copying the used part into an integer vector of the right length.

Intuitively this seems less efficient than a double-pass loop, but that is not necessarily so, as copying a contiguous range of data is cheap. See more details here.
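The two strategies being compared can be sketched side by side in standalone C++. This is an illustration of the idea rather than R's actual source: the function names are hypothetical, and std::isnan stands in for R's NA test because plain C++ has no NA.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Single pass into a scratch buffer as large as the input, followed by a
// copy of the used prefix into a vector of the right length.
std::vector<int> which_nan_buffered(const std::vector<double>& x) {
    std::vector<int> buf(x.size());
    std::size_t k = 0;  // number of indices stored so far
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i])) buf[k++] = static_cast<int>(i) + 1;
    }
    return std::vector<int>(buf.begin(),
                            buf.begin() + static_cast<std::ptrdiff_t>(k));
}

// Two passes over the input: count the hits first, then fill a vector that
// was allocated with exactly the right length.
std::vector<int> which_nan_two_pass(const std::vector<double>& x) {
    std::size_t count = 0;
    for (double v : x) {
        if (std::isnan(v)) ++count;
    }
    std::vector<int> out(count);
    std::size_t k = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (std::isnan(x[i])) out[k++] = static_cast<int>(i) + 1;
    }
    return out;
}
```

The buffered version evaluates the NaN test once per element and pays for one extra allocation plus a contiguous copy; the two-pass version evaluates the test twice per element but allocates only the final vector. Which one wins probably depends on the input size and on how many elements match.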




Answer 4:


Just write a function for yourself like:

which_1<-function(a,b){
return(which(a>b))
}

Then pass this function into Rcpp.



Source: https://stackoverflow.com/questions/23849354/equivalent-of-which-function-in-rcpp
