问题
I'm a newbie to C++ and Rcpp. Suppose, I have a vector
t1<-c(1,2,NA,NA,3,4,1,NA,5)
and I want to get a index of elements of t1 that are NA
. I can write:
NumericVector retIdxNA(NumericVector x) {
// Step 1: get the positions of NA in the vector
LogicalVector y=is_na(x);
// Step 2: count the number of NA
int Cnt=0;
for (int i=0;i<x.size();i++) {
if (y[i]) {
Cnt++;
}
}
// Step 3: create an output matrix whose size is same as that of NA
// and return the answer
NumericVector retIdx(Cnt);
int Cnt1=0;
for (int i=0;i<x.size();i++) {
if (y[i]) {
retIdx[Cnt1]=i+1;
Cnt1++;
}
}
return retIdx;
}
then I get
retIdxNA(t1)
[1] 3 4 8
I was wondering:
(i) is there any equivalent of which
in Rcpp?
(ii) is there any way to make the above function shorter/crisper? In particular, is there any easy way to combine the Step 1, 2, 3 above?
回答1:
Recent version of RcppArmadillo have functions to identify the indices of finite and non-finite values.
So this code
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::uvec whichNA(arma::vec x) {
return arma::find_nonfinite(x);
}
/*** R
t1 <- c(1,2,NA,NA,3,4,1,NA,5)
whichNA(t1)
*/
yields your desired answer (module the off-by-one in C/C++ as they are zero-based):
R> sourceCpp("/tmp/uday.cpp")
R> t1 <- c(1,2,NA,NA,3,4,1,NA,5)
R> whichNA(t1)
[,1]
[1,] 2
[2,] 3
[3,] 7
R>
Rcpp can do it too if you first create the sequence to subset into:
// [[Rcpp::export]]
Rcpp::IntegerVector which2(Rcpp::NumericVector x) {
Rcpp::IntegerVector v = Rcpp::seq(0, x.size()-1);
return v[Rcpp::is_na(x)];
}
Added to code above it yields:
R> which2(t1)
[1] 2 3 7
R>
The logical subsetting is also somewhat new in Rcpp.
回答2:
Try this:
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
IntegerVector which4( NumericVector x) {
int nx = x.size();
std::vector<int> y;
y.reserve(nx);
for(int i = 0; i < nx; i++) {
if (R_IsNA(x[i])) y.push_back(i+1);
}
return wrap(y);
}
which we can run like this in R:
> which4(t1)
[1] 3 4 8
Performance
Note that we have changed the above solution to reserve space for the output vector. This replaces which3
which is:
// [[Rcpp::export]]
IntegerVector which3( NumericVector x) {
int nx = x.size();
IntegerVector y;
for(int i = 0; i < nx; i++) {
// if (internal::Rcpp_IsNA(x[i])) y.push_back(i+1);
if (R_IsNA(x[i])) y.push_back(i+1);
}
return y;
}
Then the performance on a vector 9 elements long is the following with which4
the fastest:
> library(rbenchmark)
> benchmark(retIdxNA(t1), whichNA(t1), which2(t1), which3(t1), which4(t1),
+ replications = 10000, order = "relative")[1:4]
test replications elapsed relative
5 which4(t1) 10000 0.14 1.000
4 which3(t1) 10000 0.16 1.143
1 retIdxNA(t1) 10000 0.17 1.214
2 whichNA(t1) 10000 0.17 1.214
3 which2(t1) 10000 0.25 1.786
Repeating this for a vector 9000 elements long the Armadillo solution comes in quite a bit faster than the others. Here which3
(which is the same as which4
except it does not reserve space for the output vector) comes in worst while which4
comes second.
> tt <- rep(t1, 1000)
> benchmark(retIdxNA(tt), whichNA(tt), which2(tt), which3(tt), which4(tt),
+ replications = 1000, order = "relative")[1:4]
test replications elapsed relative
2 whichNA(tt) 1000 0.09 1.000
5 which4(tt) 1000 0.79 8.778
3 which2(tt) 1000 1.03 11.444
1 retIdxNA(tt) 1000 1.19 13.222
4 which3(tt) 1000 23.58 262.000
回答3:
All of the solutions above are serial. Although not trivial, it is quite possible to take advantage of threading for implementing which
. See this write up for more details. Although for such small sizes, it would not more harm than good. Like taking a plane for a small distance, you lose too much time at airport security..
R implements which
by allocating memory for a logical vector as large as the input, does a single pass to store the indices in this memory, then copy it eventually into a proper logical vector.
Intuitively this seems less efficient than a double pass loop, but not necessarily, as copying a data range is cheap. See more details here.
回答4:
Just write a function for yourself like:
which_1<-function(a,b){
return(which(a>b))
}
Then pass this function into rcpp.
来源:https://stackoverflow.com/questions/23849354/equivalent-of-which-function-in-rcpp