Question
Continuing from "Subsetting a large vector uses unnecessarily large amounts of memory":
Given an atomic vector, for example
x <- rep_len(1:10, 1e7)
How can I modify x in-place to remove elements by numeric index using Rcpp? In R, one can do this, but not in-place (i.e. not without duplicating x):
idrops <- c(5, 4, 9)
x <- x[-idrops]
A reasonably efficient way to do this would be the following:
#include <Rcpp.h>
using namespace Rcpp;
// Note: inds is compared against 0-based positions here, unlike R's 1-based idrops.
// [[Rcpp::export]]
IntegerVector dropElements(IntegerVector x, IntegerVector inds) {
  R_xlen_t n = x.length();
  R_xlen_t ndrops = inds.length();
  IntegerVector out = no_init(n - ndrops);
  R_xlen_t k = 0; // index of out
  for (R_xlen_t i = 0; i < n; ++i) {
    bool drop = false;
    for (R_xlen_t j = 0; j < ndrops; ++j) {
      if (i == inds[j]) {
        drop = true;
        break;
      }
    }
    if (drop) {
      continue;
    }
    out[k] = x[i];
    ++k;
  }
  return out;
}
though this is hardly in-place (it's also not very safe, but that's beside the point). I'm aware of the STL's .erase(), though it appears that Rcpp by design makes a copy before converting to STL.
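For reference, the .erase() approach mentioned above might look roughly like this on a plain std::vector<int>. This is only a sketch: erase_indices is a hypothetical helper name, it works on a std::vector rather than an Rcpp vector, and it assumes 1-based positions to match R. Each erase shifts the tail of the vector, so the cost grows with the number of drops times the vector length.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: erase the elements at the given 1-based positions
// (R convention) from v, modifying v in place.
// Out-of-range and duplicate positions are ignored.
void erase_indices(std::vector<int>& v, std::vector<std::size_t> drops) {
    // Sort descending so each erase leaves the smaller positions untouched.
    std::sort(drops.begin(), drops.end(), std::greater<std::size_t>());
    drops.erase(std::unique(drops.begin(), drops.end()), drops.end());
    for (std::size_t d : drops) {
        if (d >= 1 && d <= v.size()) {
            v.erase(v.begin() + static_cast<std::ptrdiff_t>(d - 1));
        }
    }
}
```

Calling erase_indices(v, {5, 4, 9}) on a vector holding 1 to 10 leaves the same elements that x[-c(5, 4, 9)] would return in R for x <- 1:10.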
Answer 1:
The question you linked to was a bit simpler and a one-liner in Rcpp, but you can implement efficient negative indexing by looping over your negative index vector and subsetting ranges of the data. E.g.:
#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;
// solution for the original (linked) question: drop the first npop elements
// [[Rcpp::export]]
IntegerVector popBeginningOfVector(IntegerVector x, int npop) {
  return IntegerVector(x.begin() + npop, x.end());
}
// Drop the elements at the 1-based positions in neg_idx, like x[-neg_idx] in R.
// Assumes neg_idx holds unique positions in 1..length(x); this is not checked.
// [[Rcpp::export]]
IntegerVector efficientNegativeIndexing(IntegerVector x, IntegerVector neg_idx) {
  // sort a copy so that an integer vector passed from R is not reordered
  IntegerVector idx = clone(neg_idx);
  std::sort(idx.begin(), idx.end());
  R_xlen_t ni_size = idx.size();
  R_xlen_t xsize = x.size();
  int * xptr = INTEGER(x);
  IntegerVector xt(xsize - ni_size); // allocate new vector of the correct size
  int * xtptr = INTEGER(xt);
  R_xlen_t xtposition = 0;  // next free slot in xt
  R_xlen_t range_begin = 0; // 0-based start of the next range to keep
  for (R_xlen_t i = 0; i < ni_size; ++i) {
    R_xlen_t range_end = idx[i] - 1; // 0-based position of the dropped element
    // copy the kept range [range_begin, range_end)
    std::copy(xptr + range_begin, xptr + range_end, xtptr + xtposition);
    xtposition += range_end - range_begin;
    range_begin = range_end + 1; // resume just past the dropped element
  }
  // copy whatever follows the last dropped element (everything, if idx is empty)
  std::copy(xptr + range_begin, xptr + xsize, xtptr + xtposition);
  return xt;
}
Usage:
library(Rcpp)
sourceCpp("~/Desktop/temp.cpp")
x <- rep_len(1:10, 1e7)
idrops <- c(5, 4, 9)
outputR <- x[-idrops]
outputRcpp <- efficientNegativeIndexing(x, idrops)
identical(outputRcpp, outputR)
library(microbenchmark)
microbenchmark(efficientNegativeIndexing(x, idrops), x[-idrops], times=10)
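Note that efficientNegativeIndexing still allocates a second vector for its result. As a rough illustration of what a single-pass, in-place compaction could look like, here is a sketch on a plain std::vector<int>. The name drop_in_place is hypothetical, positions are assumed to be 1-based as in R, and nothing here shows that an R vector can be shrunk this way from Rcpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: remove the elements at the given 1-based positions
// (R convention) from v in a single pass, without a second data buffer.
// Duplicate and out-of-range positions are ignored.
void drop_in_place(std::vector<int>& v, std::vector<std::size_t> drops) {
    std::sort(drops.begin(), drops.end());
    drops.erase(std::unique(drops.begin(), drops.end()), drops.end());

    std::size_t write = 0;  // next slot to fill with a kept element
    std::size_t d = 0;      // next entry of drops to match
    for (std::size_t read = 0; read < v.size(); ++read) {
        // Skip drop positions that can no longer match (e.g. position 0).
        while (d < drops.size() && drops[d] < read + 1) ++d;
        if (d < drops.size() && drops[d] == read + 1) {
            continue;  // dropped element: do not copy it forward
        }
        v[write++] = v[read];  // write never overtakes read
    }
    v.resize(write);  // shrink the logical length; capacity is unchanged
}
```

The data is moved once, so the work is linear in the vector length plus the cost of sorting the (usually short) list of drop positions.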
Source: https://stackoverflow.com/questions/57359221/subset-an-atomic-vector-in-place