Subset an atomic vector in-place

Question

This continues from the question "Subsetting a large vector uses unnecessarily large amounts of memory".

Given an atomic vector, for example:

x <- rep_len(1:10, 1e7)

How can I modify x in place to remove elements by numeric index using Rcpp? In R one can do this, but not in place, i.e. not without duplicating x:

idrops <- c(5, 4, 9)
x <- x[-idrops]

A reasonably efficient way to do this would be the following:

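#include <Rcpp.h>
using namespace Rcpp;

// Drop the elements of x at the 1-based (R-style) positions in inds.
// Not safe: assumes inds are distinct and lie within 1..length(x).
// [[Rcpp::export]]
IntegerVector dropElements(IntegerVector x, IntegerVector inds) {
  R_xlen_t n = x.length();
  R_xlen_t ndrops = inds.length();
  IntegerVector out = no_init(n - ndrops);
  R_xlen_t k = 0; // index of out
  for (R_xlen_t i = 0; i < n; ++i) {
    bool drop = false;
    for (R_xlen_t j = 0; j < ndrops; ++j) {
      if (i + 1 == inds[j]) { // i is 0-based, inds are 1-based
        drop = true;
        break;
      }
    }
    if (drop) {
      continue;
    }
    out[k] = x[i];
    ++k;
  }
  return out;
}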
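The inner loop in dropElements rescans inds for every element of x, so the work grows as length(x) × length(inds). As an illustration outside Rcpp, here is a minimal standard C++ sketch (std::vector<int> stands in for the R vector, and drop_elements_mask is a hypothetical name) that builds a keep-mask once and then copies in a single pass. It also tolerates duplicate and out-of-range positions by design choice of this sketch. Like the code above, it still returns a new vector rather than working in place.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return x without the elements at the 1-based
// positions in `drops`. Cost is O(n + k) rather than O(n * k).
// Duplicate positions are dropped once; positions outside 1..n are ignored
// (a choice made for this sketch).
std::vector<int> drop_elements_mask(const std::vector<int>& x,
                                    const std::vector<int>& drops) {
    const std::size_t n = x.size();
    std::vector<bool> keep(n, true);  // keep[i] is false if x[i] is dropped
    std::size_t ndropped = 0;         // distinct in-range positions dropped
    for (int r : drops) {
        if (r >= 1 && static_cast<std::size_t>(r) <= n && keep[r - 1]) {
            keep[r - 1] = false;
            ++ndropped;
        }
    }
    std::vector<int> out;
    out.reserve(n - ndropped);  // exact final size, so push_back never reallocates
    for (std::size_t i = 0; i < n; ++i) {
        if (keep[i]) out.push_back(x[i]);
    }
    assert(out.size() == n - ndropped);
    return out;
}
```

For example, drop_elements_mask({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {5, 4, 9}) keeps 1, 2, 3, 6, 7, 8, 10, the same elements as x[-c(5, 4, 9)] on the first ten values.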

However, this is hardly in-place (it is also not very safe, but that is beside the point). I'm aware of the STL's .erase(), but it appears that Rcpp, by design, makes a copy when converting an R vector to an STL container.
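For a plain std::vector, .erase() does remove elements by position without a second container. A minimal sketch, assuming 1-based positions as in R (erase_positions is a hypothetical name): erasing from the highest position down keeps the lower positions valid. Each erase shifts the remaining tail left, so this is O(n × k) element moves in the worst case, and it says nothing about how Rcpp converts R vectors.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: remove the elements at 1-based positions `drops`
// from v in place using std::vector::erase. Positions are sorted in
// descending order and de-duplicated first, so erasing one element never
// shifts a position that is still waiting to be erased.
void erase_positions(std::vector<int>& v, std::vector<int> drops) {
    std::sort(drops.begin(), drops.end(), std::greater<int>());
    drops.erase(std::unique(drops.begin(), drops.end()), drops.end());
    for (int r : drops) {
        assert(r >= 1 && static_cast<std::size_t>(r) <= v.size());
        v.erase(v.begin() + (r - 1));  // shifts everything after r - 1 left by one
    }
}
```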

Answer 1:

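The question you linked to was simpler (a one-liner in Rcpp). For negative indexing you can still be efficient: sort the indices to drop, loop over them, and copy each contiguous run of kept elements between two dropped positions with a single std::copy call. Note that this still allocates a new result vector, so it is not in-place either. For example: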

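#include <Rcpp.h>
#include <algorithm>
using namespace Rcpp;

// solution for the original question: drop the first npop elements
// [[Rcpp::export]]
IntegerVector popBeginningOfVector(IntegerVector x, int npop) {
  return IntegerVector(x.begin() + npop, x.end());
}

// Equivalent of x[-neg_idx] for distinct, in-range 1-based positions.
// [[Rcpp::export]]
IntegerVector efficientNegativeIndexing(IntegerVector x, IntegerVector neg_idx) {
  // Sort a copy: an integer vector passed from R is not duplicated by Rcpp,
  // so sorting neg_idx directly would reorder the caller's R object.
  IntegerVector idx = clone(neg_idx);
  std::sort(idx.begin(), idx.end());
  R_xlen_t ni_size = idx.size();
  R_xlen_t xsize = x.size();
  for (R_xlen_t i = 0; i < ni_size; ++i) {
    if (idx[i] < 1 || idx[i] > xsize || (i > 0 && idx[i] == idx[i - 1]))
      stop("neg_idx must contain distinct positions in 1..length(x)");
  }
  const int * xptr = INTEGER(x);
  IntegerVector xt = no_init(xsize - ni_size); // allocate new vector of the correct size
  int * xtptr = INTEGER(xt);
  R_xlen_t xtposition = 0;
  R_xlen_t range_begin = 0;
  R_xlen_t range_end = -1; // 0-based index of the last dropped element (none yet)
  for (R_xlen_t i = 0; i < ni_size; ++i) {
    // kept run between the previous drop and this one, as 0-based [begin, end)
    range_begin = (i == 0) ? 0 : idx[i - 1];
    range_end = idx[i] - 1;
    std::copy(xptr + range_begin, xptr + range_end, xtptr + xtposition);
    xtposition += range_end - range_begin;
  }
  // copy the tail after the last dropped element (all of x if nothing is dropped)
  std::copy(xptr + range_end + 1, xptr + xsize, xtptr + xtposition);
  return xt;
}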
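To make the index arithmetic of that loop concrete: a dropped 1-based position r is the 0-based index r - 1, so each copied run is the half-open range from the previous dropped position up to r - 1. The small standard C++ helper below (kept_ranges is a hypothetical name, not part of the answer) lists those runs for sorted, distinct positions, skipping the empty runs that the loop above visits when two drops are adjacent. For n = 10 and drops 4, 5, 9 it yields [0, 3), [5, 8) and [9, 10), i.e. 3 + 3 + 1 = 7 = 10 - 3 kept elements.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper: for a vector of length n and sorted, distinct 1-based
// drop positions, list the surviving 0-based half-open ranges [begin, end).
std::vector<std::pair<std::size_t, std::size_t>>
kept_ranges(std::size_t n, const std::vector<std::size_t>& sorted_drops) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t run_start = 0;  // first 0-based index of the current kept run
    for (std::size_t r : sorted_drops) {
        // positions must be in 1..n, sorted, and distinct
        assert(r >= 1 && r <= n && r - 1 >= run_start);
        if (r - 1 > run_start) ranges.emplace_back(run_start, r - 1);  // skip empty runs
        run_start = r;  // the next run starts just past index r - 1
    }
    if (run_start < n) ranges.emplace_back(run_start, n);  // tail after the last drop
    return ranges;
}
```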

Usage:

library(Rcpp)
sourceCpp("~/Desktop/temp.cpp")

x <- rep_len(1:10, 1e7)
idrops <- c(5, 4, 9)
outputR <- x[-idrops]
outputRcpp <- efficientNegativeIndexing(x, idrops)
identical(outputRcpp, outputR)

library(microbenchmark)
microbenchmark(efficientNegativeIndexing(x, idrops), x[-idrops], times=10)
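efficientNegativeIndexing still allocates a second vector. Because every kept run only ever moves left, the same run-by-run copying can in principle be done inside a single buffer. Below is a minimal sketch on std::vector<int>, not on an R vector (drop_in_place is a hypothetical name): shrinking an R vector's length without reallocating is a separate problem that this sketch does not address.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: remove the elements at 1-based positions `drops`
// from v in place, copying each kept run leftward and shrinking v once.
void drop_in_place(std::vector<int>& v, std::vector<std::size_t> drops) {
    // Work on a sorted, de-duplicated local copy of the positions.
    std::sort(drops.begin(), drops.end());
    drops.erase(std::unique(drops.begin(), drops.end()), drops.end());

    std::size_t write = 0;      // next slot in v to receive a kept element
    std::size_t run_start = 0;  // first 0-based index of the current kept run

    // Move the kept run [first, last) down so it starts at `write`.
    auto move_run = [&](std::size_t first, std::size_t last) {
        // If write == first the run is already in place. Otherwise
        // write < first, so std::copy's destination starts before its source.
        if (write != first) {
            std::copy(v.begin() + first, v.begin() + last, v.begin() + write);
        }
        write += last - first;
    };

    for (std::size_t r : drops) {
        assert(r >= 1 && r <= v.size());  // position must be within 1..n
        move_run(run_start, r - 1);       // kept elements before the drop
        run_start = r;                    // skip 0-based index r - 1
    }
    move_run(run_start, v.size());  // tail after the last drop
    v.resize(write);                // shrink once; capacity is unchanged
}
```

Each kept element is copied at most once, and the final resize shortens the vector without reallocating it.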

Source: https://stackoverflow.com/questions/57359221/subset-an-atomic-vector-in-place

Tags

subset

rcpp