Concatenate StringVector with Rcpp

Posted by 旧时模样 on 2019-12-07 12:23:13

Question


I cannot figure out how to concatenate two strings element-wise with Rcpp. The documentation below did not help, although I suspect there is an obvious answer.

http://gallery.rcpp.org/articles/working-with-Rcpp-StringVector/

http://gallery.rcpp.org/articles/strings_with_rcpp/

StringVector concatenate(StringVector a, StringVector b)
{
 StringVector c;
 c= ??;
 return c;
}

I would expect this output:

a=c("a","b"); b=c("c","d");
concatenate(a,b)
[1] "ac" "bd"

Answer 1:


There are probably a few different ways to approach this, but here's one option with std::transform:

#include <Rcpp.h>
using namespace Rcpp;

struct Functor {
    std::string
    operator()(const std::string& lhs, const internal::string_proxy<STRSXP>& rhs) const
    {
        return lhs + rhs;
    }
};

// [[Rcpp::export]]
CharacterVector paste2(CharacterVector lhs, CharacterVector rhs)
{
    std::vector<std::string> res(lhs.begin(), lhs.end());
    std::transform(
        res.begin(), res.end(),
        rhs.begin(), res.begin(),
        Functor()
    );
    return wrap(res);
}

/*** R

lhs <- letters[1:2]; rhs <- letters[3:4]

paste(lhs, rhs, sep = "")
# [1] "ac" "bd"

paste2(lhs, rhs)
# [1] "ac" "bd"

*/ 

The reason for first copying the left-hand vector into a std::vector<std::string> is that the internal::string_proxy<> class provides operator+ with the signature

std::string operator+(const std::string& x, const internal::string_proxy<STRSXP>& y) 

rather than, e.g.

operator+(const internal::string_proxy<STRSXP>& x, const internal::string_proxy<STRSXP>& y) 
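The effect of that asymmetric overload can be sketched without Rcpp. In the following minimal example, `ToyProxy` and `toy_paste` are hypothetical names invented for illustration; `ToyProxy` is only a stand-in and is not Rcpp's actual string_proxy class. Because the only available operator+ takes a std::string on the left, the left-hand elements must already be std::string before std::transform can combine the two ranges.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for Rcpp's internal::string_proxy<STRSXP>.
// It merely wraps a C string and is NOT the real Rcpp class.
struct ToyProxy {
    const char* data;
};

// Only std::string + ToyProxy is defined, mirroring the asymmetric
// overload described above; ToyProxy + ToyProxy does not exist.
std::string operator+(const std::string& x, const ToyProxy& y)
{
    return x + y.data;
}

// Element-wise concatenation. lhs is taken by value as std::string
// elements, so the single available overload applies to each pair.
std::vector<std::string> toy_paste(std::vector<std::string> lhs,
                                   const std::vector<ToyProxy>& rhs)
{
    // Assumes rhs has at least as many elements as lhs: the binary
    // std::transform reads one rhs element per lhs element.
    std::transform(lhs.begin(), lhs.end(), rhs.begin(), lhs.begin(),
        [](const std::string& x, const ToyProxy& y) { return x + y; });
    return lhs;
}
```

Calling `toy_paste` with `{"a", "b"}` and proxies for `"c"` and `"d"` yields `"ac"` and `"bd"`, the same pairing the question asks for.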

If your compiler supports C++11, this can be written slightly more cleanly with a lambda:

// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector paste3(CharacterVector lhs, CharacterVector rhs)
{
    using proxy_t = internal::string_proxy<STRSXP>;

    std::vector<std::string> res(lhs.begin(), lhs.end());
    std::transform(res.begin(), res.end(), rhs.begin(), res.begin(),
        [&](const std::string& x, const proxy_t& y) {
            return x + y;
        }
    );

    return wrap(res);
}

/*** R

lhs <- letters[1:2]; rhs <- letters[3:4]

paste(lhs, rhs, sep = "")
# [1] "ac" "bd"

paste3(lhs, rhs)
# [1] "ac" "bd"

*/



Answer 2:


A working solution is:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector concatenate(std::string x, std::string y)
{
               return wrap(x + y);
}

Then vectorize it on the R side:

Vconcatenate=Vectorize(concatenate)
Vconcatenate(letters[1:2],letters[3:4])

Or loop over the elements in C++:

// [[Rcpp::export]]
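CharacterVector concatenate(std::vector<std::string> x, std::vector<std::string> y)
{
  // y[i] is read for every element of x, so the lengths must match
  if (x.size() != y.size()) stop("x and y must have the same length");

  std::vector<std::string> res(x.size());
  // std::size_t avoids a signed/unsigned comparison with x.size()
  for (std::size_t i = 0; i < x.size(); i++)
  {
    res[i] = x[i] + y[i];
  }
  return wrap(res);
}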
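The loop above indexes y by the positions of x, so it implicitly assumes both inputs have the same length, whereas R's paste() recycles shorter arguments. A minimal sketch of a recycling variant in plain C++ follows; `paste_recycled` is a hypothetical helper, not part of Rcpp, and returning an empty result for an empty input is a simplification made for this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): element-wise concatenation
// that recycles the shorter input, similar in spirit to R's paste().
std::vector<std::string> paste_recycled(const std::vector<std::string>& x,
                                        const std::vector<std::string>& y)
{
    // Simplification for this sketch: an empty input gives an empty result.
    if (x.empty() || y.empty()) return {};

    const std::size_t n = std::max(x.size(), y.size());
    std::vector<std::string> res(n);  // size the result once, up front
    for (std::size_t i = 0; i < n; ++i) {
        // The modulo wraps around the shorter vector.
        res[i] = x[i % x.size()] + y[i % y.size()];
    }
    return res;
}
```

To expose something like this to R, the body could be placed in an exported function that takes and returns the same types as the loop version and wraps the result.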



Answer 3:


I'm leaving this answer up, but note the warning provided by @nrussell regarding the use of push_back()!


I'm still getting to grips with Rcpp myself, so I've gone for a string builder in a loop:

library(Rcpp)

cppFunction('StringVector concatenate(StringVector a, StringVector b)
{
  StringVector c;
  std::ostringstream x;
  std::ostringstream y;

 // concatenate inputs
  for (int i = 0; i < a.size(); i++)
    x << a[i];

  for (int i = 0; i < b.size(); i++)
    y << b[i];

  c.push_back(x.str());
  c.push_back(y.str());

  return c;

}')

a=c("a","b"); b=c("c","d");
concatenate(a,b)
# [1] "ab" "cd" 

Comparing the performance of (i) repeated calls to push_back against (ii) a preallocate-and-fill strategy, we can see that the latter is preferable:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector pbpaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res;

    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res.push_back(oss.str());
    }

    return res;
}

// [[Rcpp::export]]
CharacterVector sspaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res(sz);

    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res[i] = oss.str();
    }

    return res;
}

/*** R

lhs <- as.character(1:5000); rhs <- as.character(5001:10000)

all.equal(pbpaste(lhs, rhs), sspaste(lhs, rhs))
# [1] TRUE

microbenchmark::microbenchmark(
    "push_back" = pbpaste(lhs, rhs),
    "preallocate" = sspaste(lhs, rhs),
    times = 200L
)
# Unit: milliseconds
#         expr        min         lq       mean     median         uq        max neval cld
#    push_back 101.521579 105.334649 115.156544 107.275678 110.957420 256.722239   200   b
#  preallocate   1.364213   1.585818   1.789564   1.778153   1.934758   2.955352   200   a

*/
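A rough way to see where the gap comes from is to count element writes. The sketch below is a toy cost model, not a benchmark, and rests on the assumption that each push_back on an Rcpp vector allocates a new vector and copies every existing element into it; both function names are hypothetical.

```cpp
#include <cstddef>

// Toy cost model: counts element writes only and ignores allocation,
// string construction, and all other overhead.

// Growing one element at a time. Appending to a vector that already
// holds k elements is assumed to copy those k elements into a fresh
// allocation and then write the new one.
std::size_t writes_when_growing(std::size_t n)
{
    std::size_t writes = 0;
    for (std::size_t k = 0; k < n; ++k) {
        writes += k + 1;  // k copies plus the appended element
    }
    return writes;        // n * (n + 1) / 2 in total
}

// Preallocating. The result is sized once, so each element is
// written exactly once.
std::size_t writes_when_preallocated(std::size_t n)
{
    return n;
}
```

For n = 5000, the size used in the benchmark, the model gives 12,502,500 writes when growing versus 5,000 when preallocating. The measured gap is smaller than that ratio, presumably because costs outside this model take up much of the run time, but the quadratic versus linear growth points in the same direction as the timings.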


Source: https://stackoverflow.com/questions/43182003/concatenate-stringvector-with-rcpp
