Question
I cannot figure out how to concatenate two strings with Rcpp. The documentation (linked below) did not help me, although I suspect there is an obvious answer.
http://gallery.rcpp.org/articles/working-with-Rcpp-StringVector/
http://gallery.rcpp.org/articles/strings_with_rcpp/
StringVector concatenate(StringVector a, StringVector b)
{
    StringVector c;
    c = ??;
    return c;
}
I would expect this output:
a=c("a","b"); b=c("c","d");
concatenate(a,b)
[1] "ac" "bd"
Answer 1:
There are probably a few different ways to approach this, but here's one option using std::transform:
#include <Rcpp.h>
#include <algorithm>  // std::transform
using namespace Rcpp;

// Concatenates a std::string with one element of a CharacterVector.
struct Functor {
    std::string
    operator()(const std::string& lhs, const internal::string_proxy<STRSXP>& rhs) const
    {
        return lhs + rhs;
    }
};

// [[Rcpp::export]]
CharacterVector paste2(CharacterVector lhs, CharacterVector rhs)
{
    // Copy lhs into std::string objects, then append rhs element-wise in place.
    std::vector<std::string> res(lhs.begin(), lhs.end());
    std::transform(
        res.begin(), res.end(),
        rhs.begin(), res.begin(),
        Functor()
    );
    return wrap(res);
}
/*** R
lhs <- letters[1:2]; rhs <- letters[3:4]
paste(lhs, rhs, sep = "")
# [1] "ac" "bd"
paste2(lhs, rhs)
# [1] "ac" "bd"
*/
The reason for first copying the left-hand expression into a std::vector<std::string> is that the internal::string_proxy<> class provides operator+ with the signature
std::string operator+(const std::string& x, const internal::string_proxy<STRSXP>& y)
rather than, e.g.
operator+(const internal::string_proxy<STRSXP>& x, const internal::string_proxy<STRSXP>& y)
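To see the copy-then-transform pattern in isolation, here is a minimal sketch in plain C++ (no Rcpp, C++11 assumed). The name paste_pairs is hypothetical, and std::vector<std::string> stands in for CharacterVector on both sides, so the operator+ restriction described above does not arise here. It only illustrates how the binary form of std::transform writes its result back into the first range.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of Rcpp): element-wise concatenation of two
// vectors, following the same copy-then-transform pattern.
std::vector<std::string> paste_pairs(const std::vector<std::string>& lhs,
                                     const std::vector<std::string>& rhs)
{
    // Work on a copy of the left-hand side so the inputs stay untouched.
    std::vector<std::string> res(lhs.begin(), lhs.end());

    // Binary std::transform: combine res[i] with rhs[i] and store the result
    // in res[i]. Assumes rhs has at least res.size() elements.
    std::transform(res.begin(), res.end(), rhs.begin(), res.begin(),
                   [](const std::string& x, const std::string& y) {
                       return x + y;
                   });
    return res;
}
```

Calling paste_pairs with {"a", "b"} and {"c", "d"} yields {"ac", "bd"}, the same pairing as paste(lhs, rhs, sep = "") in R.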
If your compiler supports C++11, this can be written a little more cleanly with a lambda:
// [[Rcpp::plugins(cpp11)]]
#include <Rcpp.h>
#include <algorithm>  // std::transform
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector paste3(CharacterVector lhs, CharacterVector rhs)
{
    using proxy_t = internal::string_proxy<STRSXP>;
    std::vector<std::string> res(lhs.begin(), lhs.end());
    // The lambda replaces the hand-written functor; it needs no captures.
    std::transform(res.begin(), res.end(), rhs.begin(), res.begin(),
        [](const std::string& x, const proxy_t& y) {
            return x + y;
        }
    );
    return wrap(res);
}
/*** R
lhs <- letters[1:2]; rhs <- letters[3:4]
paste(lhs, rhs, sep = "")
# [1] "ac" "bd"
paste3(lhs, rhs)
# [1] "ac" "bd"
*/
Answer 2:
A working solution is to use:
#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector concatenate(std::string x, std::string y)
{
    return wrap(x + y);
}
Then vectorize it on the R side:
Vconcatenate=Vectorize(concatenate)
Vconcatenate(letters[1:2],letters[3:4])
Or, looping over the elements in C++:
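#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
CharacterVector concatenate(std::vector<std::string> x, std::vector<std::string> y)
{
    // The loop indexes y[i] for every element of x, so the lengths must match.
    if (x.size() != y.size()) stop("x and y must have the same length");
    std::vector<std::string> res(x.size());
    for (std::size_t i = 0; i < x.size(); i++)
    {
        res[i] = x[i] + y[i];
    }
    return wrap(res);
}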
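The loop version pairs x[i] with y[i], so it only makes sense for inputs of equal length. If you want something closer to R's recycling of the shorter argument, a sketch in plain C++ (no Rcpp) could look like the following. The name paste_recycle and the choice to return an empty result when either input is empty are assumptions made for this illustration, not behaviour taken from Rcpp or from paste().

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: element-wise concatenation that cycles through the
// shorter input, loosely modelled on R's recycling rule.
std::vector<std::string> paste_recycle(const std::vector<std::string>& x,
                                       const std::vector<std::string>& y)
{
    // Assumption for this sketch: an empty input gives an empty result.
    if (x.empty() || y.empty()) return std::vector<std::string>();

    const std::size_t n = std::max(x.size(), y.size());
    std::vector<std::string> res(n);  // preallocate, then fill
    for (std::size_t i = 0; i < n; ++i) {
        // The modulo wraps the index of the shorter vector back to its start.
        res[i] = x[i % x.size()] + y[i % y.size()];
    }
    return res;
}
```

With x = {"a", "b", "c"} and y = {"x"}, this returns {"ax", "bx", "cx"}.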
Answer 3:
I'm leaving this answer up, but note the warning provided by @nrussell regarding the use of push_back()!
I'm still getting to grips with Rcpp myself, so I've gone for a string builder in a loop:
library(Rcpp)
cppFunction('StringVector concatenate(StringVector a, StringVector b)
{
    StringVector c;
    std::ostringstream x;
    std::ostringstream y;
    // concatenate the elements of each input into one string
    for (int i = 0; i < a.size(); i++)
        x << a[i];
    for (int i = 0; i < b.size(); i++)
        y << b[i];
    c.push_back(x.str());
    c.push_back(y.str());
    return c;
}')
a=c("a","b"); b=c("c","d");
concatenate(a,b)
# [1] "ab" "cd"
Note that this joins the elements within each input vector ("ab" "cd") rather than pairing them element-wise ("ac" "bd") as the question asks.
Comparing the performance of (i) repeated calls to push_back against (ii) a preallocate-and-fill strategy, we can see that the latter is clearly preferable, about 60 times faster at the median in the benchmark below:
#include <Rcpp.h>
#include <sstream>  // std::ostringstream
using namespace Rcpp;

// Grows the result one element at a time with push_back.
// [[Rcpp::export]]
CharacterVector pbpaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res;
    // oss.str("") clears the buffer after each iteration.
    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res.push_back(oss.str());
    }
    return res;
}

// Allocates the result once, then fills each slot by index.
// [[Rcpp::export]]
CharacterVector sspaste(CharacterVector lhs, CharacterVector rhs)
{
    R_xlen_t i = 0, sz = lhs.size();
    CharacterVector res(sz);
    for (std::ostringstream oss; i < sz; i++, oss.str("")) {
        oss << lhs[i] << rhs[i];
        res[i] = oss.str();
    }
    return res;
}
/*** R
lhs <- as.character(1:5000); rhs <- as.character(5001:10000)
all.equal(pbpaste(lhs, rhs), sspaste(lhs, rhs))
# [1] TRUE
microbenchmark::microbenchmark(
"push_back" = pbpaste(lhs, rhs),
"preallocate" = sspaste(lhs, rhs),
times = 200L
)
# Unit: milliseconds
#         expr        min         lq       mean     median         uq        max neval cld
#    push_back 101.521579 105.334649 115.156544 107.275678 110.957420 256.722239   200   b
#  preallocate   1.364213   1.585818   1.789564   1.778153   1.934758   2.955352   200   a
*/
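One way to make sense of the gap: as far as I understand it, push_back on an Rcpp vector builds a new, longer vector and copies the existing elements across on every call, so the total work grows quadratically with the length. The sketch below is a toy model in plain C++ that simply counts element writes under that assumption. It is not Rcpp's actual implementation, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model of copy-on-append: every append allocates a vector one element
// longer and re-copies everything stored so far. Returns the write count.
std::size_t writes_with_copying_append(std::size_t n)
{
    std::size_t writes = 0;
    std::vector<std::string> res;
    for (std::size_t i = 0; i < n; ++i) {
        std::vector<std::string> next(res.size() + 1);
        for (std::size_t j = 0; j < res.size(); ++j) {
            next[j] = res[j];      // re-copy every existing element
            ++writes;
        }
        next[res.size()] = "x";    // store the new element
        ++writes;
        res.swap(next);
    }
    return writes;
}

// Preallocate-and-fill: each slot is written exactly once.
std::size_t writes_with_preallocation(std::size_t n)
{
    std::size_t writes = 0;
    std::vector<std::string> res(n);
    for (std::size_t i = 0; i < n; ++i) {
        res[i] = "x";
        ++writes;
    }
    return writes;
}
```

In this model, n appends cost n(n+1)/2 writes, so 5000 elements take 12,502,500 writes against 5000 for the preallocated version. The measured ratio above is much smaller than that because the real timings also include the string formatting done in each iteration.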
Source: https://stackoverflow.com/questions/43182003/concatenate-stringvector-with-rcpp