Slice a string at consecutive indices with R / Rcpp?

随声附和 posted on 2019-11-30 21:31:29

I would use substring. Something like this:

strslice <- function( x, n ){   
    starts <- seq( 1L, nchar(x), by = n )
    substring( x, starts, starts + n-1L )
}
strslice( "abcdef", 2 )
# [1] "ab" "cd" "ef"
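The substring approach also copes with a string whose length is not a multiple of n, because substring() is vectorised over the start and stop positions and clips a stop position that runs past the end. A minimal sketch with a couple of guards added; the name strslice_safe and the checks are illustrative assumptions, not part of the answer above:

```r
# Hypothetical variant of strslice() with input checks (illustrative only).
strslice_safe <- function(x, n) {
    stopifnot(is.character(x), length(x) == 1L, !is.na(x), n >= 1L)
    len <- nchar(x)
    # seq(1L, 0L, by = n) would raise an error, so handle "" separately
    if (len == 0L) return(character(0))
    starts <- seq(1L, len, by = n)
    # a stop position beyond the end of the string is simply clipped
    substring(x, starts, starts + n - 1L)
}
strslice_safe("abcdefg", 2)
# [1] "ab" "cd" "ef" "g"
```

The trailing "g" is kept as a shorter final piece rather than dropped.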

As for your Rcpp code, you could allocate the std::vector<std::string> at its final size up front, which avoids repeated resizing (and the memory reallocations that may come with it), or fill an Rcpp::CharacterVector directly. Something like this:

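library(inline)   # rcpp() comes from the inline package
strslice_rcpp <- rcpp( signature(x="character", n="integer"), '
    std::string myString = as<std::string>(x);
    int cutpoint = as<int>(n);
    int len = myString.length();
    int nout = len / cutpoint ;   // whole pieces only; a shorter trailing piece is dropped
    CharacterVector out( nout ) ;
    for( int i=0; i<nout; i++ ) {
      // each piece is cutpoint characters long (not a hard-coded 2)
      out[i] = myString.substr( cutpoint*i, cutpoint ) ;
    }
    return out ;
')
strslice_rcpp( "abcdefg", 2 )
# [1] "ab" "cd" "ef"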
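Because nout is computed with integer division, the Rcpp version returns whole pieces only and drops a shorter trailing piece, unlike the substring version. If the last piece should be kept, one possible sketch uses ceiling division. It assumes the Rcpp package and a working C++ toolchain, and uses Rcpp::cppFunction rather than inline::rcpp; the name strslice_cpp is hypothetical:

```r
library(Rcpp)

# Hypothetical variant: keeps a shorter final piece (illustrative only).
cppFunction('
CharacterVector strslice_cpp(std::string s, int n) {
    if (n < 1) stop("n must be at least 1");
    int len = s.size();
    int nout = (len + n - 1) / n;     // ceiling division: count the last short piece
    CharacterVector out(nout);        // allocated once at its final size
    for (int i = 0; i < nout; i++) {
        out[i] = s.substr(i * n, n);  // substr() clips at the end of the string
    }
    return out;
}')

strslice_cpp("abcdefg", 2)
# [1] "ab" "cd" "ef" "g"
```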

This one-liner using strapplyc from the gsubfn package is fast enough that Rcpp may not be needed. Here it is applied to the entire text of James Joyce's Ulysses, which takes only a few seconds:

library(gsubfn)
joyce <- readLines("http://www.gutenberg.org/files/4300/4300-8.txt") 
joycec <- paste(joyce, collapse = " ") # all in one string 
n <- 2
system.time(s <- strapplyc(joycec, paste(rep(".", n), collapse = ""))[[1]])
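The pattern built above is just n dots (".." for n = 2), so every match is exactly n characters long and leftover characters at the end are not returned. For comparison, the same regular-expression idea can be sketched in base R with gregexpr and regmatches. The helper name slice_regex and its keep_rest argument are hypothetical, and no claim is made here about its speed relative to strapplyc:

```r
# Hypothetical base-R counterpart of the strapplyc one-liner (illustrative only).
slice_regex <- function(x, n, keep_rest = FALSE) {
    n <- as.integer(n)
    stopifnot(n >= 1L)
    # ".{n}" matches exactly n characters; ".{1,n}" also allows a shorter last piece
    pattern <- if (keep_rest) sprintf(".{1,%d}", n) else sprintf(".{%d}", n)
    regmatches(x, gregexpr(pattern, x))[[1]]
}
slice_regex("abcdefg", 2)
# [1] "ab" "cd" "ef"
slice_regex("abcdefg", 2, keep_rest = TRUE)
# [1] "ab" "cd" "ef" "g"
```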