Rcpp function to select (and to return) a sub-dataframe

北海茫月 2020-12-03 12:56

Is it possible to write a C++ function that takes an R data frame as input, modifies it (in our case, by taking a subset), and returns the new data frame (in this q

3 Answers
  •  隐瞒了意图╮
    2020-12-03 13:17

    Here is a complete test file. It does not need your extractor function; it simply re-assembles the subsets. For that, however, it needs the very newest Rcpp (currently only on GitHub), where Kevin happens to have added some work on subset indexing, which is exactly what we need here:

    #include <Rcpp.h>
    
    /*** R
    ##  Suppose I have the data frame below created in R:
    ##  NB: stringsAsFactors set to FALSE
    ##  NB: setting seed as well
    set.seed(42)
    myDF <- data.frame(id = rep(c(1,2), each = 5), 
                       alph = letters[1:10], 
                       mess = rnorm(10), 
                       stringsAsFactors = FALSE)
    */
    
    // [[Rcpp::export]]
    Rcpp::DataFrame extract(Rcpp::DataFrame D, Rcpp::IntegerVector idx) {
    
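      // pull each column out with its R type (the numeric id is coerced to integer)
      Rcpp::IntegerVector     id = D["id"];
      Rcpp::CharacterVector alph = D["alph"];
      Rcpp::NumericVector   mess = D["mess"];
    
      // subset every column with the same index vector; idx is 0-based on the
      // C++ side, and the second column is returned under the name "alpha"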
      return Rcpp::DataFrame::create(Rcpp::Named("id")    = id[idx],
                                     Rcpp::Named("alpha") = alph[idx],
                                     Rcpp::Named("mess")  = mess[idx]);
    }
    
    /*** R
    extract(myDF, c(2,4,6,8))
    */
    

    With that file, we get the expected result:

    R> library(Rcpp)
    R> sourceCpp("/tmp/sepher.cpp")
    
    R> ##  Suppose I have the data frame below created in R:
    R> ##  NB: stringsAsFactors set to FALSE
    R> ##  NB: setting seed as well
    R> set.seed(42)
    
    R> myDF <- data.frame(id = rep(c(1,2), each = 5), 
    +                    alph = letters[1:10], 
    +                    mess = rnorm(10), 
    +               .... [TRUNCATED] 
    
    R> extract(myDF, c(2,4,6,8))
      id alpha     mess
    1  1     c 0.363128
    2  1     e 0.404268
    3  2     g 1.511522
    4  2     i 2.018424
    R>
    R> packageDescription("Rcpp")$Version   ## unreleased version
    [1] "0.11.1.1"
    R> 
    

    I needed something similar a few weeks ago (though without character vectors) and used Armadillo, with its elem() function taking a vector of unsigned integers as the index.
