constructing a Data Frame in Rcpp

空扰寡人 posted on 2019-12-03 00:11:44

It seems Rcpp can return a proper data.frame, provided you supply the names explicitly. I'm not sure how to adapt this to your example with arbitrary names.

mkdf <- '
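    // Wrap the incoming R data.frame.
    Rcpp::DataFrame dfin(input);
    Rcpp::DataFrame dfout;
    // Copy every column; note that push_back() makes a full copy on each call.
    for (int i=0;i<dfin.length();i++) {
        dfout.push_back(dfin(i));
    }

    // Indices are 0-based, so this returns the second and third columns as x and y.
    return Rcpp::DataFrame::create( Named("x")= dfout(1), Named("y") = dfout(2));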
'
library(inline)
test <- cxxfunction( signature(input="data.frame"),
                              mkdf, plugin="Rcpp")

test(input=head(iris))
Dirk Eddelbuettel

Briefly:

  • DataFrames are indeed just like lists with the added restriction of having to have a common length, so they are best constructed column by column.

  • The best way to learn how is often to look at our unit tests. Here, inst/unitTests/runit.DataFrame.R collects the tests for the DataFrame class.

  • You also found the .push_back() member function in Rcpp, which we added for convenience and by analogy with the STL. We do warn that it is not recommended: because of differences in the way R objects are constructed, we essentially always need to make full copies, so .push_back() is not very efficient.

  • Although I answer here frequently, the rcpp-devel list is a better place for Rcpp questions.

highBandWidth

Using the information from @baptiste's answer, this is what finally gives a well-formed data frame:

RcppExport SEXP makeDataFrame(SEXP in) {
    Rcpp::DataFrame dfin(in);
    Rcpp::DataFrame dfout;
    Rcpp::CharacterVector namevec;
    std::string namestem = "Column Heading ";
    for (int i=0;i<2;i++) {
        dfout.push_back(dfin(i));
        namevec.push_back(namestem+std::string(1,(char)(((int)'a') + i)));
    }
    dfout.attr("names") = namevec;
    Rcpp::DataFrame x;
    Rcpp::Language call("as.data.frame",dfout);
    x = call.eval();
    return x;
}

I think the point remains that this might be inefficient because of push_back (as suggested by @Dirk) and the additional Language call evaluation. I looked through the Rcpp unit tests and haven't been able to come up with something better yet. Does anybody have any ideas?

Update:

Using @Dirk's suggestions (thanks!), this seems to be a simpler, efficient solution:

RcppExport SEXP makeDataFrame(SEXP in) {
    Rcpp::DataFrame dfin(in);
    Rcpp::List myList(dfin.length());
    Rcpp::CharacterVector namevec;
    std::string namestem = "Column Heading ";
    for (int i=0;i<dfin.length();i++) {
        myList[i] = dfin(i); // adding vectors
        namevec.push_back(namestem+std::string(1,(char)(((int)'a') + i))); // making up column names
    }
    myList.attr("names") = namevec;
    Rcpp::DataFrame dfout(myList);
    return dfout;
}
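
The version above still grows namevec with push_back. As a minimal sketch, assuming the file is compiled with Rcpp attributes (for example through Rcpp::sourceCpp), the name vector can be pre-sized in the same way as the list. The function name makeDataFramePresized is made up for this illustration, and the single-letter suffix only makes sense for up to 26 columns:

```cpp
#include <Rcpp.h>
#include <string>

// Hypothetical variant of makeDataFrame: both the column list and the
// name vector are allocated once, so push_back() is never called.
// [[Rcpp::export]]
Rcpp::DataFrame makeDataFramePresized(Rcpp::DataFrame dfin) {
    const int ncol = dfin.length();
    Rcpp::List columns(ncol);           // one slot per input column
    Rcpp::CharacterVector names(ncol);  // one slot per column name
    const std::string namestem = "Column Heading ";
    for (int i = 0; i < ncol; ++i) {
        columns[i] = dfin[i];  // store the i-th input column
        // Made-up names "... a", "... b", ...; assumes at most 26 columns.
        names[i] = namestem + std::string(1, static_cast<char>('a' + i));
    }
    columns.attr("names") = names;
    // Converting the named list yields the data frame, as in the code above.
    return Rcpp::DataFrame(columns);
}
```

After Rcpp::sourceCpp() on a file containing this function, it could be tried from R with makeDataFramePresized(head(iris)).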

I concur with joran. The output of a C function called from within R (via .C()) is a list of all its arguments, both "in" and "out", so each "column" of the data frame could be represented in the C function call as an argument. Once the result of the C function call is in R, all that remains to be done is to extract those list elements using list indexing and give them the appropriate names.
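
A rough sketch of that approach, assuming the routine is built into a shared library and loaded with dyn.load. The name make_columns and the choice of columns are invented for this example:

```cpp
// Hypothetical routine intended to be called from R through .C().
// .C() passes every argument as a pointer: integer vectors as int*,
// numeric vectors as double*. The output columns x and y are allocated
// on the R side and filled in place here.
extern "C" void make_columns(int *n, double *x, double *y) {
    for (int i = 0; i < *n; ++i) {
        x[i] = i + 1;        // first column: 1, 2, ..., n
        y[i] = x[i] * x[i];  // second column: squares of the first
    }
}

// On the R side the call might look like this:
//   res <- .C("make_columns", n = as.integer(5), x = double(5), y = double(5))
//   df  <- data.frame(x = res$x, y = res$y)
```

Since .C() returns every argument in the result list, the columns are picked out by name and the data frame itself is assembled in R rather than in the compiled code.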
