Passing a data.frame by reference and updating it with Rcpp

放肆的年华 submitted on 2019-11-29 09:24:49

Question


Looking at the Rcpp documentation and Rcpp::DataFrame in the gallery, I realized that I didn't know how to modify a DataFrame by reference. Googling a bit, I found this post on SO and this post on the archive. Nothing obvious came up, so I suspect I am missing something big, like "it is already the case because..." or "it does not make sense because...".

I tried the following, which compiled, but the data.frame object passed to updateDFByRef in R stayed untouched:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::export]]
void updateDFByRef(DataFrame& df) {
    int N = df.nrows();
    NumericVector newCol(N,1.);
    df["newCol"] = newCol;
    return;
}

Answer 1:


The way DataFrame::operator[] is implemented does indeed lead to a copy when you do this:

df["newCol"] = newCol;

To do what you want, you need to consider what a data frame is: a list of vectors with certain attributes. Then you can grab the data from the original by copying the vectors (the pointers, not their contents).

Something like this does it. It is a little more work, but not that hard.

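#include <Rcpp.h>
using namespace Rcpp;

// Returns a new data frame that reuses df's columns and appends one column.
// [[Rcpp::export]]
List updateDFByRef(DataFrame& df, std::string name) {
    int nr = df.nrows(), nc = df.size();
    NumericVector newCol(nr, 1.);       // new column, filled with 1
    List out(nc + 1);
    CharacterVector onames = df.attr("names");
    CharacterVector names(nc + 1);
    for (int i = 0; i < nc; i++) {
        out[i] = df[i];                 // copies the pointer, not the data
        names[i] = onames[i];
    }
    out[nc] = newCol;
    names[nc] = name;
    // carry over the attributes that make the list a data frame
    out.attr("class") = df.attr("class");
    out.attr("row.names") = df.attr("row.names");
    out.attr("names") = names;
    return out;
}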

There are issues associated with this approach. Your original data frame and the one you created share the same vectors, so bad things can happen: for example, modifying a shared column in place from C++ changes it in both data frames. Only use this if you know what you are doing.
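The sharing hazard can be illustrated without R at all. The following is a minimal sketch in plain standard C++, not Rcpp: `Column`, `Frame`, and `withColumn` are hypothetical names invented for this illustration, and `std::shared_ptr` only stands in for the way two lists can point at the same column data.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data frame: named columns held through
// shared pointers, so copying a frame copies pointers, not column data.
using Column = std::shared_ptr<std::vector<double>>;
using Frame = std::vector<std::pair<std::string, Column>>;

// Build a new frame that reuses the existing column pointers and
// appends one freshly allocated column of length nrow filled with `fill`.
Frame withColumn(const Frame& df, const std::string& name,
                 std::size_t nrow, double fill) {
    Frame out(df);  // shallow copy: the column pointers are duplicated
    out.emplace_back(name, std::make_shared<std::vector<double>>(nrow, fill));
    return out;
}
```

If `b = withColumn(a, "newCol", 3, 1.0)`, then `a` still has its original number of columns, but writing through `b[0].second` is visible through `a[0].second` as well, because both frames hold the same pointer. That is the kind of surprise the warning above refers to.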




Answer 2:


The short answer is "because it makes no sense".

A data.frame is essentially a list of vectors. A few seconds of reflection makes it clear that adding a new column to that list entails a copy. So you alter your variable df in the example, do not return it, and hence lose the modification.

Merely wishing for something to work a certain way is not always enough.
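The "alter, do not return, lose the modification" argument can be modelled in plain standard C++. This is a sketch under stated assumptions, not Rcpp code: `ColumnList`, `Handle`, `addColumnInPlace`, `callLikeR`, and `addColumn` are hypothetical names, and `callLikeR` only approximates the idea that the exported function works on a wrapper object rather than on the caller's own variable.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: a frame is a list of column names, and a handle
// is a pointer to whichever list is currently considered "the frame".
using ColumnList = std::vector<std::string>;
using Handle = std::shared_ptr<ColumnList>;

// Plays the role of the function in the question: growing the list means
// building a new one, so the handle passed in is rebound to the new list.
void addColumnInPlace(Handle& df, const std::string& name) {
    ColumnList grown(*df);  // copy of the existing list
    grown.push_back(name);
    df = std::make_shared<ColumnList>(std::move(grown));
}

// Plays the role of the caller's side (an assumption about the mechanism):
// the function receives a wrapper, and only that wrapper gets rebound.
void callLikeR(const Handle& callerObject, const std::string& name) {
    Handle wrapper = callerObject;    // second handle to the same list
    addColumnInPlace(wrapper, name);  // wrapper now points at the new list
}                                     // wrapper is discarded here

// Returning the new list lets the caller keep the result.
Handle addColumn(const Handle& df, const std::string& name) {
    ColumnList grown(*df);
    grown.push_back(name);
    return std::make_shared<ColumnList>(std::move(grown));
}
```

In this model, `callLikeR(df, "newCol")` leaves `df` with its original columns, while `df = addColumn(df, "newCol")` keeps the extra column. That mirrors the fix suggested by the answers: return the new data frame and assign it on the R side.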



Source: https://stackoverflow.com/questions/15731106/passing-by-reference-a-data-frame-and-updating-it-with-rcpp
