Question
I'm writing some R extensions in C (C functions to be called from R).
My code needs to compute a statistic using 2 different datasets at the same time, and I need to do this for all possible pair combinations. Then, I need all these statistics (very large arrays) to continue the calculation on the C side. The files are very large, typically ~40GB, and that's my problem.
To do this in C called from R, I first need to load all the datasets in R and then pass them to the C function call. Ideally, though, only 2 of those files would need to be held in memory at the same time, following this sequence, if I were able to access the datasets from C or Fortran directly:
open file1 - open file2 - compute cov(1,2)
close file2
hold file1 - open file3 - compute cov(1,3)
... // same approach
This is fine in R because I can load/unload files, but when calling C or Fortran I don't have any mechanism to load/unload files. So, my question is: can I read .Rdata files from Fortran or C directly, opening and closing them as needed? Are there any other approaches to the problem?
As far as I've read, the answer is no, so I'm considering moving from Rdata to HDF5.
Answer 1:
It is not too hard to call R functions from C, using the .Call interface. So write an R function that reads the data in, and invoke that from C. When you're done with one file, UNPROTECT() the data you've read in so that it can be garbage collected. This is illustrated in the following example:
## function that reads my data in from a single file
fun <- function(fl)
    readLines(fl)

library(inline) ## party trick -- compile C code from within R
doit <- cfunction(signature(fun="CLOSXP", filename="STRSXP", env="ENVSXP"), '
    SEXP lng = PROTECT(lang2(fun, filename)); // create R language expression
    SEXP ans = PROTECT(eval(lng, env)); // evaluate the expression
    // do things with the ans, e.g., ...
    int len = length(ans);
    UNPROTECT(2); // release for garbage collection
    return ScalarInteger(len); // return something
')
doit(fun, "call.R", environment())
A simpler approach is to invert the problem -- read two data files in, then call C with the data.
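As a rough sketch of what the C side of that inverted approach might look like: R keeps one dataset loaded, loads each of the others in turn, and passes the two vectors to a small C kernel. The function name `cov_pair` is hypothetical, and the sketch assumes each dataset is a plain numeric vector of equal length that fits in an `int` length. The all-pointer signature is chosen so that it could be called through R's `.C` interface.

```c
/* Sample covariance of two equal-length vectors.
 * cov_pair is a hypothetical name used for illustration only.
 * Every argument is a pointer, matching the .C calling convention. */
void cov_pair(double *x, double *y, int *n, double *out)
{
    int len = *n;

    /* Covariance is undefined for fewer than 2 points; this sketch
     * simply reports 0.0 in that case. */
    if (len < 2) {
        *out = 0.0;
        return;
    }

    /* First pass: means of both vectors. */
    double mean_x = 0.0, mean_y = 0.0;
    for (int i = 0; i < len; i++) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= len;
    mean_y /= len;

    /* Second pass: sum of products of deviations. */
    double acc = 0.0;
    for (int i = 0; i < len; i++)
        acc += (x[i] - mean_x) * (y[i] - mean_y);

    *out = acc / (len - 1);
}
```

Once compiled into a shared library and loaded, the R side could call it along the lines of `.C("cov_pair", as.double(x), as.double(y), as.integer(length(x)), out = double(1))$out`, removing `y` with `rm()` before loading the next file. Note that `.C` copies its arguments, so for data of this size a `.Call` version of the same kernel would probably be the better fit.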
Source: https://stackoverflow.com/questions/26981755/it-is-possible-to-read-rdata-file-format-from-c-or-fortran