Make R package easy to update with new files from users

Question

First let me explain that I come from the Python world, where I can do what I want like this in the shell:

$ export PYTHONPATH=~/myroot
$ mkdir -p ~/myroot/mypkg
$ touch ~/myroot/mypkg/__init__.py # this is the one bit of "magic" for Python
$ echo 'hello = "world"' > ~/myroot/mypkg/mymodule.py

Then in Python:

>>> import mypkg.mymodule
>>> mypkg.mymodule.hello
'world'

What I did there was to create a package which is easily extended by other users. I can check in ~/myroot/mypkg to source control and other users can later add modules to it using just a text editor. Now I want to do the equivalent thing in R. Here's what I have so far:

$ export R_LIBS=~/myR # already this is bad: it makes install.packages() put things here!
$ mkdir -p ~/myR
$ echo 'hello = "world"' > /tmp/mycode.R

Now in R:

> package.skeleton(name="mypkg", code_files="/tmp/mycode.R")

Now back to the shell:

$ R CMD build mypkg
$ R CMD INSTALL mypkg

Now back to R:

> library(mypkg)
> hello
[1] "world"

So that works. But how do my colleagues add new modules to this package? I want them to be able to simply add a new R file, but it seems we then have to redo the entire package build, which is tedious, and we would end up checking in many generated files. Most importantly, where did the code go? After R CMD INSTALL, R knew what my code was, but it did not put the literal text of mycode.R anywhere under $R_LIBS (did it "compile" the code? I'm not sure).
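On the "where did the code go?" point: as far as I know, R CMD INSTALL by default parses the package's R files and saves the resulting objects in a lazy-load database (an .rdb/.rdx pair) in the installed package's R directory, rather than copying the .R text. A quick way to look, using the stats package only as a stand-in because it ships with R (substitute "mypkg" once it is installed):

```r
# List the contents of an installed package's R/ directory.
# "stats" is a stand-in chosen because it is always available.
installed_r_dir <- system.file("R", package = "stats")
installed_files <- list.files(installed_r_dir)
print(installed_files)

# With lazy loading (the default), expect a .rdb/.rdx database
# and no plain .R source files.
has_lazyload_db <- any(grepl("\\.rdb$", installed_files))
has_r_sources <- any(grepl("\\.[rR]$", installed_files))
print(c(lazyload_db = has_lazyload_db, r_sources = has_r_sources))
```

This is why editing files under $R_LIBS by hand is not an option: the installed form is a serialized database, not editable text.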

Previously we would just source() our "modules" but this is not very good because it reloads the code every time, so indirect (transitive) dependencies end up reloading stuff that is already loaded.
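To make the reloading problem concrete, here is a small sketch with hypothetical file names (a.r, b.r, common.r) created in a temporary directory. Both a.r and b.r source common.r, so common.r is evaluated twice:

```r
# Build a throwaway directory with two scripts that share a dependency.
demo_dir <- tempfile("source_demo")
dir.create(demo_dir)

# common.r counts how many times it has been evaluated.
writeLines('common_loads <- if (exists("common_loads")) common_loads + 1 else 1',
           file.path(demo_dir, "common.r"))

# a.r and b.r each pull in common.r by relative path.
writeLines('source("common.r")', file.path(demo_dir, "a.r"))
writeLines('source("common.r")', file.path(demo_dir, "b.r"))

# chdir = TRUE makes the relative path inside a.r and b.r resolve.
source(file.path(demo_dir, "a.r"), chdir = TRUE)
source(file.path(demo_dir, "b.r"), chdir = TRUE)

print(common_loads)  # [1] 2, i.e. common.r ran once per dependent
```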

My question is, how do people manage simple, in-house, collaboratively edited, source-controlled, non-binary, non-compiled, shared code in R?

I'm using R 3.1.1 on Linux. If the solution works on Windows too that would be nice.

Answer 1:

It seems R has nothing like Python's import statement, so I made my own. Just put this in a file like import.r and source it via your $R_PROFILE.

# this is sort of like Python's import statement, and lets us avoid redundant sourcing

.imports <- c("import") # module names imported so far, to avoid redundant imports (never import ourselves)

.importScriptPath <- function() {
  # returns the path of the executing script
  # see http://stackoverflow.com/questions/1815606

  # this will only work if the caller was loaded with source()
  filePath <- sys.frame(2)$ofile

  # if the caller was not loaded with source(), use the main script path
  if (length(filePath) == 0) {
    argv <- commandArgs(trailingOnly = FALSE)
    filePath <- substring(argv[grep("--file=", argv)], 8)
  }

  return (dirname(filePath))
}

import <- function(module) {
  # locates the given module (character or token), calls source() on it, and does nothing on subsequent calls

  module <- as.character(substitute(module)) # support import(foo) not only import("foo")

  if (module %in% .imports) {
    return(invisible())
  }

  moduleFilename <- paste0(gsub("\\.", "/", module), ".r") # allow import(foo.bar) as import("foo/bar")
  importPaths <- c(.importScriptPath()) # add more search paths here as desired

  for (importPath in importPaths) {
    modulePath <- file.path(importPath, moduleFilename)
    if (file.exists(modulePath)) {
      source(modulePath)
      .imports <<- append(.imports, module) # <<- updates the global variable so we skip it next time
      return(invisible())
    }
  }

  # last chance: try to load module as a standard library
  suppressPackageStartupMessages(library(module, character.only = TRUE))
  .imports <<- append(.imports, module)
  return(invisible())
}
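
To show the load-once behaviour in isolation, here is a simplified, self-contained sketch of the same idea. It is not the function above: import_once, its paths argument, and the .loaded registry are hypothetical names used only for this illustration, and the search path is passed explicitly instead of being derived from the calling script.

```r
# Registry of module names that have already been sourced.
.loaded <- new.env()

import_once <- function(module, paths = ".") {
  # Skip modules that were already loaded in this session.
  if (exists(module, envir = .loaded, inherits = FALSE)) {
    return(invisible(FALSE))
  }
  # Map "foo.bar" to "foo/bar.r", as in the answer above.
  file_name <- paste0(gsub(".", "/", module, fixed = TRUE), ".r")
  for (path in paths) {
    candidate <- file.path(path, file_name)
    if (file.exists(candidate)) {
      source(candidate)
      assign(module, candidate, envir = .loaded)
      return(invisible(TRUE))
    }
  }
  stop("module not found: ", module)
}

# Demo: a throwaway module that counts how often it is evaluated.
demo_modules <- tempfile("modules")
dir.create(file.path(demo_modules, "mypkg"), recursive = TRUE)
writeLines('load_count <- if (exists("load_count")) load_count + 1 else 1',
           file.path(demo_modules, "mypkg", "mymodule.r"))

import_once("mypkg.mymodule", paths = demo_modules)
import_once("mypkg.mymodule", paths = demo_modules)  # second call is a no-op

print(load_count)  # [1] 1
```

Colleagues can then add a module by dropping a new .r file into the shared directory; nothing needs to be rebuilt or reinstalled.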

Source: https://stackoverflow.com/questions/25050275/make-r-package-easy-to-update-with-new-files-from-users

Tags

package

dependency-management