Goal:
- Import the newest file (.csv) from a local directory into R
Goal Details:
- A csv file is uploaded to a folder daily
The following function uses a timestamp file to keep track of which files have already been processed. It can run continually in a single R instance (as you first suggested) or as single-run invocations, which suits @andrew's suggestion of a cron job. (The cat() call is included primarily for testing; feel free to remove it.)
## Note: list.files() treats `pattern` as a regular expression, not a
## shell glob, so '\\.csv$' is used to match names ending in ".csv".
processDir <- function(directory = '.', pattern = '\\.csv$', loop = FALSE, delay = 600,
                       stampFile = file.path(directory, '.csvProcessor')) {
  ## On the very first run the stamp file is created with the current
  ## time, so only files modified after this point are picked up.
  if (! file.exists(stampFile))
    file.create(stampFile)
  firstRun <- TRUE
  while (firstRun || loop) {
    firstRun <- FALSE
    stampTime <- file.info(stampFile)$mtime
    allFilesDF <- file.info(list.files(path = directory, pattern = pattern,
                                       full.names = TRUE, no.. = TRUE))
    ## Keep regular files modified since the stamp file was last touched.
    unprocessedFiles <- allFilesDF[(! allFilesDF$isdir) &
                                   (allFilesDF$mtime > stampTime), ]
    if (nrow(unprocessedFiles)) {
      ## We need to update the timestamp on stampFile quickly so
      ## that files added while this is running will be found in the
      ## next loop.
      ## WARNING: this blindly truncates the stampFile.
      file.create(stampFile, showWarnings = FALSE)
      for (fn in rownames(unprocessedFiles)) {
        cat('Processing', fn, '\n')
        ## read.csv(fn)
        ## ...
      }
    }
    if (loop) Sys.sleep(delay)
  }
}
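Since the stated goal is to import only the newest file, the body of the loop could call something like the following. This is a minimal sketch, not part of the function above: readNewestCsv is a hypothetical helper, and it assumes "newest" means most recently modified (by mtime) rather than newest by file name.

```r
## Hypothetical helper: read only the most recently modified CSV in a directory.
## Returns NULL when no matching file is found.
readNewestCsv <- function(directory = '.', pattern = '\\.csv$') {
  files <- list.files(path = directory, pattern = pattern, full.names = TRUE)
  if (!length(files)) return(NULL)
  info <- file.info(files)
  ## Drop directories whose names happen to match the pattern.
  info <- info[!info$isdir, , drop = FALSE]
  if (!nrow(info)) return(NULL)
  ## Row names of file.info() are the file paths; pick the latest mtime.
  newest <- rownames(info)[which.max(as.numeric(info$mtime))]
  read.csv(newest, stringsAsFactors = FALSE)
}
```

For example, `df <- readNewestCsv('path/to/folder')` would give you a data frame for the latest upload, or NULL if the folder holds no csv files yet.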
To run it in a continually running R instance, as you initially suggested, simply call:
processDir(loop = TRUE)
To use @andrew's suggestion of a cron job, append the following line after the function definition:
processDir()
... and use a crontab file similar to the following (this entry runs the script every day at 08:00):
# crontab
0 8 * * * path/to/Rscript path/to/processDir.R
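One thing to watch with cron: jobs usually start in the user's home directory, so the default directory = '.' may not point at your upload folder. A rough sketch of one way around that is to pass the folder as a command-line argument. dirFromArgs is a hypothetical helper name, and the extra crontab argument is an assumption on my part, not something your setup requires.

```r
## Hypothetical helper: take the target directory from the first
## command-line argument, falling back to a default when none is given.
dirFromArgs <- function(args = commandArgs(trailingOnly = TRUE), default = '.') {
  if (length(args) >= 1 && nzchar(args[1])) {
    ## mustWork = FALSE avoids an error if the folder does not exist yet.
    normalizePath(args[1], mustWork = FALSE)
  } else {
    default
  }
}

## At the end of processDir.R (assumes processDir() is defined above):
## processDir(directory = dirFromArgs())
```

The crontab entry would then carry the folder as a trailing argument, e.g. `0 8 * * * path/to/Rscript path/to/processDir.R path/to/folder`.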
Hope this helps.