Question
I am trying to retrieve files from 3 network drives using list.files and it takes forever. When I use find in the shell, it returns all results in less than 15 seconds.
system.time(
  jnk <- list.files(c("/Volumes/massspec", "/Volumes/massspec2", "/Volumes/massspec3"),
                    pattern='_MA_.*_HeLa_',
                    recursive=TRUE))
#    user  system elapsed
#   1.567   6.381 309.500
Here is the equivalent shell command.
time find /Volumes/masssp* -name '*_MA_*_HeLa_*'
# real 0m13.776s
# user 0m0.361s
# sys 0m0.620s
I need a solution that works on both Windows and Unix systems. Does anyone have a good idea? The network drives hold about 120,000 files altogether, but roughly 16 TB of data, so not many files, but very large ones.
Answer 1:
Based on the comment, I wrote a little R function which should work on Windows and Unix...
quickFileSearch <- function(path, pattern) {
  switch(.Platform$OS.type,
         unix = {
           # On Unix-like systems, delegate the search to the much faster 'find'
           paths <- paste(path, collapse = ' ')
           command <- paste('find', paths, '-name', pattern)
           system(command, intern = TRUE)
         },
         windows = {
           # On Windows, use 'dir /b /s /a-d' (bare recursive listing, files only)
           paths <- paste(file.path(path, pattern, fsep = '\\'),
                          collapse = ' ')
           command <- paste('dir', paths, '/b /s /a-d')
           shell(command, intern = TRUE)
         })
}
The whole thing is not thoroughly tested yet, but it works for my purpose.
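For the scenario in the question, a call might look like the following. This is a hypothetical usage sketch, not part of the original answer; note that on Unix the pattern goes through an intermediate shell, so quoting it can be safer.

# Hypothetical usage example with the paths and pattern from the question (untested sketch):
hits <- quickFileSearch(c("/Volumes/massspec", "/Volumes/massspec2", "/Volumes/massspec3"),
                        "*_MA_*_HeLa_*")
head(hits)
# On Unix, passing the pattern as "'*_MA_*_HeLa_*'" keeps the calling shell from
# expanding the wildcards before 'find' sees them.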
Source: https://stackoverflow.com/questions/39743174/performance-problems-with-list-files