How to skip missing files when downloading multiple files from the web?

Question

I have a question about downloading files. I know how to download files using the download.file function. I need to download multiple files from a particular site, each corresponding to a different date. I have a series of dates from which I can build the URL for each file. I know for a fact that the files for some dates are missing from the website, and my code stops when it reaches one of them. I then have to manually reset the date index (increment it by 1) and re-run the code. Since I have to download more than 1500 files, I was wondering whether I can somehow capture the 'absence of the file' so that, instead of stopping, the code continues with the next date in the array.

Below is the dput of part of the date array:

dput(head(fnames,10))
c("20060102.trd", "20060103.trd", "20060104.trd", "20060105.trd", 
"20060106.trd", "20060109.trd", "20060110.trd", "20060112.trd", 
"20060113.trd", "20060116.trd")

The full array has 1723 dates. Below is the code I am using:

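for (i in seq_along(fnames)) {
  # "20060102.trd" -> "02012006" (DDMMYYYY, the format used in the URL)
  file <- paste(substr(fnames[i], 7, 8), substr(fnames[i], 5, 6), substr(fnames[i], 1, 4), sep = "")
  URL <- paste("http://xxxxx_", file, ".zip", sep = "")
  # mode = "wb" avoids corrupting the binary zip file on Windows
  download.file(URL, paste(file, "zip", sep = "."), mode = "wb")
  unzip(paste(file, "zip", sep = "."))
}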
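The substr() calls that turn "20060102.trd" into "02012006" can also be written as a single vectorised step with as.Date() and format(). This is only a sketch of an alternative, not needed for the skipping problem itself; the helper name trd_to_url_date() is hypothetical, and it assumes every name starts with a YYYYMMDD date (a name that does not parse as a date comes back as NA instead of a malformed string).

```r
# Hypothetical helper: turn "YYYYMMDD.trd" file names into the DDMMYYYY
# strings that the loop above builds with substr().
# Vectorised, so it converts the whole vector in one call.
trd_to_url_date <- function(fnames) {
  dates <- as.Date(substr(fnames, 1, 8), format = "%Y%m%d")
  format(dates, "%d%m%Y")   # names that do not parse as dates give NA
}

trd_to_url_date(c("20060102.trd", "20060103.trd", "20060116.trd"))
# [1] "02012006" "03012006" "16012006"
```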

The program works fine until it reaches a date for which the file is missing, and then it stops. Is there a way to capture this, print the missing file name (the variable 'file'), and move on to the next date in the array?

Please help.

I apologize for not sharing the exact URL. If that makes the issue difficult to reproduce, please let me know.

* Update: trying to incorporate @Paul's suggestion of using tryCatch.

I tested it on a smaller dataset.

dput(testnames)
c("20120214.trd", "20120215.trd", "20120216.trd", "20120217.trd", "20120221.trd")

I know that the file corresponding to the date '20120216' is missing from the website. I altered my code to use the tryCatch function:

tryCatch({
  for (i in 1:length(testnames)) {
    file <- paste(substr(testnames[i], 7, 8), substr(testnames[i], 5, 6), substr(testnames[i], 1, 4), sep = "")
    URL <- paste("http://xxxx_", file, ".zip", sep = "")
    download.file(URL, paste(file, "zip", sep = "."))
    unzip(paste(file, "zip", sep = "."))
  }
},
error = function(e) {
  cat(file, '\n')
  i = i + 1
},
warning = function(w) {
  message('cannot unzip')
  i = i + 1
})

It runs fine for the first two dates and, as expected, throws an error for the third one. I am facing two issues:

1. When I exclude the warning block, the code prints the missing file name (file), as coded in the error block. But when I include the warning block, it only issues the warning and somehow doesn't execute the error block. Why is that?
2. In either case, the code stops after "20120216.trd" instead of proceeding to the next file, which is what I want. Is incrementing the variable i not sufficient for that purpose?
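Both issues can be reproduced without a network connection. The sketch below assumes, as the symptoms suggest, that download.file() signals a warning for the HTTP 404 before it raises its error; fake_download() is a hypothetical stand-in, not part of any package. It shows that tryCatch() exits at the first condition it has a handler for, and that a tryCatch() wrapped around the whole loop ends the whole loop rather than a single iteration.

```r
# Hypothetical stand-in for download.file() on a missing URL:
# it signals a warning first and then an error (assumed 404 behaviour).
fake_download <- function() {
  warning("HTTP status was '404 Not Found'")
  stop("cannot open URL")
}

# Issue 1: the warning is signalled first. With a warning handler present,
# tryCatch() runs that handler and unwinds, so stop() is never reached.
which_handler <- tryCatch(fake_download(),
                          error   = function(e) "error handler",
                          warning = function(w) "warning handler")
print(which_handler)  # "warning handler"

# Without a warning handler, the warning is only reported, execution
# continues to stop(), and the error handler runs.
only_error <- tryCatch(fake_download(),
                       error = function(e) "error handler")
print(only_error)     # "error handler"

# Issue 2: a tryCatch() around the whole loop abandons the loop on the
# first error. Assigning to i inside the handler only creates a local
# copy in the handler's own environment.
visited <- integer(0)
tryCatch(
  for (i in 1:5) {
    if (i == 3) stop("file missing")
    visited <- c(visited, i)
  },
  error = function(e) {
    i <- i + 1   # no effect: the loop has already been exited
  }
)
print(visited)        # 1 2
```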

Please advise.

Answer 1:

You can do this using tryCatch. This function tries the operation you give it and provides a way to deal with any error it raises. In your case, an error could simply lead to skipping the file and ignoring the error. For example:

skip_with_message = simpleError('Did not work out')
tryCatch(print(bla), error = function(e) skip_with_message)
# <simpleError: Did not work out>

Here the error is raised because the object bla does not exist.
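Applied to the loop in the question, the key change is to put tryCatch() inside the loop so that each date gets its own handler: a failure then ends only the current iteration, and the loop moves on to the next date. Catching the warning as well as the error matters if, as the update suggests, download.file() signals a warning for the 404 before raising the error. This is a minimal sketch: download_all() and fetch_and_unzip() are hypothetical helper names, and "http://xxxx_" is the question's elided URL prefix, left as a placeholder. The per-file action is passed in as process so the skipping logic can be tried without a network connection.

```r
# Hypothetical per-file action: download one zip and extract it.
# "http://xxxx_" is a placeholder for the real (elided) URL prefix.
fetch_and_unzip <- function(file, base_url = "http://xxxx_") {
  zipfile <- paste0(file, ".zip")
  download.file(paste0(base_url, file, ".zip"), zipfile, mode = "wb")
  unzip(zipfile)
}

# Try every date; skip (and record) any that fail instead of stopping.
download_all <- function(fnames, process = fetch_and_unzip) {
  missing <- character(0)
  for (fname in fnames) {
    # "20120216.trd" -> "16022012" (DDMMYYYY), as in the question's code
    file <- paste0(substr(fname, 7, 8), substr(fname, 5, 6), substr(fname, 1, 4))
    # tryCatch() wraps a single iteration, so a failure ends only this one
    ok <- tryCatch({
      process(file)
      TRUE
    },
    error   = function(e) FALSE,
    warning = function(w) FALSE)
    if (!ok) {
      cat("Skipping missing file:", file, "\n")
      missing <- c(missing, file)
    }
  }
  missing
}
```

With the real download step, download_all(fnames) would try all dates and return the names that could not be fetched, so they can be checked or retried afterwards.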

Source: https://stackoverflow.com/questions/21137071/how-to-skip-missing-files-when-downloading-multiples-files-from-the-web

Tags

try-catch