Question
New member here. Trying to download a large number of files from a website in R (but open to suggestions as well, such as wget.)
From this post, I understand I must create a vector with the desired URLs. My initial problem is to write this vector, since I have 27 states and 34 agencies within each state. I must download one file for each agency for all states. Whereas the state codes are always two characters, the agency codes are 2 to 7 characters long. The URLs would look like this:
http://website.gov/xx_yyyyyyy.zip
where xx is the state code and yyyyyyy is the agency code, between 2 and 7 characters long. I am lost as to how to build such a loop.
I assume I can then download this URL list with a loop like the following:
for(i in 1:length(urls)){
  download.file(urls[i], destinations[i], mode="wb")
}
Does that make sense?
(Disclaimer: an earlier version of this post was uploaded incomplete. My mistake, sorry!)
Answer 1:
This will download them in batches and take advantage of the speedier simultaneous downloading capabilities of download.file() if the libcurl option is available on your installation of R:
library(purrr)
states <- state.abb[1:27]
agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
"BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
"CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
"DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")
walk(states, function(x) {
  # build every agency URL for this state, then fetch them in one batch
  map(x, ~sprintf("http://website.gov/%s_%s.zip", ., agencies)) %>%
    flatten_chr() -> urls
  download.file(urls, basename(urls), method = "libcurl")
})
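If you are not sure whether your R build has libcurl support (which the simultaneous downloads above rely on), a quick check, added here as a suggestion rather than part of the original answer, is:
# Returns TRUE if this R build can use method = "libcurl"
capabilities("libcurl")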
Answer 2:
This should do the job:
agency <- c("FAA", "DEA", "NTSB")
states <- c("AL", "AK", "AZ", "AR")
URLs <-
  paste0("http://website.gov/",
         rep(states, each = length(agency)),
         "_",
         rep(agency, times = length(states)),
         ".zip")
Then loop through the URLs vector to pull the zip files. It will be faster if you use an apply function.
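As a rough sketch (the use of basename() for the destination file names is my own assumption, since the answer does not show the loop itself), the download step could look like:
# simple loop: save each URL to a file named after the last part of the URL
for (i in seq_along(URLs)) {
  download.file(URLs[i], destfile = basename(URLs[i]), mode = "wb")
}
# or an apply-style equivalent
invisible(lapply(URLs, function(u) download.file(u, destfile = basename(u), mode = "wb")))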
Answer 3:
If the agency codes are the same within each state code, you can use the code below to create your vector of URLs to loop through. (You will also need a vector of destinations of the same length; one way to build it is sketched after the code.)
#Getting all combinations
States <- c("AA", "BB")
Agency <- c("ABCDEFG", "HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")
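For the vector of destinations mentioned above, one possible sketch (my assumption, simply reusing the file-name part of each URL as the local file name) is:
# Destination file names matching AllCombinationsVec, one per URL
destinations <- basename(AllCombinationsVec)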
You can then try looping through the files with something like this:
#loop method
for (i in seq_along(AllCombinationsVec)) {
  download.file(AllCombinationsVec[i], destinations[i], mode = "wb")
}
Another way of looping through the items is with the apply family of functions, which apply a function to every element of a list or vector:
#mapply method
mapply(function(x, y) download.file(x, y, mode = "wb"),
       x = AllCombinationsVec, y = destinations)
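Whichever method you use, a single failing URL will stop the whole run with an error. One way to keep going, shown here as a sketch rather than something from the original answer, is to wrap each call in tryCatch():
# Continue past individual failures instead of aborting the whole batch
safe_download <- function(x, y) {
  tryCatch(download.file(x, y, mode = "wb"),
           error = function(e) message("Failed: ", x))
}
mapply(safe_download, x = AllCombinationsVec, y = destinations)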
Source: https://stackoverflow.com/questions/41185735/downloading-multiple-files-in-r-with-variable-length-nested-urls