I have to download multiple xlsx files containing a country's census data from the internet using R. The files are located at this link. The problems are:
- I am unable to write a loop that goes back and forth between the pages to download the files.
- The downloaded file has a cryptic name rather than the district's name. How can I change it to the district's name dynamically?
I have used the following code:
url<-"http://www.censusindia.gov.in/2011census/HLO/HL_PCA/HH_PCA1/HLPCA-28532-2011_H14_census.xlsx"
download.file(url, "HLPCA-28532-2011_H14_census.xlsx", mode="wb")
But this downloads only one file at a time and doesn't change the file name.
Thanks in advance.
Assuming you want all the data without knowing all of the URLs, your question involves web scraping. The httr package provides useful functions for retrieving the HTML code of a given website, which you can then parse for links.
Maybe this bit of code is what you're looking for:
library(httr)
base_url = "http://www.censusindia.gov.in/2011census/HLO/" # main website
r <- GET(paste0(base_url, "HL_PCA/Houselisting-housing-HLPCA.html"))
rc = content(r, "text")
rcl = unlist(strsplit(rc, "<a href =\\\"")) # split the page at each link
rcl = rcl[grepl("Houselisting-housing-.+?\\.html", rcl)] # keep links to house-listing pages
names = gsub("^.+?>(.+?)</.+$", "\\1", rcl) # get region names
names = gsub("^\\s+|\\s+$", "", names) # trim whitespace
links = gsub("^(Houselisting-housing-.+?\\.html).+$", "\\1", rcl) # get links
# iterate over regions
for (i in seq_along(links)) {
  url_hh = paste0(base_url, "HL_PCA/", links[i])
  if (http_error(url_hh)) next # skip pages that return an HTTP error (url_success() is deprecated)
  r <- GET(url_hh)
  rc = content(r, "text")
  rcl = unlist(strsplit(rc, "<a href =\\\"")) # split the page at each link
  rcl = rcl[grepl("\\.xlsx", rcl)] # keep links to xlsx files
  hh_names = gsub("^.+?>(.+?)</.+$", "\\1", rcl) # get district names
  hh_names = gsub("^\\s+|\\s+$", "", hh_names) # trim whitespace
  hh_links = gsub("^(.+?\\.xlsx).+$", "\\1", rcl) # get links
  # iterate over subregions
  for (j in seq_along(hh_links)) {
    url_xlsx = paste0(base_url, "HL_PCA/", hh_links[j])
    if (http_error(url_xlsx)) next # skip files that return an HTTP error
    filename = paste0(names[i], "_", hh_names[j], ".xlsx") # region_district.xlsx
    download.file(url_xlsx, filename, mode = "wb")
  }
}
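One thing the loop above does not guard against is link text containing spaces, slashes, or other characters that are awkward or invalid in file names. Here is a rough sketch of two helpers you could use, one for cleaning the file name and one for pulling links out of the HTML. Note that safe_filename, extract_links and the sample HTML are my own made-up names and data for illustration; they are not part of httr and not taken from the census site.

```r
# Hypothetical helper: turn a region and district label into a safe file name.
safe_filename <- function(region, district, ext = ".xlsx") {
  label <- paste(region, district, sep = "_")
  label <- gsub("[^A-Za-z0-9_-]+", "_", label)  # replace runs of unusual characters with "_"
  label <- gsub("_+", "_", label)               # collapse repeated underscores
  label <- gsub("^_|_$", "", label)             # drop a leading or trailing underscore
  paste0(label, ext)
}

# Hypothetical helper: pull href targets matching a pattern out of raw HTML.
extract_links <- function(html, pattern) {
  hrefs <- regmatches(html, gregexpr("href\\s*=\\s*\"[^\"]+\"", html))[[1]]
  hrefs <- gsub("^href\\s*=\\s*\"|\"$", "", hrefs)  # strip the href=" prefix and closing quote
  hrefs[grepl(pattern, hrefs)]
}

# Made-up HTML fragment for illustration only
sample_html <- paste0(
  "<a href =\"HH_PCA1/file-001.xlsx\"> Anantapur / Rural </a>",
  "<a href=\"other.html\">Other</a>"
)

extract_links(sample_html, "\\.xlsx$")
# [1] "HH_PCA1/file-001.xlsx"
safe_filename("Andhra Pradesh", " Anantapur / Rural ")
# [1] "Andhra_Pradesh_Anantapur_Rural.xlsx"
```

In the loop you could then write filename = safe_filename(names[i], hh_names[j]) instead of the paste0 call. For anything more involved than this, a real HTML parser is probably more robust than regular expressions.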
Source: https://stackoverflow.com/questions/32241713/how-to-download-multiple-files-using-loop-in-r