问题
This is a follow up to a prior thread. The code works fantastic for a single value but I get the following error when trying to pass more than 1 value I get an error based on the length of the function. Error in vapply(elements, encode, character(1)) : values must be length 1, but FUN(X[1]) result is length 3
Here is a sample of the code. In most instances I have been able just to name an object and scrape that way.
library(httr)
library(rvest)
library(dplyr)
b<-c('48127','48180','49504')
POST(
url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
body = list(zipcode = b),
encode = "form"
) -> res
I was wondering if a loop to insert the values into the form would be the right way to go? However my loop writing skills are still in development and I am unsure of where to place it; in addition when i call the loop it doesn't print line by line it just returns null results.
#d isn't listed in the above code as it returns null
d<-for(i in 1:3){nrow(b)}
回答1:
Here is an approach to send multiple POST requests
library(httr)
library(rvest)
b <- c('48127','48180','49504')
For each element in b perform a function that will send the appropriate POST request
res <- lapply(b, function(x){
res <- POST(
url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
body = list(zipcode = x),
encode = "form"
)
res <- read_html(content(res, as="raw"))
})
Now for each element of the list res
you should do the parsing steps explained by hrbrmstr: How can I Scrape a CGI-Bin with rvest and R?
library(tidyverse)
I will use hrbrmstr's code since he is king and it is already clear to you. Only thing we are doing here is performing it on each element of res
list.
res_list = lapply(res, function(x){
rows <- html_nodes(x, "table[width='300'] > tr > td")
ret <- data_frame(
record = !is.na(html_attr(rows, "bgcolor")),
text = html_text(rows, trim=TRUE)
) %>%
mutate(record = cumsum(record)) %>%
filter(text != "") %>%
group_by(record) %>%
summarise(x = paste0(text, collapse="|")) %>%
separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"), sep="\\|", extra="merge")
return(ret)
}
)
or using map
from purrr
res %>%
map(function(x){
rows <- html_nodes(x, "table[width='300'] > tr > td")
data_frame(
record = !is.na(html_attr(rows, "bgcolor")),
text = html_text(rows, trim=TRUE)
) %>%
mutate(record = cumsum(record)) %>%
filter(text != "") %>%
group_by(record) %>%
summarise(x = paste0(text, collapse="|")) %>%
separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"),
sep="\\|", extra="merge") -> ret
return(ret)
}
)
If you would like this in a data frame:
res_df <- data.frame(do.call(rbind, res_list), #rbinds list elements
b = rep(b, times = unlist(lapply(res_list, length)))) #names the rows according to elements in b
回答2:
You can put the values inside the post as below,
b<-c('48127','48180','49504')
for(i in 1:length(b)) {
POST(
url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl",
body = list(zipcode =b[i]),
encode = "form"
) -> res
# YOUR CODES HERE (for getting content of the page etc.)
}
But since for every different zipcode value the "res" value will be different, you need the put the rest of the codes inside the area I commented. Otherwise you get the last value only.
来源:https://stackoverflow.com/questions/46759058/how-to-pass-multiple-values-in-a-rvest-submission-form