How to pass multiple values in a rvest submission form

Submitted by ♀尐吖头ヾ on 2019-12-12 05:43:26

Question


This is a follow-up to a prior thread. The code works fantastically for a single value, but when I try to pass more than one value I get the following error, which seems to depend on the length of the input:

Error in vapply(elements, encode, character(1)) : values must be length 1, but FUN(X[[1]]) result is length 3

Here is a sample of the code. In most instances I have been able just to name an object and scrape that way.

library(httr)
library(rvest)
library(dplyr)

b<-c('48127','48180','49504')

POST(
 url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl", 
 body = list(zipcode = b), 
 encode = "form"
) -> res

I was wondering whether a loop that inserts the values into the form would be the right way to go. However, my loop-writing skills are still in development and I am unsure where to place it. In addition, when I call the loop it doesn't print line by line; it just returns NULL.

#d isn't listed in the above code as it returns null    
d<-for(i in 1:3){nrow(b)}
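
A note on why d comes back NULL: in R a for loop itself always evaluates to NULL, and nrow() of a plain character vector is NULL as well (length() is what counts its elements). The following is a minimal sketch of collecting one value per iteration instead; the paste0() call is only a placeholder for the real per-zip-code work.

```r
b <- c('48127', '48180', '49504')

# a for loop's own value is NULL, so assigning it captures nothing
d <- for (i in 1:3) { nrow(b) }
is.null(d)        # TRUE

# nrow() is meant for matrices and data frames; a vector has no dim attribute
is.null(nrow(b))  # TRUE
length(b)         # 3

# collect results by assigning into a pre-allocated vector inside the loop
out <- character(length(b))
for (i in seq_along(b)) {
  out[i] <- paste0("zip-", b[i])  # placeholder for the real work
}
out
```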

Answer 1:


Here is an approach that sends one POST request per value:

library(httr)
library(rvest)
b <- c('48127','48180','49504')

For each element in b, apply a function that sends the appropriate POST request and parses the response:

res <- lapply(b, function(x){
  # send one form-encoded POST per zip code
  resp <- POST(
    url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl", 
    body = list(zipcode = x), 
    encode = "form"
  ) 
  # parse the raw response body into an HTML document
  read_html(content(resp, as="raw")) 
})
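
As an aside on why a single call with the whole vector fails: the error message shows a vapply() call that expects exactly one string per form field. The sketch below reproduces that kind of check offline. The encode_field function is a hypothetical stand-in, not httr's internal encoder, so treat this as an illustration of the length check rather than of httr itself.

```r
b <- c('48127', '48180', '49504')

# hypothetical stand-in for a per-field encoder (not httr's internal function)
encode_field <- function(x) as.character(x)

# one field holding three values: vapply() demands exactly one string per field
msg <- tryCatch(
  vapply(list(zipcode = b), encode_field, character(1)),
  error = function(e) conditionMessage(e)
)
msg

# one value per call passes the same check
ok <- lapply(b, function(x) {
  vapply(list(zipcode = x), encode_field, character(1))
})
length(ok)
```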

Now for each element of the list res you should do the parsing steps explained by hrbrmstr: How can I Scrape a CGI-Bin with rvest and R?

library(tidyverse)

I will use hrbrmstr's code since he is king and it is already clear to you. The only thing we are doing here is applying it to each element of the res list.

res_list = lapply(res, function(x){
  # table cells; a cell with a bgcolor attribute starts a new record
  rows <- html_nodes(x, "table[width='300'] > tr > td")
  ret <- tibble(  # tibble() replaces the deprecated data_frame()
    record = !is.na(html_attr(rows, "bgcolor")),
    text = html_text(rows, trim=TRUE)
  ) %>% 
    mutate(record = cumsum(record)) %>% 
    filter(text != "") %>% 
    group_by(record) %>% 
    summarise(x = paste0(text, collapse="|")) %>% 
    separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"), sep="\\|", extra="merge")
  return(ret)
})

Or, using map from purrr:

res %>%
  map(function(x){
    # table cells; a cell with a bgcolor attribute starts a new record
    rows <- html_nodes(x, "table[width='300'] > tr > td")
    tibble(  # tibble() replaces the deprecated data_frame()
      record = !is.na(html_attr(rows, "bgcolor")),
      text = html_text(rows, trim=TRUE)
    ) %>% 
      mutate(record = cumsum(record)) %>% 
      filter(text != "") %>% 
      group_by(record) %>% 
      summarise(x = paste0(text, collapse="|")) %>% 
      separate(x, c("store", "address1", "city_state_zip", "phone_and_or_distance"),
               sep="\\|", extra="merge") -> ret
    return(ret)
  })

If you would like this in a data frame:

res_df <- data.frame(do.call(rbind, res_list), # rbinds the list elements 
                     b = rep(b, times = sapply(res_list, nrow))) # adds a column with the matching element of b for each row (nrow, not length, counts rows)
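
To make the row bookkeeping concrete, here is a small offline sketch with made-up data frames standing in for the parsed results (the store names and row counts are invented for illustration). It relies on the fact that nrow() counts the rows of a data frame, whereas length() counts its columns.

```r
b <- c('48127', '48180', '49504')

# toy stand-ins for the parsed results: one data frame per zip code,
# with different numbers of rows (the store names are made up)
res_list <- list(
  data.frame(store = c("A", "B")),
  data.frame(store = "C"),
  data.frame(store = c("D", "E", "F"))
)

# number of rows contributed by each zip code
rows_per_zip <- vapply(res_list, nrow, integer(1))

# stack the pieces and repeat each zip code once per row it produced
res_df <- data.frame(
  do.call(rbind, res_list),
  zipcode = rep(b, times = rows_per_zip)
)
res_df
```

Each row of res_df then carries the zip code that produced it, which is useful once the per-request results are stacked together.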



Answer 2:


You can loop over the values and pass each one to POST, as below:

b <- c('48127','48180','49504')

for(i in seq_along(b)) {

  POST(
    url = "http://www.nearestoutlet.com/cgi-bin/smi/findsmi.pl", 
    body = list(zipcode = b[i]), 
    encode = "form"
  ) -> res

  # YOUR CODE HERE (for getting the content of the page etc.)

}

But since the value of res will be different for every zip code, you need to put the rest of the code inside the area I commented. Otherwise you only get the last value.
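
One way to keep every result rather than only the last is to store each iteration's output in its own slot of a list. This is a sketch under assumptions: fetch_zip is a hypothetical stand-in for the POST and parsing steps so that the example runs without network access.

```r
b <- c('48127', '48180', '49504')

# hypothetical stand-in for the POST + parsing step, so the sketch runs offline
fetch_zip <- function(zip) paste("result for", zip)

# pre-allocate a list with one named slot per zip code
results <- vector("list", length(b))
names(results) <- b

for (i in seq_along(b)) {
  results[[i]] <- fetch_zip(b[i])  # each iteration fills its own slot
}

# look up a single zip code's result by name
results[["49504"]]
```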



Source: https://stackoverflow.com/questions/46759058/how-to-pass-multiple-values-in-a-rvest-submission-form
