Using submit_form() from rvest package returns a form which is not updated

Submitted by 假装没事ソ on 2019-12-12 00:09:27

Question


I am trying to scrape data from a website after entering information into a form using the rvest package (version 0.3.1) in R (version 3.3.0). Below is my code:

# Load Packages
library(rvest)

# Specify URL
url <- "http://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx"
cocorahs <- html_session(url)

# Grab Initial Form
#  Form is filled in stages. Here, only do country and date
form.unfilled <- cocorahs %>% html_node("form") %>% html_form()
form.filled <- form.unfilled %>% 
  set_values("frmPrecipReportSearch:ucStateCountyFilter:ddlCountry" = "840",
             "frmPrecipReportSearch_ucDateRangeFilter_dcStartDate" = "6/15/2016",
             "frmPrecipReportSearch_ucDateRangeFilter_dcEndDate" = "6/15/2016")

submit_form(cocorahs, form.filled,
            submit="frmPrecipReportSearch:btnSearch") %>%
  html_node("form") %>% html_form()

I expected the result to show the updated form. The country field does update to the USA, but the date range reverts to the default (the date of access). What am I missing to make the date fields update as well?


Answer 1:


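I think the error is in this line:

"frmPrecipReportSearch:ucStateCountyFilter:ddlCountry" = "840"

You entered a numeric value where a country name was required.

See the code below, which passes "840" to the frmPrecipReportSearch:ucStationTextFieldsFilter:tbTextFieldValue field instead and saves the submitted session so the results table can be read from it: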

# Load Packages
library(rvest)

# Specify URL
url <- "http://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx"
cocorahs <- html_session(url)

# Grab Initial Form
#  Form is filled in stages. Here, only do country and date
form.unfilled <- cocorahs %>% html_node("form") %>% html_form()
form.filled <- form.unfilled %>%
  set_values("frmPrecipReportSearch:ucStationTextFieldsFilter:tbTextFieldValue" = "840",
             "frmPrecipReportSearch_ucDateRangeFilter_dcStartDate" = "6/15/2016",
             "frmPrecipReportSearch_ucDateRangeFilter_dcEndDate" = "6/15/2016")

# submit the form and save as a new session
session <- submit_form(cocorahs, form.filled) 

# collect every table node on the returned page
#  (named `tables` to avoid masking base R's table() function)
tables <- session %>% html_nodes("table")

# The table you want is the seventh one
tables[[7]] %>% html_table()
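
Before submitting, it can help to check which field names and values the parsed form actually holds, since set_values() only changes fields it can match by name. The following is a minimal offline sketch of that check, not a fix verified against the CoCoRaHS page: the HTML snippet and the "demo:" field names are made up for illustration. It uses the same rvest 0.3.x functions as the code above; newer rvest releases deprecate set_values() in favour of html_form_set().

```r
library(rvest)

# Hypothetical stand-in for a real page: a tiny form with made-up field names.
page <- read_html('
  <form id="demo" method="post" action="/search">
    <input type="text" name="demo:startDate" value="1/1/2016">
    <input type="text" name="demo:endDate" value="1/1/2016">
    <input type="submit" name="demo:btnSearch" value="Search">
  </form>')

form <- page %>% html_node("form") %>% html_form()

# List the field names the parser found; these are the names
# that set_values() expects.
field_names <- names(form$fields)
print(field_names)

# Fill one field, then read its value back from the form object
# to confirm the change was stored before anything is submitted.
filled <- set_values(form, "demo:startDate" = "6/15/2016")
start_value <- filled$fields[["demo:startDate"]]$value
end_value <- filled$fields[["demo:endDate"]]$value
print(start_value)
print(end_value)
```

Running the same inspection on form.unfilled and form.filled from the code above would show whether the date fields carry the values you set. If they do and the returned page still shows the defaults, the reset is more likely happening on the server side than in set_values().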


Source: https://stackoverflow.com/questions/37868709/using-submit-form-from-rvest-package-returns-a-form-which-is-not-updated
