Question
I am trying to scrape data from a website after entering information into a form using the rvest
package (version 0.3.1) in R (version 3.3.0). Below is my code:
# Load Packages
library(rvest)
# Specify URL
url <- "http://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx"
cocorahs <- html_session(url)
# Grab Initial Form
# Form is filled in stages. Here, only do country and date
form.unfilled <- cocorahs %>% html_node("form") %>% html_form()
form.filled <- form.unfilled %>%
  set_values("frmPrecipReportSearch:ucStateCountyFilter:ddlCountry" = "840",
             "frmPrecipReportSearch_ucDateRangeFilter_dcStartDate" = "6/15/2016",
             "frmPrecipReportSearch_ucDateRangeFilter_dcEndDate" = "6/15/2016")
submit_form(cocorahs, form.filled,
            submit = "frmPrecipReportSearch:btnSearch") %>%
  html_node("form") %>% html_form()
I expected the result to show the updated form. The country did update to the USA, but the date range reverted to the default (the date of access). What am I missing to make the form update the date fields as well?
Answer 1:
I think the error is in this line:
"frmPrecipReportSearch:ucStateCountyFilter:ddlCountry" = "840"
You entered a numeric value where a country name was required.
See the code below:
# Load Packages
library(rvest)
# Specify URL
url <- "http://www.cocorahs.org/ViewData/ListDailyPrecipReports.aspx"
cocorahs <- html_session(url)
# Grab Initial Form
# Form is filled in stages. Here, only do country and date
form.unfilled <- cocorahs %>% html_node("form") %>% html_form()
form.filled <- form.unfilled %>%
  set_values("frmPrecipReportSearch:ucStationTextFieldsFilter:tbTextFieldValue" = "840",
             "frmPrecipReportSearch_ucDateRangeFilter_dcStartDate" = "6/15/2016",
             "frmPrecipReportSearch_ucDateRangeFilter_dcEndDate" = "6/15/2016")
# submit the form and save as a new session
session <- submit_form(cocorahs, form.filled)
# look for a table in the nodes
table <- session %>% html_nodes("table")
# The table you want
table[[7]] %>% html_table()
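If it is unclear which field names the form actually exposes, it may help to list them and read the values back after set_values(). The following is only a minimal sketch on a made-up inline form: the HTML and the field names start_date, end_date and btnSearch are hypothetical and are not taken from the CoCoRaHS page. It checks only what rvest has stored locally, not how the server handles the submission.

```r
library(rvest)

# Hypothetical stand-in for a search form; these field names are invented
# for illustration and do not come from the real page.
html <- '
<form action="/search" method="post">
  <input type="text" name="start_date" value="1/1/2016">
  <input type="text" name="end_date" value="1/1/2016">
  <input type="submit" name="btnSearch" value="Search">
</form>'

form <- read_html(html) %>% html_node("form") %>% html_form()

# Names as rvest sees them; set_values() matches its arguments against these.
field_names <- names(form$fields)
print(field_names)

# Fill in the two date fields.
filled <- set_values(form, start_date = "6/15/2016", end_date = "6/15/2016")

# Read the stored values back to confirm the update took effect locally.
start_value <- filled$fields[["start_date"]]$value
end_value <- filled$fields[["end_date"]]$value
print(c(start_value, end_value))
```

On the real page, the same names(form.unfilled$fields) call should show the exact strings to pass to set_values(). In rvest 1.0 and later, set_values() is deprecated in favour of html_form_set(), so newer installations will print a deprecation warning here.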
Source: https://stackoverflow.com/questions/37868709/using-submit-form-from-rvest-package-returns-a-form-which-is-not-updated