Harvesting data via a drop-down list in R

只愿长相守 submitted on 2019-12-10 11:57:27

Question


I am trying to harvest data from this website

http://www.lkcr.cz/seznam-lekaru-426.html (it's in Czech)

I need to go through every possible combination of "Okres" (region) and "Obor" (specialization).

I tried rvest, but it does not seem to detect any drop-down list: html_form returns a list of length 0.

Since I am still a newbie in R: how can I "ask" the webpage to show me the results for each new combination?

Thank you

JH


Answer 1:


I'd use the following:

```r
library(rvest)
library(dplyr)
library(tidyr)

pg <- read_html("http://www.lkcr.cz/seznam-lekaru-426.html")

# <option> children of the "Obor" (specialization) drop-down, targeted by name
obor <- html_nodes(pg, "select[name='filterObor'] > option")
obor_df <- tibble(
  value  = html_attr(obor, "value"),
  option = html_text(obor)
)

glimpse(obor_df)
## Observations: 115
## Variables: 2
## $ value  <chr> "", "16", "107", "17", "1", "19", "20", "21", "22", "29...
## $ option <chr> "", "alergologie a klinická imunologie", "algeziologie"...

# Same for the "Okres" (region) drop-down
okres <- html_nodes(pg, "select[name='filterOkresId'] > option")
okres_df <- tibble(
  value  = html_attr(okres, "value"),
  option = html_text(okres)
)

glimpse(okres_df)
## Observations: 78
## Variables: 2
## $ value  <chr> "", "3201", "3202", "3701", "3702", "3703", "3801", "37...
## $ option <chr> "", "Benešov", "Beroun", "Blansko", "Brno-město", "Brno...
```

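Targeting each select element by its name attribute, rather than by its position on the page, keeps this working in case the field order ever changes (plus it's good to get familiar with targeting nodes with CSS selectors and XPath selectors).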
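For comparison, the same option nodes can be targeted with an XPath selector instead of CSS. This is a minimal sketch run against a small literal HTML snippet so it works offline; the snippet is a hypothetical stand-in for the real page, using a few "Okres" values and labels copied from the glimpse() output above:

```r
library(rvest)

# Hypothetical stand-in for the page's "Okres" drop-down (not the live page)
pg <- read_html('<html><body>
  <select name="filterOkresId">
    <option value=""></option>
    <option value="3202">Beroun</option>
    <option value="3701">Blansko</option>
  </select>
</body></html>')

# CSS: every <option> that is a direct child of the select named filterOkresId
css_opts <- html_nodes(pg, css = "select[name='filterOkresId'] > option")

# XPath: the same selection
xpath_opts <- html_nodes(pg, xpath = "//select[@name='filterOkresId']/option")

# Both selectors return the same option values in document order
identical(html_attr(css_opts, "value"), html_attr(xpath_opts, "value"))  # TRUE
```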

You still need to iterate over each pair (you can do that with nested purrr::map calls; I personally probably wouldn't use expand.grid or tidyr::complete for this).
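A minimal sketch of the nested purrr::map approach, assuming a hypothetical fetch_pair() helper that stands in for whatever actually retrieves one result page. Here it only records the pair it was called with, so the example runs offline. The value vectors are a few codes copied from the glimpse() output above, not the full lists:

```r
library(purrr)
library(dplyr)

# A few option values from the glimpse() output; in practice use
# obor_df$value and okres_df$value with the blank "" option dropped
obor_values  <- c("16", "107")
okres_values <- c("3201", "3202", "3701")

# Hypothetical placeholder: a real version would request and parse the
# results page for this (obor, okres) pair
fetch_pair <- function(obor, okres) {
  tibble(obor = obor, okres = okres)
}

# Outer map over specializations, inner map over regions;
# bind_rows() flattens the nested list of tibbles into one data frame
results <- map(obor_values, function(o) {
  map(okres_values, function(k) fetch_pair(o, k)) %>% bind_rows()
}) %>% bind_rows()

nrow(results)  # 6, one row per (obor, okres) pair
```

With the full option lists this is on the order of 114 x 77 combinations, so once fetch_pair() makes real requests, pausing between them (for example with Sys.sleep()) is the polite choice.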

BUT…

You're going to have issues submitting the form with rvest, since the site uses JavaScript to do some data processing before submitting.

You should use Chrome and open up Developer Tools to see which fields actually get submitted, and probably switch to using httr::POST. If you have trouble with that, you should open a new question on Stack Overflow.
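A rough sketch of what an httr::POST version might look like. The helper name fetch_pair_post() is hypothetical, and so is the request body: the field names filterObor and filterOkresId are only borrowed from the select names on the page, and whether the server expects exactly these fields (or extra hidden or JavaScript-generated ones, or a different URL) has to be confirmed in Developer Tools first. The function is defined but not called here:

```r
library(httr)
library(rvest)

# Hypothetical helper; the form fields below are assumptions, not
# verified against what the site's JavaScript actually submits
fetch_pair_post <- function(obor, okres,
                            url = "http://www.lkcr.cz/seznam-lekaru-426.html") {
  resp <- POST(
    url,
    body = list(filterObor = obor, filterOkresId = okres),
    encode = "form"   # application/x-www-form-urlencoded, like a plain HTML form
  )
  stop_for_status(resp)  # fail loudly on HTTP errors
  # Parse the returned HTML so it can be queried with html_nodes() etc.
  read_html(content(resp, as = "text", encoding = "UTF-8"))
}
```

Usage would then be something like doc <- fetch_pair_post("16", "3201"), after which the results table can be extracted from doc with the same CSS/XPath techniques as above.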



Source: https://stackoverflow.com/questions/40259717/harvesting-data-via-drop-down-list-in-r
