Question
I am trying to scrape a website using XPath. In the browser's developer tools, I see lines like this:
<span class="js-bestRate-show" data-crid="11232895" data-id="928723" data-abc="0602524361510" data-referecenceta="44205406" data-catalog="1">
I would like to scrape all values of data-abc. Let's say each element on the site is a movie, so I would like to get the data-abc value for each movie on the page.
I would like to do this with the rvest package in R. Below are two different attempts that did not work:
website %>% html_nodes("js-bestRate-show") %>% html_text()
website %>%
html_nodes(xpath = "js-bestRate-show") %>%
html_nodes(xpath = "//div") %>%
html_nodes(xpath = "//span") %>%
html_nodes(xpath = "//data-abc")
Does anyone know how html_nodes and rvest work?
Answer 1:
The node is span with class js-bestRate-show. Everything else is an attribute. So you want something like:
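library(rvest)
# Sample markup from the question, closed with </span> so it is a complete element
h <- '<span class="js-bestRate-show" data-crid="11232895" data-id="928723" data-abc="0602524361510" data-referecenceta="44205406" data-catalog="1"></span>'
h %>%
  read_html() %>%                          # parse the string into an HTML document
  html_nodes("span.js-bestRate-show") %>%  # select <span> elements with that class
  html_attr("data-abc")                    # extract the data-abc attribute value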
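Since the question asks about XPath and about several elements per page, here is a minimal sketch of the same extraction done both ways. The page fragment is hypothetical: the second data-abc value and the span with class "other" are made up for illustration. The XPath form assumes the class attribute contains exactly js-bestRate-show and nothing else.

```r
library(rvest)

# Hypothetical fragment standing in for the real page (an assumption, not the actual site).
page <- read_html('
  <div>
    <span class="js-bestRate-show" data-abc="0602524361510" data-catalog="1"></span>
    <span class="js-bestRate-show" data-abc="0000000000001" data-catalog="1"></span>
    <span class="other" data-abc="ignored"></span>
  </div>')

# CSS selector: the dot marks a class. Without it, "js-bestRate-show" is read
# as a tag name, which is why the first attempt in the question finds nothing.
css_values <- page %>%
  html_nodes("span.js-bestRate-show") %>%
  html_attr("data-abc")

# XPath: attributes are addressed with @ inside a predicate. "//data-abc"
# would look for a <data-abc> element, not an attribute.
xpath_values <- page %>%
  html_nodes(xpath = "//span[@class='js-bestRate-show']") %>%
  html_attr("data-abc")

print(css_values)
print(xpath_values)
```

html_attr() returns a character vector with one entry per matched node, so leading zeros in values such as 0602524361510 are kept. html_text() would not help here, because the value sits in an attribute rather than in the text of the span.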
Source: https://stackoverflow.com/questions/48633708/rvest-html-nodes-span-div-and-xpath