Rvest html_nodes span div and Xpath

痞子三分冷 submitted on 2019-12-24 21:24:31

Question


I am trying to scrape a website using XPath. When I open the browser's developer tools, I see lines like this:

<span class="js-bestRate-show" data-crid="11232895" data-id="928723" data-abc="0602524361510" data-referecenceta="44205406" data-catalog="1">

I would like to scrape all values of the data-abc attribute. Let's say each element on the site is a movie, so I would like to get the data-abc value for every movie on the page.

I would like to do so using the rvest package in R. Below are two different attempts that did not work:

website %>% html_nodes("js-bestRate-show") %>% html_text()

website %>%
  html_nodes(xpath = "js-bestRate-show") %>%
  html_nodes(xpath = "//div") %>%
  html_nodes(xpath = "//span") %>%
  html_nodes(xpath = "//data-abc")

Does anyone know how html_nodes and rvest work?


Answer 1:


The node is a span with class js-bestRate-show. Everything else, including data-abc, is an attribute of that node, so you select the node first and then read the attribute from it. You want something like:

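library(rvest)

# Sample markup from the question
h <- '<span class="js-bestRate-show" data-crid="11232895" data-id="928723" data-abc="0602524361510" data-referecenceta="44205406" data-catalog="1"></span>'

h %>%
  read_html() %>%
  html_nodes("span.js-bestRate-show") %>%  # CSS selector: span elements with this class
  html_attr("data-abc")                    # read the attribute value, not the node text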
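Since the question asks about XPath specifically, here is a minimal sketch of the same extraction written both ways, run on a page with several matching elements. The markup below is made up for illustration: only the first data-abc value comes from the question, and the second value, the wrapper div, and the visible text are assumptions. In XPath, an attribute is addressed with @, so //data-abc looks for an element named data-abc rather than the attribute, which is likely why the second attempt in the question returned nothing.

```r
library(rvest)

# Hypothetical page: two "movie" spans plus one unrelated span.
# Only the first data-abc value appears in the question; the rest is invented.
page <- read_html('<div>
  <span class="js-bestRate-show" data-abc="0602524361510">Movie A</span>
  <span class="js-bestRate-show" data-abc="0000000000001">Movie B</span>
  <span class="other">No attribute here</span>
</div>')

# CSS selector, as in the answer: span elements carrying the class
css_values <- page %>%
  html_nodes("span.js-bestRate-show") %>%
  html_attr("data-abc")

# XPath equivalent: match span elements whose class attribute contains the name
xpath_values <- page %>%
  html_nodes(xpath = "//span[contains(@class, 'js-bestRate-show')]") %>%
  html_attr("data-abc")

# XPath alternative: match any span that has a data-abc attribute at all
attr_values <- page %>%
  html_nodes(xpath = "//span[@data-abc]") %>%
  html_attr("data-abc")

print(css_values)
print(xpath_values)
print(attr_values)
```

All three calls should give the same character vector, with one entry per matching span. On a real page, replace the literal string with the URL or the already parsed website object from the question.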


Source: https://stackoverflow.com/questions/48633708/rvest-html-nodes-span-div-and-xpath
