Scraping image titles with rvest

Question

I am trying to pull individual ratings from Glassdoor (the API only provides summary ratings) using the rvest package in R and SelectorGadget to identify my CSS selectors.

The problem is that Glassdoor uses images to convey the ratings, while the numeric rating is stored in the image's title attribute. Using SelectorGadget, I can scrape the "Comp & Benefits" text from the code snippet below (using "#EmployerReviews .undecorated li"), but I can't get to the "2.0" in the span's title="..." attribute, which is what I want.

<div id='EmployerReviews'>
  ....
  <ul class='undecorated'>
    <li>
      <div class='minor'>Comp & Benefits</div>
      <span class='notranslate notranslate_title gdBars gdRatings med ' title="2.0">

Has anyone had success scraping image titles in the past, or does anyone know of another way to get these individual ratings?

Answer 1:

You need to select the span itself and use html_attr() to extract the value of its title attribute:

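library(rvest)

# read_html() replaces html(), which is deprecated in newer rvest versions
page <- read_html("...")
rating <- page %>%
  html_nodes("#EmployerReviews .undecorated li span.gdRatings") %>%  # the rating spans
  html_attr("title")                                                  # value of title="..."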

rating
# [1] "2.0"
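
To keep each rating next to its label, one option is to select the li elements first and then pull the label and the title from each one. The following is only a sketch: it parses an inline copy of the snippet from the question (with the tags closed so that it stands on its own), and the names snippet, items and ratings are made up for illustration. The real page may be structured differently.

```r
library(rvest)

# Inline copy of the markup from the question, closed off so it parses
# on its own (assumption: the live page nests these elements the same way)
snippet <- "<div id='EmployerReviews'>
  <ul class='undecorated'>
    <li>
      <div class='minor'>Comp &amp; Benefits</div>
      <span class='notranslate notranslate_title gdBars gdRatings med ' title='2.0'></span>
    </li>
  </ul>
</div>"

page <- read_html(snippet)

# One node per rating row
items <- html_nodes(page, "#EmployerReviews .undecorated li")

# html_node() returns exactly one match per li, so labels and ratings stay aligned
ratings <- data.frame(
  category = html_text(html_node(items, ".minor")),
  rating = as.numeric(html_attr(html_node(items, "span.gdRatings"), "title")),
  stringsAsFactors = FALSE
)

print(ratings)
```

Using html_node() on the set of li elements, rather than html_nodes() on the whole page, means a row without a rating span should yield NA instead of shifting the remaining values up.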

Source: https://stackoverflow.com/questions/28350833/scraping-image-titles-with-rvest

Tags

css-selectors

rvest
