Question
I am trying to pull individual ratings from Glassdoor (the API only provides summary ratings) using the rvest package in R and SelectorGadget to identify my CSS selectors.
The problem is that Glassdoor uses images to convey the ratings, while the numeric rating is contained in the title attribute. Using SelectorGadget, I can scrape the "Comp & Benefits" text from the code snippet below (using "#EmployerReviews .undecorated li"), but I can't get to the "2.0" in the span's title attribute, which is what I want.
<div id='EmployerReviews'> .... <ul class='undecorated'> <li> <div class='minor'>Comp & Benefits</div> <span class='notranslate notranslate_title gdBars gdRatings med ' title="2.0">
Has anyone had success scraping image titles in the past, or does anyone know of another way to get these individual ratings?
Answer 1:
You will need to select the span and use html_attr() to extract the value of its title attribute:
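library(rvest)
# html() was deprecated in rvest 0.3.0 in favour of read_html()
page <- read_html("...")
# select the rating span inside each list item, then read its title attribute
rating <- page %>%
  html_nodes("#EmployerReviews .undecorated li span.gdRatings") %>%
  html_attr("title")
rating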
# [1] "2.0"
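As a minimal, self-contained sketch of the same idea, the fragment from the question can be parsed from a string and each category label paired with its rating. The closing tags in `snippet` are an assumption (the question's snippet leaves them open), and the names `snippet`, `items`, and `ratings` are made up for illustration; the real Glassdoor page may be structured differently.

```r
library(rvest)

# HTML fragment modelled on the snippet in the question. The closing tags are
# assumed here so that the fragment is well-formed.
snippet <- "
<div id='EmployerReviews'>
  <ul class='undecorated'>
    <li>
      <div class='minor'>Comp &amp; Benefits</div>
      <span class='notranslate notranslate_title gdBars gdRatings med ' title='2.0'></span>
    </li>
  </ul>
</div>"

page <- read_html(snippet)

# One node per rating row
items <- html_nodes(page, "#EmployerReviews .undecorated li")

# html_node() returns one match per item, so labels and ratings stay aligned
ratings <- data.frame(
  category = html_text(html_node(items, ".minor")),
  rating = as.numeric(html_attr(html_node(items, "span.gdRatings"), "title")),
  stringsAsFactors = FALSE
)
ratings
```

html_attr() returns the title as a character string, so as.numeric() is applied here to get a number that can be averaged or compared across reviews.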
Source: https://stackoverflow.com/questions/28350833/scraping-image-titles-with-rvest