Web-Scraping in R programming (rvest)

痴心易碎 submitted on 2020-02-23 06:28:29

Question


I am trying to scrape all of the review details (Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, Cabin Staff Service, Food & Beverages, Inflight Entertainment, Ground Service, Wifi & Connectivity, Value For Money), including the star ratings, from the airline quality webpage:

https://www.airlinequality.com/airline-reviews/emirates/

My first attempt does not work as expected:

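library(rvest)   # read_html(), html_nodes(), html_text(), %>%
library(purrr)   # map_dfr() (its row-binding also needs dplyr installed)
library(tibble)  # as_tibble()

my_url <- "https://www.airlinequality.com/airline-reviews/emirates/"

# Returns the text of every ".review-value" cell as a one-column tibble
review <- function(url) {
  read_html(url) %>%
    html_nodes(".review-value") %>%
    html_text() %>%
    as_tibble()
}
output <- map_dfr(my_url, review)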

I am only able to scrape the star ratings, but I need all of the details (e.g. Cabin Staff Service = rating 2, Food & Beverages = rating 5):

# Returns the class attribute of every ".star" node as a one-column tibble
star <- function(url) {
  read_html(url) %>%
    html_nodes(".star") %>%
    html_attr("class") %>%
    as.factor() %>%
    as_tibble()
}

output_star <- map_dfr(my_url, star)

The result should be a table:

columns: Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, ... (rating fields hold the star rating)
rows: one per review


Answer 1:


It's a little involved because you need to tabulate the filled/unfilled stars to get the rating for each field. I would use html_table() to help, then re-insert the calculated star values:

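library(tibble)
library(purrr)
library(rvest)

my_url <- "https://www.airlinequality.com/airline-reviews/emirates/"

# Count the filled stars in one rating cell: each child node whose class
# is exactly "star fill" is one filled star. Base R is used here because
# equals() comes from magrittr, which the packages above do not attach.
count_stars_in_cell <- function(cell)
{
  star_classes <- cell %>%
    html_children()    %>%
    html_attr("class")
  # %in% treats a missing class (NA) as FALSE, so the sum is always defined
  sum(star_classes %in% "star fill")
}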

# Star counts for every rating cell in one review table, in page order
get_ratings_each_review <- function(review)
{
  review                             %>%
  html_nodes(".review-rating-stars") %>%
  lapply(count_stars_in_cell)        %>%
  unlist()
}

all_tables <- read_html(my_url)      %>%
              html_nodes("table")

# One two-column table (X1, X2) per review
reviews <- lapply(all_tables, html_table)

ratings <- lapply(all_tables, get_ratings_each_review)

# Star cells come through as the text "12345"; overwrite them with the counts
for (i in seq_along(reviews))
{
  reviews[[i]]$X2[reviews[[i]]$X2 == "12345"] <- ratings[[i]]
}

print(reviews)

This gives you a list with one table for each review. These should be straightforward to combine into a single data frame.
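
As a rough sketch of that last step (not part of the original answer): assuming every element of reviews is a two-column table with the field name in X1 and its value in X2, the hypothetical helper combine_reviews() below turns each table into one row and fills any field a review lacks with NA. The example_reviews data is made up for illustration so that the block runs without touching the website.

```r
# Hypothetical helper: combine a list of two-column review tables
# (X1 = field name, X2 = value) into one data frame with one row per
# review and one column per field.
combine_reviews <- function(reviews) {
  # Every field name seen in any review, in order of first appearance
  fields <- unique(unlist(lapply(reviews, function(tbl) tbl$X1)))

  # For each review, look up each field by name; absent fields become NA
  rows <- lapply(reviews, function(tbl) {
    values <- setNames(as.character(tbl$X2), tbl$X1)
    unname(values[fields])
  })

  # Stack the per-review vectors into a character matrix, then label columns
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- fields
  out
}

# Made-up stand-in for the scraped list, for illustration only
example_reviews <- list(
  data.frame(X1 = c("Seat Type", "Seat Comfort"),
             X2 = c("Economy Class", "4"),
             stringsAsFactors = FALSE),
  data.frame(X1 = c("Seat Type", "Value For Money"),
             X2 = c("Business Class", "5"),
             stringsAsFactors = FALSE)
)

wide <- combine_reviews(example_reviews)
print(wide)
```

With the scraped data you would call combine_reviews(reviews) instead. Every column comes back as character, so rating columns need as.integer() before any arithmetic.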



Source: https://stackoverflow.com/questions/59342737/web-scraping-in-r-programming-rvest
