Rvest returning null values

Posted by 人走茶凉 on 2021-01-07 02:59:50

Question


I am trying to piece together how rvest is used, and I thought I had it, but all the results I receive are null.

I am using @RonakShah's example (Loop with rvest) as my base and thought I'd try to expand it to collect the name, telephone number and opening hours for each day instead:

site = "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

get_phone <- function(url) {
  webpage <- site %>% read_html()
  name <- webpage %>% html_nodes('p.name') %>% html_text() %>% trimws()
  telephone <- webpage %>% html_nodes('p.telephone') %>% html_text() %>% trimws()
  monday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  tuesday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  wednesday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  thursday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  friday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  saturday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  sunday <- webpage %>% html_nodes('p.day a') %>% html_text() %>% trimws()
  data.frame(telephone, monday, tuesday, wednesday, thursday, friday, saturday, sunday)
}

get_phone(site)

But I can't get any of these to work, even individually. I can't even get it to read in the day or an incorrect phone number. Could someone help point out why?


Answer 1:


Right-click on the webpage, select Inspect and check the HTML of the page. Find the element that you want to extract and use a CSS selector that actually matches it; a selector that matches nothing returns an empty result.

library(rvest)
site <- "https://concreteplayground.com/auckland/bars/archie-brothers-cirque-electriq"

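get_phone <- function(url) {
  # Read the page at the URL passed in as the argument
  webpage <- url %>% read_html()
  # The phone number sits in a span marked with itemprop="telephone"
  phone <- webpage %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
  # The opening hours are stored as JSON in the data-times attribute
  opening_hours <- webpage %>%
    html_nodes('div.open-hours') %>%
    html_attr('data-times') %>%
    jsonlite::fromJSON()
  list(phone_number = phone, opening_hours = opening_hours)
}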

get_phone(site)


#$phone_number
#[1] "+64 800 888 386"

#$opening_hours
#  weekday time_from time_to
#1       1     12:00   00:00
#2       2     12:00   00:00
#3       3     12:00   00:00
#4       4     12:00   00:00
#5       5     12:00   00:00
#6       6     10:00   00:00
#7       0     10:00   00:00
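
To see why a selector such as p.telephone presumably came back empty, it can help to try selectors against a small inline page. The sketch below is only an illustration: the HTML fragment is a made-up stand-in for the real page (an assumption, not the site's actual markup), and the variable names are hypothetical.

```r
library(rvest)

# Hypothetical HTML fragment standing in for the real page; the actual
# markup of the site is an assumption here.
page <- read_html('<html><body>
  <span itemprop="telephone">+64 800 888 386</span>
</body></html>')

# A selector that matches no element yields an empty character vector,
# not an error.
no_match <- page %>% html_nodes("p.telephone") %>% html_text()
length(no_match)

# A selector that matches the element returns its text.
found <- page %>% html_nodes('span[itemprop="telephone"]') %>% html_text()
found
```

Checking the length of the result before building a data frame is a quick way to tell a wrong selector apart from a page that failed to load.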

The opening hours are stored as a JSON string in the data-times attribute, which is helpful because we don't have to scrape each day individually and bind the results together.
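
If day names are wanted instead of the numeric weekday column, one option is to map the numbers after parsing. This is a minimal sketch under stated assumptions: the JSON string is a shortened, hand-written sample in the same shape as the output above, and the mapping of 0 to Sunday is a guess based on the common JavaScript convention, not something the page confirms. The names json, html and day_names are hypothetical.

```r
library(rvest)
library(jsonlite)

# Hand-written sample in the same shape as the data-times attribute
# (an assumption; not copied from the live page).
json <- '[{"weekday":1,"time_from":"12:00","time_to":"00:00"},{"weekday":6,"time_from":"10:00","time_to":"00:00"},{"weekday":0,"time_from":"10:00","time_to":"00:00"}]'
html <- paste0("<div class=\"open-hours\" data-times='", json, "'></div>")

# Parse the attribute exactly as in the answer: read the attribute value,
# then turn the JSON string into a data frame.
hours <- read_html(html) %>%
  html_nodes("div.open-hours") %>%
  html_attr("data-times") %>%
  fromJSON()

# Assumed convention: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
hours$day <- day_names[hours$weekday + 1]
hours
```

Keeping the numeric column alongside the new day column makes it easy to verify the mapping against the site before relying on it.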



Source: https://stackoverflow.com/questions/63007915/rvest-returning-null-values
