Using rvest to scrape data that is not in table

Posted by 柔情痞子 on 2020-08-26 10:17:07

Question


I'm trying to scrape some data from a website. I thought I could use rvest, but I'm having a lot of trouble getting data that is not in a table.

I don't know if it's possible, or whether I'm using the wrong package?

I am trying to get the website, name and address from the following HTML:

<div class="info clearfix">
<i class="sprite icon title"></i>
<p class="title">
<a target="_blank" href="https://test.com/regions/Tennis_Court.html">
Tennis Court</a>
</p>
<p class="location"> 123 Page St, Charlestown</p>
<p class="excerpt" itemprop="description">A place to play tennis</p>
</div>

I'd hoped I could use something like html_node("title") etc., but that doesn't seem to work. Am I completely on the wrong path?


Answer 1:


You can pass CSS selectors to html_nodes to extract the elements you need:

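library(rvest)
url <- 'https://concreteplayground.com/auckland/bars'

# Download and parse the page once, then query it with CSS selectors
webpage <- url %>% read_html()
# 'p.name a' matches <a> elements inside <p class="name">
name <- webpage %>% html_nodes('p.name a') %>% html_text() %>% trimws()
address <- webpage %>% html_nodes('p.address') %>% html_text() %>% trimws()
# The link target is stored in the href attribute of the same <a> elements
links <- webpage %>% html_nodes('p.name a') %>% html_attr('href')
data.frame(name, address, links)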

#      name                       address
#1 Holy Hop 498 New North Road, Kingsland
#2      Sly 354A Karangahape Road, Newton
#...
#...

#                                                  links
#1 https://concreteplayground.com/auckland/bars/holy-hop
#2      https://concreteplayground.com/auckland/bars/sly
#...
#...
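
The selectors above are written for that page's markup. Applied to the HTML fragment from the question, the same approach might look like the sketch below. It assumes the fragment is parsed from a string rather than fetched from a live URL, and the variable names (html, page, website, result) are only illustrative. Note that html_node("title") looks for a <title> element, not for class="title"; a class selector needs a leading dot, and "p.title" is used here because ".title" alone would also match the <i class="sprite icon title"> element.

```r
library(rvest)

# The HTML fragment from the question, parsed from a string (assumption:
# on the real site you would call read_html() on the page URL instead).
html <- '<div class="info clearfix">
<i class="sprite icon title"></i>
<p class="title">
<a target="_blank" href="https://test.com/regions/Tennis_Court.html">
Tennis Court</a>
</p>
<p class="location"> 123 Page St, Charlestown</p>
<p class="excerpt" itemprop="description">A place to play tennis</p>
</div>'

page <- read_html(html)

# "p.title a" matches the link inside <p class="title">
name    <- page %>% html_nodes("p.title a") %>% html_text() %>% trimws()
# The website is the href attribute of that same link
website <- page %>% html_nodes("p.title a") %>% html_attr("href")
# "p.location" matches <p class="location">
address <- page %>% html_nodes("p.location") %>% html_text() %>% trimws()

result <- data.frame(name, address, website, stringsAsFactors = FALSE)
print(result)
```

On a page with many such blocks, each selector returns one value per block, so the three vectors line up row by row as long as every block contains all three elements.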


Source: https://stackoverflow.com/questions/62928540/using-rvest-to-scrape-data-that-is-not-in-table
