Get links while web scraping Google in R

旧时模样 submitted on 2019-12-11 04:28:43

Question


I am trying to get the links that Google returns for a search, that is, all the result links on the page.

I have done this kind of scraping before, but in this case I do not understand why it doesn't work. I ran the following lines:

library(rvest)
url<-"https://www.google.es/search?q=Ediciones+Peña+sl+telefono"
content_request<-read_html(url)
content_request %>%
    html_nodes(".r") %>%
    html_attr("href")

I have tried other nodes and obtained similar results:

content_request %>%
    html_nodes(".LC20lb") %>%
    html_attr("href")

Finally, I tried to get all the links on the web page, but there are some links that I cannot retrieve:

html_attr(html_nodes(content_request, "a"), "href")

Could you please help me with this? Thank you.


Answer 1:


Here are two options for you to play around with.

#1) 

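library(stringr)
url <- "https://www.google.es/search?q=Ediciones+Pe%C3%B1a+sl+telefono"
# Download the raw HTML and collapse it into a single string
html <- paste(readLines(url), collapse="\n")
# matched[[1]] is a matrix; column 2 holds the captured href values
matched <- str_match_all(html, "<a href=\"(.*?)\"")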
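
To make the shape of the result in option 1 concrete, here is a minimal sketch that runs the same pattern on a small hard-coded HTML string instead of a live Google page. The sample markup and the variable name `sample_html` are assumptions made up for illustration; the real page will contain many more anchors.

```r
library(stringr)

# Hypothetical HTML fragment standing in for the downloaded page
sample_html <- paste0(
  '<a href="/url?q=https://www.example.com/&sa=U">One</a> ',
  '<a href="/search?q=x">Two</a>'
)

# str_match_all() returns a list with one character matrix per input string:
# column 1 is the full match, column 2 is the first capture group
matched <- str_match_all(sample_html, "<a href=\"(.*?)\"")

# Keep only the captured href values
links <- matched[[1]][, 2]
print(links)
```

Note that the pattern only matches anchors whose first attribute is `href`, so a regular expression like this is best treated as a quick check rather than a robust parser.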


#2) 

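library(xml2)
library(rvest)
URL <- "https://www.google.es/search?q=Ediciones+Pe%C3%B1a+sl+telefono"
# Parse the page, then collect the href attribute of every <a> node
pg <- read_html(URL)
# Show the first few links only
head(html_attr(html_nodes(pg, "a"), "href"))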
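
Either option returns every href on the page, including navigation links. As a rough sketch of a possible next step, the code below keeps only the result links, assuming (this is not stated in the answer) that the page served to a script wraps each result as `/url?q=<target>&...`. The sample vector and the helper name `extract_result_links` are hypothetical, and Google's markup can change, so check the real output before relying on this.

```r
# Hypothetical href values, shaped like what html_attr(..., "href") might return
hrefs <- c(
  "/search?q=Ediciones+Pe%C3%B1a&source=lnms",
  "/url?q=https://www.example.com/contacto&sa=U&ved=abc",
  "/url?q=https://www.example.org/page%3Fid%3D1&sa=U&ved=def",
  NA
)

# Illustrative helper (not part of rvest): keep the "/url?q=" wrappers
# and return the decoded target URLs
extract_result_links <- function(hrefs) {
  hrefs <- hrefs[!is.na(hrefs)]                      # drop anchors without href
  wrapped <- hrefs[startsWith(hrefs, "/url?q=")]     # assumed wrapper format
  targets <- sub("^/url\\?q=([^&]*).*$", "\\1", wrapped)  # text up to first "&"
  vapply(targets, utils::URLdecode, character(1), USE.NAMES = FALSE)
}

result_links <- extract_result_links(hrefs)
print(result_links)
```

With option 2, the same helper could be applied to `html_attr(html_nodes(pg, "a"), "href")`.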


Source: https://stackoverflow.com/questions/54884611/get-links-while-do-web-scraping-to-google-in-r
