Web scraping of image

一个人想着一个人 submitted on 2019-12-21 06:17:09

Question


I am a beginner.

I wrote a small script for web scraping with rvest. I found a very convenient pipeline, %>% html_node() %>% html_text() %>% as.numeric(), but I was not able to adapt it to scrape the URL of an image.

My code for scraping the image URL:

UrlPage <- html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")

img <- UrlPage %>% html_node(".wp-image-5984") %>% html_attrs()

Result:

class "aligncenter size-full wp-image-5984" title "blog gdp 2012_10_1" alt "" src "http://eyeonhousing.files.wordpress.com/2012/11/blog-gdp-2012_10_1.jpg" height "337" width "450"

Question: how do I get only the link (the src value), without the other attributes?

Please help me find a solution. Thank you!


Answer 1:


html_attrs returns every attribute of the node. To extract a single one, use html_attr and pass the name of the attribute you want as its parameter. You may also want to make your CSS selector, the parameter for html_node, more specific. Here is my code:

library(rvest)

UrlPage <- html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
ImgNode <- UrlPage %>% html_node("img.wp-image-5984")
link <- html_attr(ImgNode, "src")

The link variable now contains the URL.
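To see the difference between html_attrs and html_attr without hitting the network, here is a minimal sketch that parses an inline HTML snippet. The snippet is an assumption: it is a stand-in for the blog page, built from the attribute values shown in the question's output, and the variable names are illustrative only.

```r
library(rvest)

# Inline HTML standing in for the blog post (an assumption for illustration;
# the attribute values are taken from the output in the question).
page <- read_html('
  <div>
    <img class="aligncenter size-full wp-image-5984"
         title="blog gdp 2012_10_1" alt=""
         src="http://eyeonhousing.files.wordpress.com/2012/11/blog-gdp-2012_10_1.jpg"
         height="337" width="450">
  </div>')

# "img.wp-image-5984" matches only <img> elements carrying that class
img_node <- page %>% html_node("img.wp-image-5984")

# html_attrs() returns a named character vector with every attribute
all_attrs <- html_attrs(img_node)

# html_attr() returns only the attribute that was asked for
link <- html_attr(img_node, "src")
print(link)
```

The same value could also be pulled out of the html_attrs result with all_attrs[["src"]], but html_attr states the intent more directly.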

You can find a decent reference for CSS selectors here: http://www.w3schools.com/cssref/css_selectors.asp

Also the rvest documentation has some good examples on how to use its functions: http://cran.r-project.org/web/packages/rvest/rvest.pdf




Answer 2:


klib is right. I just updated html() (deprecated) to read_html() and added a download command.

library(rvest)

# read_html() replaces the deprecated html()
myurl <- read_html("http://eyeonhousing.org/2012/11/gdp-growth-in-the-third-quarter-improved-but-still-slow/")
mynode <- myurl %>% html_node("img.wp-image-5984")
link <- html_attr(mynode, "src")
# mode = "wb" writes the image as binary, which matters on Windows
download.file(url = link, destfile = "test.jpg", mode = "wb")
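If you would rather keep the image's original file name than hard-code "test.jpg", one possible sketch is below. The helpers image_destfile and download_image are hypothetical names introduced here, not part of rvest, and the URL is the one from the question's output.

```r
# Hypothetical helper: build a local path from the last segment of the URL
image_destfile <- function(link, dir = ".") {
  file.path(dir, basename(link))
}

# Hypothetical helper: download the image in binary mode and return the path
download_image <- function(link, dir = ".") {
  dest <- image_destfile(link, dir)
  download.file(url = link, destfile = dest, mode = "wb")
  invisible(dest)
}

link <- "http://eyeonhousing.files.wordpress.com/2012/11/blog-gdp-2012_10_1.jpg"
dest <- image_destfile(link, dir = tempdir())
print(basename(dest))

# Example call (needs network access):
# download_image(link, dir = tempdir())
```

This assumes the URL ends in a plain file name; a link with a query string would need extra cleaning before being used as a file name.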


Source: https://stackoverflow.com/questions/30693476/web-scraping-of-image
