Image scraping in Ruby

Submitted by 烈酒焚心 on 2020-02-22 07:02:30

Question


How do I scrape an image from a particular URL using Nokogiri? If there are better options than Nokogiri, please suggest them. The CSS selector for the image is .profilePic img


Answer 1:


If it is just an <img> with a URL:

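require 'nokogiri'
require 'open-uri'

PAGE = "http://site.com/page.html"

# Fetch and parse the page. URI.open is required on Ruby 3.0+,
# where Kernel#open no longer opens URLs.
html = Nokogiri::HTML(URI.open(PAGE))
# Read the src attribute of the first matching image
src  = html.at('.profilePic img')['src']
# Download the image and write it in binary mode
File.open("foo.png", "wb") do |f|
  f.write(URI.open(src).read)
end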

If you need to turn a relative image path into an absolute URL, see:
https://stackoverflow.com/a/4864170/405017
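
As a minimal sketch of that conversion, Ruby's standard library can resolve a src value against the URL of the page it was found on with URI.join. The helper name absolute_image_url and the example URLs are assumptions made for illustration; they do not come from the linked answer.

```ruby
require 'uri'

# Resolve an image's src attribute against the URL of the page it came from.
# absolute_image_url is a hypothetical helper name; the URLs used with it
# below are placeholders.
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

# A path relative to the page's directory
puts absolute_image_url("http://site.com/users/page.html", "images/me.png")
# A path relative to the site root
puts absolute_image_url("http://site.com/users/page.html", "/images/me.png")
# An already absolute URL is returned unchanged
puts absolute_image_url("http://site.com/users/page.html", "http://cdn.site.com/me.png")
```

The result can then be passed to URI.open in place of the raw src value.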




Answer 2:


The lazy way is to use Mechanize, as it will figure out the URLs and filenames for you:

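require 'mechanize'

url   = "http://site.com/page.html"  # placeholder: the page containing the image
agent = Mechanize.new
doc   = agent.get(url)
# Fetch the image and save it under a filename derived from the response
agent.get(doc.parser.at('.profilePic img')['src']).save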
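
If you stay with open-uri instead, you have to choose the filename yourself. The following is a rough stdlib-only sketch of one way to do that; it is not Mechanize's actual logic, and the helper name filename_for and its default argument are assumptions for illustration.

```ruby
require 'uri'

# Pick a local filename from the last path segment of an image URL.
# filename_for is a hypothetical helper; it falls back to a default name
# when the URL path has no usable last segment.
def filename_for(image_url, default = "image")
  name = File.basename(URI.parse(image_url).path.to_s)
  (name.empty? || name == "/") ? default : name
end

puts filename_for("http://site.com/images/me.png?size=large")
puts filename_for("http://site.com")
```

The query string is ignored because only the path component of the URL is inspected.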


Source: https://stackoverflow.com/questions/8956249/image-scraping-in-ruby
