How can I get all links of a website using the Mechanize gem?

Submitted on 2019-12-07 05:49:39

Question


How can I get all the links of a website using the Ruby Mechanize gem? Can Mechanize do what the Anemone gem does:

Anemone.crawl("https://www.google.com.vn/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end

I'm a newbie at web crawling. Thanks in advance!


Answer 1:


It's quite simple with Mechanize, and I suggest you read the documentation. You can start with Ruby BastardBook.

To get all links from a page with Mechanize try this:

require 'mechanize'

agent = Mechanize.new
page = agent.get("http://example.com")
page.links.each {|link| puts "#{link.text} => #{link.href}"}

The code is clear, I think. page is a Mechanize::Page object that stores the whole content of the retrieved page, and Mechanize::Page provides the links method.

Mechanize is very powerful, but remember that if you want to do scraping without any interaction with the website, use Nokogiri directly. Mechanize uses Nokogiri under the hood to parse pages, so for scraping alone Nokogiri is enough.



Source: https://stackoverflow.com/questions/25781236/how-can-i-get-all-links-of-a-website-using-the-mechanize-gem
