问题
I am currently working on an application where I scrape information from a number of different sites. To get the deeplink for the desired topic on a site I rely on the sitemap that is provided (e.g. "Forum"). As I am expanding I came across some sites that don't provide a sitemap themselves, so I was wondering if there was any way to generate it within Rails from the top level domain?
I am using Nokogiri and Mechanize to retrieve data, so if there is any functionality that could help to tackle that task it would be easier to integrate.
回答1:
This can be done with the Spidr gem like so:
url_map = Hash.new { |hash,key| hash[key] = [] }
Spidr.site('http://intranet.com/') do |spider|
spider.every_link do |origin,dest|
url_map[dest] << origin
end
end
来源:https://stackoverflow.com/questions/21268998/create-dynamic-sitemap-from-url-with-ruby-on-rails