Question
I want to use Mechanize to go through all the pages from A to Z on this site: http://www.tv.com/shows/sort/a_z/
Then, for each letter, I want to get the title of every show on all of that letter's pages. At the moment I am just trying to get it to work for the letter "a". This is what I have so far, but I don't know where to go from here:
require 'mechanize'
agent = Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click
Answer 1:
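You just need some XPath to find the content you need and navigate: select the letter links, the show links on each page, and the "next" pagination link, then follow them with the agent.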
require 'mechanize'
require 'pp'

shows = []
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'

# Visit every letter link except the one marked as selected.
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the titles on the first page for this letter.
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  # Follow the "next" link until there is none, collecting titles on each page.
  while (next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]'))
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end

pp shows
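
The pagination loop above repeats the "collect titles" line once before the loop and once inside it. One way to avoid that is to pull the loop into a small helper that takes the page fetcher as an argument. The following is only a minimal sketch under stated assumptions: `Page`, `collect_titles`, `FAKE_SITE`, the `max_pages` guard, and the `example.test` URLs are all hypothetical names and made-up data for illustration, not part of Mechanize or of tv.com. It uses only the standard library so it runs without network access.

```ruby
require 'uri'

# Hypothetical page record: the titles found on one page plus the href of
# its "next" link (nil on the last page).
Page = Struct.new(:titles, :next_href)

# Walks a paginated listing starting at start_url and returns every title.
# `fetch` is any callable that takes an absolute URL string and returns a Page.
# The loop stops when there is no next link, when a URL repeats, or when
# max_pages pages have been visited.
def collect_titles(start_url, fetch, max_pages: 1000)
  titles = []
  visited = {}
  url = start_url
  while url && !visited[url] && visited.size < max_pages
    visited[url] = true
    page = fetch.call(url)
    titles.concat(page.titles)
    # Resolve a relative href such as "/shows/sort/a/page2/" against the current URL.
    url = page.next_href && URI.join(url, page.next_href).to_s
  end
  titles
end

# Stand-in for the network: canned pages keyed by URL (made-up data).
FAKE_SITE = {
  'http://example.test/shows/sort/a/' => Page.new(['Show A1', 'Show A2'], '/shows/sort/a/page2/'),
  'http://example.test/shows/sort/a/page2/' => Page.new(['Show A3'], nil)
}.freeze

fetch = ->(url) { FAKE_SITE.fetch(url) }
all_titles = collect_titles('http://example.test/shows/sort/a/', fetch)
puts all_titles
```

With Mechanize, `fetch` could presumably wrap `agent.get(url)` and build the `Page` from the two XPath queries used in the answer. The visited-URL check is an extra safeguard in case a "next" link ever points back to a page that was already seen.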
Source: https://stackoverflow.com/questions/23732239/get-mechanize-to-go-through-x-amounts-of-links-and-get-all-the-titles