Get mechanize to go through x amounts of links and get all the titles?

Posted by 我的未来我决定 on 2019-12-11 19:03:12

Question


Basically, I want to use Mechanize to go through all the pages from A to Z on this site: http://www.tv.com/shows/sort/a_z/

Then, for each letter, I want to get the title of every show on all of that letter's pages. At the moment I am just trying to get it to work for the letter "a". This is what I have so far, but I don't know where to go from here:

require 'mechanize'

agent=Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click

Answer 1:


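You just need some XPath to find the content you need and to navigate: one expression for the letter links, one for the show links on each page, and one for the "next" pagination link.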

require 'mechanize'
require 'pp'

shows = []
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'

# Visit every letter link in the alphabet bar, skipping the item marked "selected".
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the show titles on the first page for this letter.
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }

  # Keep following the "next" link until the page no longer has one.
  while (next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]'))
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end

pp shows
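
The answer's loop repeats the title-extraction line, once for the first page and once inside the `while`. One way to avoid that is to pull the "follow next until there is none" logic into a helper. The following is only a sketch under assumptions: `collect_titles`, the `fetch` callable, the `{ titles:, next_url: }` page shape, and the `FAKE_SITE` data (including its URLs and show names) are all hypothetical and not part of Mechanize or of the original answer. The in-memory site stands in for real requests so the example runs without network access.

```ruby
# Hypothetical helper: walk a paginated listing and collect every title.
# `fetch` is any callable that takes a URL and returns a hash of the form
# { titles: [String, ...], next_url: String or nil }.
def collect_titles(start_url, fetch)
  titles = []
  seen = {}
  url = start_url
  # Stop when there is no next page, or when a URL repeats (guards against loops).
  while url && !seen[url]
    seen[url] = true
    page = fetch.call(url)
    titles.concat(page[:titles])
    url = page[:next_url]
  end
  titles
end

# Made-up, in-memory stand-in for the site (assumption, for illustration only).
FAKE_SITE = {
  '/shows/sort/a/'       => { titles: ['Alias', 'Angel'], next_url: '/shows/sort/a/page2/' },
  '/shows/sort/a/page2/' => { titles: ['Arrow'],          next_url: nil }
}.freeze

fetch = ->(url) { FAKE_SITE.fetch(url) }
titles = collect_titles('/shows/sort/a/', fetch)
puts titles
```

With Mechanize, `fetch` could presumably wrap `agent.get(url)` and build the hash from the same two XPath queries used in the answer (the show links for `titles`, the "next" link's `href` for `next_url`).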


Source: https://stackoverflow.com/questions/23732239/get-mechanize-to-go-through-x-amounts-of-links-and-get-all-the-titles
