Question:
How do I get a <td> with a specific class name using XPath and Nokogiri? The tables are nested and some of them don't have IDs or classes, so I can't rely on a fixed path like this:
//table/tbody/tr/td
Here is what I have so far:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open("http://www.goalzz.com/default.aspx?c=8358"))
doc.xpath('//td[@class="m_g"]').each do |node|
  pp node.to_s
end
Any ideas? There are a few <td>s with that class name and I want to get all of them.
Answer 1:
Using the capybara-webkit gem is a viable way to work with this website as a fully JavaScript-rendered page.
Here is a rough example of what a capybara-webkit script might look like:
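#!/usr/bin/env ruby
require "pp"
require "bundler/setup"   # expects a Gemfile listing capybara and capybara-webkit
require "capybara"
require "capybara/dsl"
require "capybara-webkit"

# Drive a remote site with the headless WebKit driver; no local Rack app is started.
Capybara.run_server = false
Capybara.current_driver = :webkit
Capybara.app_host = "http://www.goalzz.com"

module Test
  class Goalzz
    include Capybara::DSL

    def get_results
      # Load the page in the WebKit driver so its JavaScript can build the DOM.
      visit('/default.aspx?c=8358')
      # Query the rendered DOM; #text returns each cell's contents.
      all(:xpath, '//td[@class="m_g"]').each { |node| pp node.text }
    end
  end
end

spider = Test::Goalzz.new
spider.get_results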
Because the page is built dynamically, finding the example XPath requires a fully functional JavaScript-capable browser engine, which is what the capybara-webkit driver supplies.
Answer 2:
Is the class attribute on these <td>s exactly "m_g", or do some of them have more than one class? If it's the latter, this XPath might work:
//td[contains(@class, "m_g")]
Source: https://stackoverflow.com/questions/14263532/how-do-i-access-html-elements-that-are-rendered-in-javascript-using-xpath