How do I parse an HTML table with Nokogiri?

渐次进展 2020-11-30 22:54

I installed Ruby and Mechanize. It seems that Nokogiri can do what I want, but I do not know how to do it.

What about this table?

1 answer
  • 2020-11-30 23:36
    #!/usr/bin/ruby1.8
    
    require 'nokogiri'
    require 'pp'
    
    html = <<-EOS
      (The HTML from the question goes here)
    EOS
    
    doc = Nokogiri::HTML(html)
    # Select every row inside the tbody that holds the thread list.
    rows = doc.xpath('//table/tbody[@id="threadbits_forum_251"]/tr')
    details = rows.collect do |row|
      detail = {}
      # Each pair is a field name and an XPath relative to the current row.
      [
        [:title, 'td[3]/div[1]/a/text()'],
        [:name, 'td[3]/div[2]/span/a/text()'],
        [:date, 'td[4]/text()'],
        [:time, 'td[4]/span/text()'],
        [:number, 'td[5]/a/text()'],
        [:views, 'td[6]/text()'],
      ].each do |name, xpath|
        # at_xpath returns nil when nothing matches; nil.to_s is "".
        detail[name] = row.at_xpath(xpath).to_s.strip
      end
      detail
    end
    pp details
    
    # => [{:time=>"23:35",
    # =>   :title=>"Vb4 Gold Released",
    # =>   :number=>"24",
    # =>   :date=>"06 Jan 2010",
    # =>   :views=>"1,320",
    # =>   :name=>"Paul M"}]
    