Convert a Nokogiri document to a Ruby Hash

◇◆丶佛笑我妖孽 提交于 2019-11-26 18:33:28

I use this code with libxml-ruby (1.1.3). I have not used nokogiri myself, but I understand that it uses libxml-ruby anyway. I would also encourage you to look at ROXML (http://github.com/Empact/roxml/tree) which maps xml elements to ruby objects; it is built atop libxml.

# USAGE: Hash.from_libxml(YOUR_XML_STRING)
require 'xml/libxml'
# adapted from 
# http://movesonrails.com/articles/2008/02/25/libxml-for-active-resource-2-0

class Hash 
  class << self
        def from_libxml(xml, strict=true) 
          begin
            XML.default_load_external_dtd = false
            XML.default_pedantic_parser = strict
            result = XML::Parser.string(xml).parse 
            return { result.root.name.to_s => xml_node_to_hash(result.root)} 
          rescue Exception => e
            # raise your custom exception here
          end
        end 

        def xml_node_to_hash(node) 
          # If we are at the root of the document, start the hash 
          if node.element? 
           if node.children? 
              result_hash = {} 

              node.each_child do |child| 
                result = xml_node_to_hash(child) 

                if child.name == "text"
                  if !child.next? and !child.prev?
                    return result
                  end
                elsif result_hash[child.name.to_sym]
                    if result_hash[child.name.to_sym].is_a?(Object::Array)
                      result_hash[child.name.to_sym] << result
                    else
                      result_hash[child.name.to_sym] = [result_hash[child.name.to_sym]] << result
                    end
                  else 
                    result_hash[child.name.to_sym] = result
                  end
                end

              return result_hash 
            else 
              return nil 
           end 
           else 
            return node.content.to_s 
          end 
        end          
    end
end
Guillaume Roderick

If you want to convert a Nokogiri XML document to a hash, just do the following:

require 'active_support/core_ext/hash/conversions'
hash = Hash.from_xml(nokogiri_document.to_s)

Here's a far simpler version that creates a robust Hash that includes namespace information, both for elements and attributes:

require 'nokogiri'
class Nokogiri::XML::Node
  TYPENAMES = {1=>'element',2=>'attribute',3=>'text',4=>'cdata',8=>'comment'}
  def to_hash
    {kind:TYPENAMES[node_type],name:name}.tap do |h|
      h.merge! nshref:namespace.href, nsprefix:namespace.prefix if namespace
      h.merge! text:text
      h.merge! attr:attribute_nodes.map(&:to_hash) if element?
      h.merge! kids:children.map(&:to_hash) if element?
    end
  end
end
class Nokogiri::XML::Document
  def to_hash; root.to_hash; end
end

Seen in action:

xml = '<r a="b" xmlns:z="foo"><z:a>Hello <b z:m="n" x="y">World</b>!</z:a></r>'
doc = Nokogiri::XML(xml)
p doc.to_hash
#=> {
#=>   :kind=>"element",
#=>   :name=>"r",
#=>   :text=>"Hello World!",
#=>   :attr=>[
#=>     {
#=>       :kind=>"attribute",
#=>       :name=>"a", 
#=>       :text=>"b"
#=>     }
#=>   ], 
#=>   :kids=>[
#=>     {
#=>       :kind=>"element", 
#=>       :name=>"a", 
#=>       :nshref=>"foo", 
#=>       :nsprefix=>"z", 
#=>       :text=>"Hello World!", 
#=>       :attr=>[], 
#=>       :kids=>[
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"Hello "
#=>         },
#=>         {
#=>           :kind=>"element", 
#=>           :name=>"b", 
#=>           :text=>"World", 
#=>           :attr=>[
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"m", 
#=>               :nshref=>"foo", 
#=>               :nsprefix=>"z", 
#=>               :text=>"n"
#=>             },
#=>             {
#=>               :kind=>"attribute", 
#=>               :name=>"x", 
#=>               :text=>"y"
#=>             }
#=>           ], 
#=>           :kids=>[
#=>             {
#=>               :kind=>"text", 
#=>               :name=>"text", 
#=>               :text=>"World"
#=>             }
#=>           ]
#=>         },
#=>         {
#=>           :kind=>"text", 
#=>           :name=>"text", 
#=>           :text=>"!"
#=>         }
#=>       ]
#=>     }
#=>   ]
#=> }
John Hinnegan

I found this while trying to simply convert XML to Hash (not in Rails). I was thinking I would use Nokogiri, but ended up going with Nori.

Then my code was trival:

response_hash = Nori.parse(response)

Other users have pointed out that this does not work. I have not verified, but it seems that the parse method has been moved from the class to the instance. My code above worked at some point. New (unverified) code would be:

response_hash = Nori.new.parse(response)
PythonDev

Use Nokogiri to parse XML response to ruby hash. It's pretty fast.

doc = Nokogiri::XML(response_body) 
Hash.from_xml(doc.to_s)
Pierre Schambacher

If you define something like this in your configuration:

ActiveSupport::XmlMini.backend = 'Nokogiri'

it includes a module in Nokogiri and you gain the to_hash method.

juanfe

If the node you've selected in Nokogiri consists of only one tag, you can extract the keys, values and zip them into one hash, like so:

  @doc ||= Nokogiri::XML(File.read("myxmldoc.xml"))
  @node = @doc.at('#uniqueID') # this works if this selects only one node
  nodeHash = Hash[*@node.keys().zip(@node.values()).flatten]

See http://www.ruby-forum.com/topic/125944 for more info on Ruby array merging.

Have a look at the simple mix-in I made for the Nokogiri XML Node.

http://github.com/kuroir/Nokogiri-to-Hash

Here's a usage example:

require 'rubygems'
require 'nokogiri'
require 'nokogiri_to_hash'
html = '
  <div id="hello" class="container">
    <p>Hello! visit my site <a href="http://kuroir.com">Kuroir.com</a></p>
  </div>
'
p Nokogiri.HTML(html).to_hash
=> [{:div=>{:class=>["container"], :children=>[{:p=>{:children=>[{:a=>{:href=>["http://kuroir.com"], :children=>[]}}]}}], :id=>["hello"]}}]
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!