Parsing XML to hash with Nori and Nokogiri with undesired result

Submitted by 限于喜欢 on 2019-12-11 11:42:01

Question


I am attempting to convert an XML document to a Ruby hash using Nori. But instead of getting the collection directly under the root element, I get an extra node that wraps the collection. This is what I am doing:

@xml  = content_for(:layout)
@hash = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(@xml)

or

@hash = Hash.from_xml(@xml)

Where the content of @xml is:

<bundles>
  <bundle>
    <id>6073</id>
    <name>Bundle-1</name>
    <status>1</status>
    <bundle_type>
      <id>6713</id>
      <name>BundleType-1</name>
    </bundle_type>
    <begin_at nil="true"></begin_at>
    <end_at nil="true"></end_at>
    <updated_at>2013-03-21T23:02:32Z</updated_at>
    <created_at>2013-03-21T23:02:32Z</created_at>
  </bundle>
  <bundle>
    <id>6074</id>
    <name>Bundle-2</name>
    <status>1</status>
    <bundle_type>
      <id>6714</id>
      <name>BundleType-2</name>
    </bundle_type>
    <begin_at nil="true"></begin_at>
    <end_at nil="true"></end_at>
    <updated_at>2013-03-21T23:02:32Z</updated_at>
    <created_at>2013-03-21T23:02:32Z</created_at>
  </bundle>
</bundles>

The parser returns @hash of format:

{"bundles"=>{"bundle"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}} 

Instead I would like to get:

{"bundles"=>[{"id"=>"6073", "name"=>"Bundle-1", "status"=>"1", "bundle_type"=>{"id"=>"6713", "name"=>"BundleType-1"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}, {"id"=>"6074", "name"=>"Bundle-2", "status"=>"1", "bundle_type"=>{"id"=>"6714", "name"=>"BundleType-2"}, "begin_at"=>nil, "end_at"=>nil, "updated_at"=>"2013-03-21T23:02:32Z", "created_at"=>"2013-03-21T23:02:32Z"}]}

The point is that I control the XML, and it is always formed in a way similar to the one shown above.

My question is also related to "Does RABL's JSON output not conform to standard? Can it?"


Answer 1:


Imagine an XML document that consists only of a list of identical tags, e.g.

<shoppinglist>
    <item>apple</item>
    <item>banana</item>
    <item>cherry</item>
    <item>pear</item>
</shoppinglist>

When you convert this into a hash, it is quite straightforward to access the items with e.g. hash['shoppinglist']['item'][0]. But what would you expect in this case? Just an array? By your logic, the items should now be accessible as hash['shoppinglist'][0]. But what if the container holds different elements, e.g.

<shoppinglist>
    <date>2013-01-01</date>
    <item>apple</item>
    <item>banana</item>
    <item>cherry</item>
    <item>pear</item>
</shoppinglist>

How would you now access the items? And how would you access the date? The problem is that the conversion to a hash has to work in the general case.
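To make the ambiguity concrete, here is a minimal sketch using plain Ruby hashes. The literals are written by hand to mirror the shape an XML-to-hash parser typically produces for the two lists above; no parser is actually run, and the variable names are only illustrative.

```ruby
# Hand-written hashes in the usual XML-to-hash shape (assumption: no parser is run here).
items_only = {
  "shoppinglist" => { "item" => %w[apple banana cherry pear] }
}

with_date = {
  "shoppinglist" => {
    "date" => "2013-01-01",
    "item" => %w[apple banana cherry pear]
  }
}

# The nested form addresses both documents in exactly the same way.
first_item_a = items_only["shoppinglist"]["item"][0]
first_item_b = with_date["shoppinglist"]["item"][0]
list_date    = with_date["shoppinglist"]["date"]

# Flattening is only unambiguous when the container holds one kind of child.
flattenable_a = items_only["shoppinglist"].size == 1
flattenable_b = with_date["shoppinglist"].size == 1

puts first_item_a, first_item_b, list_date, flattenable_a, flattenable_b
```

Once a second kind of child appears, there is nowhere left to put it if the container has been turned into a bare array.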

Although I do not know Nori, I am pretty sure that what you ask of it is not built in, simply because it makes no sense in the general case. As an alternative, you can still move the bundle array up one level yourself:

@hash['bundles'] = @hash['bundles']['bundle']
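The one-liner above can be wrapped in a slightly more defensive helper. This is only a sketch: hoist_collection is a hypothetical name, not part of Nori, and the input hash is written by hand rather than parsed. It also assumes that a container with a single child element is parsed as a plain value instead of a one-element array, which is how such parsers typically behave, so that case is normalised explicitly.

```ruby
# Hypothetical helper (not part of Nori): replace a container hash that holds
# exactly one kind of child element with the list of those children.
def hoist_collection(hash, container)
  inner = hash[container]
  # Leave anything that is not a single-key hash untouched.
  return hash unless inner.is_a?(Hash) && inner.size == 1

  children = inner.values.first
  # Assumption: a lone child is typically parsed as a single value, not an Array.
  children = [children] unless children.is_a?(Array)
  hash.merge(container => children)
end

# Hand-written stand-in for the parser output shown in the question.
parsed = {
  "bundles" => {
    "bundle" => [
      { "id" => "6073", "name" => "Bundle-1" },
      { "id" => "6074", "name" => "Bundle-2" }
    ]
  }
}

flat = hoist_collection(parsed, "bundles")
puts flat["bundles"].map { |b| b["name"] }.inspect
```

Because the helper returns a new hash via merge, the original parser output stays available if both shapes are needed.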



Answer 2:


The general solution to your problem is not very pretty.

I created a special object that I named ArrayHash. It has the special property that if it has only one key, and the value stored under that key is an array, it adds integer keys for the elements of that array.

So if a normal Ruby Hash looks like

{"bundle"=>["0", "1", "A", "B"]}

then the corresponding ArrayHash looks like this

{"bundle"=>["0", "1", "A", "B"], 0=>"0", 1=>"1", 2=>"A", 3=>"B"}

Since the extra keys are of type Integer (Fixnum before Ruby 2.4), this Hash looks just like the Array

[ "0", "1", "A", "B" ]

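except that it also has a "bundle" entry, so its size is 5. (In the code below the integer keys are only simulated by [] and inspect, so Hash#size still reports 1.)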

Below is the code to force Nori to use this special Dictionary.

require 'nori'

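class Nori
  # A Hash that also answers integer lookups when it wraps a single array.
  class ArrayHash < Hash
    def [](a)
      # Integer replaces Fixnum, which was removed in Ruby 3.2.
      if a.is_a?(Integer) && size == 1
        key = keys[0]
        self[key][a]
      else
        super
      end
    end

    # Show the simulated integer keys alongside the real one.
    def inspect
      if size == 1 && values[0].is_a?(Array)
        shown = Hash[to_a]
        values[0].each.with_index do |v, i|
          shown[i] = v
        end
        shown.inspect
      else
        super
      end
    end
  end
end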

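class Nori
  class XMLUtilityNode
    # Keep the original conversion so it can be wrapped.
    alias :old_to_hash :to_hash
    def to_hash
      ret = old_to_hash
      # The wrapper only handles a plain one-entry Hash and fails loudly otherwise.
      raise if ret.size != 1
      raise unless ret.class == Hash
      a = ret.to_a[0]
      k, v = a.first, a.last
      # Convert a nested Hash so that integer lookups work on it.
      if v.class == Hash
        v = ArrayHash[ v.to_a ]
      end
      ret = ArrayHash[ k, v ]
      ret
    end
  end
end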


h = Nori.new(:parser => :nokogiri, :advanced_typecasting => false).parse(<<EOF)
<top>
<aundles>
  <bundle>0</bundle>
  <bundle>1</bundle>
  <bundle>A</bundle>
  <bundle>B</bundle>
</aundles>
<bundles>
  <nundle>A</nundle>
  <bundle>A</bundle>
  <bundle>B</bundle>
</bundles>
</top>
EOF

puts "#{h['top']['aundles'][0]} == #{ h['top']['aundles']['bundle'][0]}"
puts "#{h['top']['aundles'][1]} == #{ h['top']['aundles']['bundle'][1]}"
puts "#{h['top']['aundles'][2]} == #{ h['top']['aundles']['bundle'][2]}"
puts "#{h['top']['aundles'][3]} == #{ h['top']['aundles']['bundle'][3]}"

puts h.inspect

The output is then

0 == 0
1 == 1
A == A
B == B
{"top"=>{"aundles"=>{"bundle"=>["0", "1", "A", "B"], 0=>"0", 1=>"1", 2=>"A", 3=>"B"}, "bundles"=>{"nundle"=>"A", "bundle"=>["A", "B"]}}}
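If reopening Nori's classes is not an option, a similar effect can be approximated by post-processing the parsed hash. The sketch below is not the approach used above: collapse_wrappers is a hypothetical helper, the input is a hand-written stand-in for the parser output, and unlike ArrayHash it drops the wrapper key ("bundle") instead of keeping both forms of access.

```ruby
# Hypothetical post-processing step; it does not touch Nori itself.
def collapse_wrappers(node)
  case node
  when Array
    node.map { |child| collapse_wrappers(child) }
  when Hash
    if node.size == 1 && node.values.first.is_a?(Array)
      # Single key pointing at an array: keep only the array.
      collapse_wrappers(node.values.first)
    else
      # Mixed or scalar content: keep the keys and recurse into the values.
      node.each_with_object({}) { |(k, v), out| out[k] = collapse_wrappers(v) }
    end
  else
    node
  end
end

# Hand-written stand-in for the parsed document used in the example above.
parsed = {
  "top" => {
    "aundles" => { "bundle" => %w[0 1 A B] },
    "bundles" => { "nundle" => "A", "bundle" => %w[A B] }
  }
}

result = collapse_wrappers(parsed)
puts result.inspect
```

Only "aundles" is collapsed here; "bundles" keeps its nested form because it contains two different kinds of children.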


Source: https://stackoverflow.com/questions/15560182/parsing-xml-to-hash-with-nori-and-nokogiri-with-undesired-result
