Inserting and deleting XML nodes and elements using Nokogiri

Submitted by 孤街醉人 on 2019-12-10 00:53:52

Question


I want to extract parts of an XML file and leave a note in that file saying that something was extracted, like "here something was extracted".

I'm trying to do this with Nokogiri, but I can't find documentation on how to:

  1. delete all children of a <Nokogiri::XML::Element>
  2. change the inner_text of that complete element

Any clues?


Answer 1:


Nokogiri makes this pretty easy. Using this document as an example, the following code finds all <vitamins> elements, removes their children (and the children's children, and so on), and sets their inner text to "Children removed.":

require 'nokogiri'

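# Parse the file; the block form of File.open closes the handle automatically.
doc = File.open('sample.xml') { |io| Nokogiri::XML(io) }

doc.search('//vitamins').each do |node|
  node.children.remove               # drop every descendant of <vitamins>
  node.content = 'Children removed.' # leave a marker in their place
end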

A given food node will go from looking like this:

<food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g">29</serving>
    <calories total="110" fat="100"/>
    <total-fat>11</total-fat>
    <saturated-fat>3</saturated-fat>
    <cholesterol>5</cholesterol>
    <sodium>210</sodium>
    <carb>2</carb>
    <fiber>0</fiber>
    <protein>1</protein>
    <vitamins>
        <a>0</a>
        <c>0</c>
    </vitamins>
    <minerals>
        <ca>0</ca>
        <fe>0</fe>
    </minerals>
</food>

to this:

<food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g">29</serving>
    <calories total="110" fat="100"/>
    <total-fat>11</total-fat>
    <saturated-fat>3</saturated-fat>
    <cholesterol>5</cholesterol>
    <sodium>210</sodium>
    <carb>2</carb>
    <fiber>0</fiber>
    <protein>1</protein>
    <vitamins>Children removed.</vitamins>
    <minerals>
        <ca>0</ca>
        <fe>0</fe>
    </minerals>
</food>



Answer 2:


The previous Nokogiri example pointed me in the right direction, but using doc.search with //vitamins left me with malformed output, so I used a CSS selector instead:

require "nokogiri"

# The block form of File.open closes the file once parsing is done.
doc = File.open("food.xml") { |f| Nokogiri::XML(f) }

doc.css("food vitamins").each do |node|
  puts "\r\n[debug] Before: vitamins= \r\n#{node}"
  node.children.remove
  node.content = "Children removed"
  puts "\r\n[debug] After: vitamins=\r\n#{node}"
end

Which results in:

[debug] Before: vitamins=
<vitamins>
        <a>0</a>
        <c>0</c>
    </vitamins>

[debug] After: vitamins=
<vitamins>Children removed</vitamins>



Answer 3:


You can do it like this:

doc = Nokogiri::XML(your_document)
note = doc.search("note") # find all elements named "note"
note.remove               # remove those elements from the document

Note that this removes the <note> elements themselves, together with everything inside them, rather than only their children. I am not sure how to "change the inner_text" of all note elements; I don't think inner_text can be set directly on a Nokogiri::XML::Element.




Answer 4:


Here's what I'd do:

Parse some XML first:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="nutrition.css"?>
<nutrition>

  <daily-values>
    <total-fat units="g">65</total-fat>
    <saturated-fat units="g">20</saturated-fat>
    <cholesterol units="mg">300</cholesterol>
    <sodium units="mg">2400</sodium>
    <carb units="g">300</carb>
    <fiber units="g">25</fiber>
    <protein units="g">50</protein>
  </daily-values>

  <food>
    <name>Avocado Dip</name>
    <mfr>Sunnydale</mfr>
    <serving units="g">29</serving>
    <calories total="110" fat="100"/>
    <total-fat>11</total-fat>
    <saturated-fat>3</saturated-fat>
    <cholesterol>5</cholesterol>
    <sodium>210</sodium>
    <carb>2</carb>
    <fiber>0</fiber>
    <protein>1</protein>
    <vitamins>
      <a>0</a>
      <c>0</c>
    </vitamins>
    <minerals>
      <ca>0</ca>
      <fe>0</fe>
    </minerals>
  </food>

</nutrition>
EOT

If I want to delete a node's content, I can remove its children or assign nil to its content:

doc.at('total-fat').to_xml # => "<total-fat units=\"g\">65</total-fat>"
doc.at('total-fat').children.remove
doc.at('total-fat').to_xml # => "<total-fat units=\"g\"/>"

or:

doc.at('saturated-fat').to_xml # => "<saturated-fat units=\"g\">20</saturated-fat>"
doc.at('saturated-fat').content = nil
doc.at('saturated-fat').to_xml # => "<saturated-fat units=\"g\"/>"

If I want to extract the text from a node to use it some other way:

food = doc.at('food').text
# => "\n    Avocado Dip\n    Sunnydale\n    29\n    \n    11\n    3\n    5\n    210\n    2\n    0\n    1\n    \n      0\n      0\n    \n    \n      0\n      0\n    \n  "

or:

food = doc.at('food').children.map(&:text)
# => ["\n    ",
#     "Avocado Dip",
#     "\n    ",
#     "Sunnydale",
#     "\n    ",
#     "29",
#     "\n    ",
#     "",
#     "\n    ",
#     "11",
#     "\n    ",
#     "3",
#     "\n    ",
#     "5",
#     "\n    ",
#     "210",
#     "\n    ",
#     "2",
#     "\n    ",
#     "0",
#     "\n    ",
#     "1",
#     "\n    ",
#     "\n      0\n      0\n    ",
#     "\n    ",
#     "\n      0\n      0\n    ",
#     "\n  "]

or however else you want to mangle the text.

And, if you want to mark that you've removed the text:

doc.at('food').content = 'REMOVED'
doc.at('food').to_xml # => "<food>REMOVED</food>"

You could also use an XML comment instead:

doc.at('food').children = '<!-- REMOVED -->'
doc.at('food').to_xml # => "<food>\n  <!-- REMOVED -->\n</food>"


Source: https://stackoverflow.com/questions/1274783/inserting-and-deleting-xml-nodes-and-elements-using-nokogiri
