Cleaning XML document recursively from empty tags with Nokogiri?

南笙酒味 submitted on 2019-12-11 07:59:42

Question


I have a nested XML document that looks like this:

<?xml version="1.0"?>
<phone>
  <name>test</name>
  <descr>description</descr>
  <empty/>
  <lines>
    <line>12345</line>
    <css/>
  </lines>
</phone>

I need to remove all empty XML nodes, like <empty/> and <css/>.

I ended up with something like:

doc = Nokogiri::XML::DocumentFragment.parse <<-EOXML
<phone>
  <name>test</name>
  <descr>description</descr>
  <empty/>
  <lines>
    <line>12345</line>
    <css/>
  </lines>
</phone>
EOXML

phone = doc.css("phone")
phone.children.each do | child |
    child.remove if child.inner_text == ''
end

The above code removes only the top-level empty tag, i.e. <empty/>. It only iterates over the direct children of <phone>, so it never gets inside the nested <lines> block. I think I need some recursive strategy here. I carefully read the Nokogiri documentation and checked a lot of examples, but I haven't found a solution yet.

How can I fix this?

I'm using Ruby 1.9.3 and Nokogiri 1.5.10.


Answer 1:


You should be able to find all elements that have no text-node children using the XPath "/phone//*[not(text())]".

require 'nokogiri'

doc = Nokogiri::XML::Document.parse <<-EOXML
<phone>
  <name>test</name>
  <descr>description</descr>
  <empty/>
  <lines>
    <line>12345</line>
    <css/>
  </lines>
</phone>
EOXML

doc.xpath("/phone//*[not(text())]").remove

puts doc.to_s.gsub(/\n\s*\n/, "\n")
#=> <?xml version="1.0"?>
#=> <phone>
#=>   <name>test</name>
#=>   <descr>description</descr>
#=>   <lines>
#=>     <line>12345</line>
#=>   </lines>
#=> </phone>



Answer 2:


A latecomer with a different approach, hoping to add some insight. Besides removing the empty elements, this approach also removes the annoying extra blank lines, and it gives you the option to keep empty elements that have attributes set.

require 'nokogiri'

doc = Nokogiri::XML::Document.parse <<-EOXML
<phone>
  <name>test</name>
  <descr>description</descr>
  <empty/>
  <lines>
    <line>12345</line>
    <css/>
  </lines>
</phone>
EOXML

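def traverse_and_clean(kid)
  # Clean the children first, so a parent left with only blank content is removed too
  kid.children.each { |child| traverse_and_clean(child) }
  # String#blank? comes from ActiveSupport; strip.empty? works in plain Ruby.
  # Whitespace-only text nodes are removed as well, which drops the blank lines.
  kid.remove if kid.content.strip.empty?
end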

traverse_and_clean(doc)
puts doc.to_xml

Output

<?xml version="1.0"?>
<phone>
  <name>test</name>
  <descr>description</descr>
  <lines>
    <line>12345</line>
  </lines>
</phone>

If you need to keep empty elements that have certain attributes set, only a small change to the traverse_and_clean method is needed:

def traverse_and_clean(kid)
  kid.children.each { |child| traverse_and_clean(child) }
  # Keep a blank element if it carries any attributes (attributes returns a Hash)
  kid.remove if kid.content.strip.empty? && kid.attributes.empty?
end



Answer 3:


require 'nokogiri'

doc = Nokogiri::XML::Document.parse <<-EOXML
<phone>
  <name>test</name>
  <descr>description</descr>
  <empty/>
  <lines>
    <line>12345</line>
    <css/>
  </lines>
</phone>
EOXML

nodes = doc.xpath("//phone//*[not(text())]")

nodes.each{|n| n.remove if n.elem? }

puts doc

Output

<?xml version="1.0"?>
<phone>
  <name>test</name>
  <descr>description</descr>

  <lines>
    <line>12345</line>

  </lines>
</phone>



Answer 4:


Similar to @JustinKo's answer, but using a CSS selector instead of XPath:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0"?>
<phone>
  <name>test</name>
  <descr>description</descr>
  <empty/>
  <lines>
    <line>12345</line>
    <css/>
  </lines>
</phone>
EOT

doc.search(':empty').remove
puts doc.to_xml

Looking at what it did:

<?xml version="1.0"?>
<phone>
  <name>test</name>
  <descr>description</descr>

  <lines>
    <line>12345</line>

  </lines>
</phone>

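:empty is a standard CSS3 pseudo-class that matches elements with no child nodes at all (a whitespace text node counts as a child). Nokogiri also implements a lot of jQuery-style selector extensions, so it's always worth looking to see what those extensions can do.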



Source: https://stackoverflow.com/questions/20123176/cleaning-xml-document-recursively-from-empty-tags-with-nokogiri
