Question
I have a nested XML document that looks like this:
<?xml version="1.0"?>
<phone>
<name>test</name>
<descr>description</descr>
<empty/>
<lines>
<line>12345</line>
<css/>
</lines>
</phone>
I need to remove all empty XML nodes, such as <empty/> and <css/>.
I ended up with something like:
doc = Nokogiri::XML::DocumentFragment.parse <<-EOXML
<phone>
<name>test</name>
<descr>description</descr>
<empty/>
<lines>
<line>12345</line>
<css/>
</lines>
</phone>
EOXML
phone = doc.css("phone")
phone.children.each do |child|
  child.remove if child.inner_text == ''
end
The above code removes only the first empty tag, i.e. <empty/>. I can't get it to descend into the nested block, so I think I need some recursive strategy here. I read the Nokogiri documentation carefully and checked a lot of examples, but I haven't found a solution yet.
How can I fix this?
I'm using Ruby 1.9.3 and Nokogiri 1.5.10.
Answer 1:
You should be able to find all nodes without any text using the XPath "/phone//*[not(text())]".
require 'nokogiri'
doc = Nokogiri::XML::Document.parse <<-EOXML
<phone>
<name>test</name>
<descr>description</descr>
<empty/>
<lines>
<line>12345</line>
<css/>
</lines>
</phone>
EOXML
doc.xpath("/phone//*[not(text())]").remove
puts doc.to_s.gsub(/\n\s*\n/, "\n")
#=> <?xml version="1.0"?>
#=> <phone>
#=> <name>test</name>
#=> <descr>description</descr>
#=> <lines>
#=> <line>12345</line>
#=> </lines>
#=> </phone>
Answer 2:
A latecomer with a different approach, hoping to add some insight. This approach removes the annoying extra newlines and gives you the option to keep empty elements that have attributes set.
require 'nokogiri'
require 'active_support/core_ext/object/blank' # provides blank?, which is not core Ruby
doc = Nokogiri::XML::Document.parse <<-EOXML
<phone>
<name>test</name>
<descr>description</descr>
<empty/>
<lines>
<line>12345</line>
<css/>
</lines>
</phone>
EOXML
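# Depth-first: clean the children first, then remove this node if nothing is left in it.
def traverse_and_clean(kid)
  kid.children.each { |child| traverse_and_clean(child) }
  # Also removes whitespace-only text nodes, which is what gets rid of the blank lines.
  kid.remove if kid.content.blank?
end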
traverse_and_clean(doc)
Output:
<?xml version="1.0"?>
<phone>
<name>test</name>
<descr>description</descr>
<lines>
<line>12345</line>
</lines>
</phone>
If you find yourself in the peculiar case of needing to keep empty elements that have attributes set, all you have to do is change the traverse_and_clean method slightly:
def traverse_and_clean(kid)
  kid.children.each { |child| traverse_and_clean(child) }
  kid.remove if kid.content.blank? && kid.attributes.blank?
end
Answer 3:
require 'nokogiri'
doc = Nokogiri::XML::Document.parse <<-EOXML
<phone>
<name>test</name>
<descr>description</descr>
<empty/>
<lines>
<line>12345</line>
<css/>
</lines>
</phone>
EOXML
nodes = doc.xpath("//phone//*[not(text())]")
nodes.each{|n| n.remove if n.elem? }
puts doc
Output:
<?xml version="1.0"?>
<phone>
<name>test</name>
<descr>description</descr>
<lines>
<line>12345</line>
</lines>
</phone>
Answer 4:
Similar to @JustinKo's answer, but using CSS selectors:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0"?>
<phone>
<name>test</name>
<descr>description</descr>
<empty/>
<lines>
<line>12345</line>
<css/>
</lines>
</phone>
EOT
doc.search(':empty').remove
puts doc.to_xml
Looking at what it did:
<?xml version="1.0"?>
<phone>
<name>test</name>
<descr>description</descr>
<lines>
<line>12345</line>
</lines>
</phone>
Nokogiri implements a lot of jQuery's selectors, so it's always worth looking to see what those extensions can do.
Source: https://stackoverflow.com/questions/20123176/cleaning-xml-document-recursively-from-empty-tags-with-nokogiri