Question
Ruby parsing newbie here.
I've got an XML file that looks like this:
<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
<Offer>
<Model><![CDATA[11016001]]></Model>
<Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
<ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
...lots more nodes
<Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
<li>Odor-resistant, easy to clean, and quick to dry.</li>
<li>Ventilation ports for enhanced breathability.</li>
<li>Lightweight, non-marking soles.</li>
<li>Water-friendly and buoyant; weighs only ounces.</li>
<li>Fully molded Croslite™ material for lightweight cushioning and comfort.</li>
<li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
</Offer>
....lots lots more <Offer> entries
</Offers>
What I want to do is parse each instance of 'Offer' into its own row in a CSV, which I'm doing with this code:
require 'csv'
require 'nokogiri'
file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb')
doc.css('Offer').each do |node|
  a.push << node.content.split
end
a.each { |a| csv << a }
Which runs nicely (once I figured out that the CSV needs an array fed to it, which putting .split onto node.content seemed to satisfy).
My issue is that I'm splitting on whitespace rather than on each element of the Offer node (sorry if that's not the right terminology), so every word is going into its own column in the CSV.
Has anyone got some pointers on:
- A way to pick up the content of each node
- How to pull the node names through as headers in the csv
Any pointers much appreciated
Thanks, Liam
Answer 1:
This assumes that each Offer element always has the same child nodes (though they can be empty):
CSV.open('output.csv', 'wb') do |csv|
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end
And to get headers (from the first Offer element):
CSV.open('output.csv', 'wb') do |csv|
  csv << doc.at('Offer').search('*').map(&:name)
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end
EDIT
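search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if nothing matches); search returns a NodeSet of all matching elements (empty if nothing matches). The * here selects every descendant element of the current node, not just its direct children; because each child of Offer in this file holds only text or CDATA, the result is the same as the list of direct children. If the children could contain nested elements, x.element_children or x.xpath('./*') would restrict the match to direct children.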
Both name and text are also Nokogiri methods (for an element). name provides the element's name; text provides the text or CDATA content of a node.
Answer 2:
Try this, and modify it to push into your csv:
doc.css('Offer').first.elements.each do |n|
  puts "#{n.name}: #{n.content}"
end
Source: https://stackoverflow.com/questions/21852766/parse-xml-nodes-to-csv-with-ruby-nokogiri