Parse XML nodes to CSV with Ruby/Nokogiri

Question

Ruby parsing newbie here.

I've got an XML file that looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
  <Offer>
   <Model><![CDATA[11016001]]></Model>
   <Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
   <ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
   ...lots more nodes
   <Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
    <li>Odor-resistant, easy to clean, and quick to dry.</li>
    <li>Ventilation ports for enhanced breathability.</li>
    <li>Lightweight, non-marking soles.</li>
    <li>Water-friendly and buoyant; weighs only ounces.</li>
    <li>Fully molded Croslite&trade; material for lightweight cushioning and comfort.</li>
    <li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
  </Offer>
....lots lots more <Offer> entries
</Offers>

What I want to do is parse each instance of 'Offer' into its own row in a CSV, which I'm doing via this code:

require 'csv'
require 'nokogiri'

file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb') 

doc.css('Offer').each do |node|
    a.push << node.content.split
end

a.each { |a| csv << a }

This runs nicely (once I figured out that the CSV needed an array fed to it, which putting .split onto node.content seemed to satisfy).

My issue is that I'm splitting on whitespace rather than on each child element of the Offer node (sorry if that's not the right terminology), so every word is going into its own column in the CSV.

Has anyone got some pointers on:

A way to pick up the content of each node
How to pull the node names through as headers in the CSV

Any pointers much appreciated.

Thanks, Liam

Answer 1:

This assumes that each Offer element always has the same child nodes (though they can be empty):

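# doc is the Nokogiri::XML document parsed in the question
CSV.open('output.csv', 'wb') do |csv|
  doc.search('Offer').each do |x|
    # one CSV row per Offer: the text of each element it contains
    csv << x.search('*').map(&:text)
  end
end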

And to get headers (from the first Offer element):

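CSV.open('output.csv', 'wb') do |csv|
  # header row: element names taken from the first Offer
  csv << doc.at('Offer').search('*').map(&:name)
  doc.search('Offer').each do |x|
    # data row: element text, in the same order as the headers
    csv << x.search('*').map(&:text)
  end
end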

EDIT

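search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if there is none); search returns a NodeSet, an array-like collection of matching elements (empty if no matches are found). The * here is a CSS selector, so it matches every descendant element of the current node, not only its direct children. In this file the two are the same, because each child of Offer holds only text or CDATA (the <li> markup inside Custom6 is CDATA, not real elements). To restrict the match to direct children, node.elements or the XPath './*' can be used instead.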

Both name and text are also Nokogiri methods (on an element). name returns the element's name; text returns the text or CDATA content of a node.

Answer 2:

Try this, and modify it to push into your CSV:

doc.css('Offer').first.elements.each do |n|
  puts "#{n.name}: #{n.content}"
end

Source: https://stackoverflow.com/questions/21852766/parse-xml-nodes-to-csv-with-ruby-nokogiri

Tags

ruby

xml

csv

nokogiri