Parse XML nodes to CSV with Ruby/Nokogiri

Submitted by 吃可爱长大的小学妹 on 2019-12-03 22:08:26

Question


Ruby parsing newbie here.

I've got an XML file that looks like this:

<?xml version="1.0" encoding="iso-8859-1"?>
<Offers xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://ssc.channeladvisor.com/files/cageneric.xsd">
  <Offer>
   <Model><![CDATA[11016001]]></Model>
   <Manufacturer><![CDATA[Crocs, Inc.]]></Manufacturer>
   <ManufacturerModel><![CDATA[11016-001]]></ManufacturerModel>
   ...lots more nodes
   <Custom6><![CDATA[<li>Bold midsole stripe for a sporty look.</li>
    <li>Odor-resistant, easy to clean, and quick to dry.</li>
    <li>Ventilation ports for enhanced breathability.</li>
    <li>Lightweight, non-marking soles.</li>
    <li>Water-friendly and buoyant; weighs only ounces.</li>
    <li>Fully molded Croslite&trade; material for lightweight cushioning and comfort.</li>
    <li>Heel strap swings back for snug fit, forward for wear as a clog.</li>]]></Custom6>
  </Offer>
....lots lots more <Offer> entries
</Offers>

What I want to do is parse each instance of 'Offer' into its own row in a CSV which I'm doing via this code:

require 'csv'
require 'nokogiri'

file = File.read('input.xml')
doc = Nokogiri::XML(file)
a = []
csv = CSV.open('output.csv', 'wb') 

doc.css('Offer').each do |node|
    a.push << node.content.split
end

a.each { |a| csv << a } 

This runs nicely (once I figured out that the CSV needs an array fed to it, which calling .split on node.content seemed to satisfy).

My issue is that I'm splitting on whitespace rather than on each child element of the Offer node (sorry if that's not the right terminology), so every word goes into its own column in the CSV.

Has anyone got some pointers on:

  1. A way to pick up the content of each node
  2. How to pull the node names through as headers in the csv

Any pointers much appreciated

Thanks, Liam


Answer 1:


This assumes that each Offer element always has the same child nodes (though they can be empty):

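# doc is the Nokogiri::XML document parsed in the question
CSV.open('output.csv', 'wb') do |csv|
  doc.search('Offer').each do |x|
    # one row per Offer, one field per element inside it
    csv << x.search('*').map(&:text)
  end
end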

And to get headers (from the first Offer element):

CSV.open('output.csv', 'wb') do |csv|
  # header row: element names taken from the first Offer
  csv << doc.at('Offer').search('*').map(&:name)
  doc.search('Offer').each do |x|
    csv << x.search('*').map(&:text)
  end
end

EDIT

search and at are Nokogiri methods that accept either XPath or CSS selector strings. at returns the first matching element (or nil if nothing matches); search returns a NodeSet of all matching elements (empty if nothing matches). The * selector here matches every element nested under the current node, at any depth; because each child of Offer in this file holds only text or CDATA, that is the same set as the node's direct children.

name and text are also Nokogiri methods, called here on each element. name returns the element's name; text returns the node's text content, including CDATA.




Answer 2:


Try this, and modify it to push into your csv:

doc.css('Offer').first.elements.each do |n|
  puts "#{n.name}: #{n.content}"
end


Source: https://stackoverflow.com/questions/21852766/parse-xml-nodes-to-csv-with-ruby-nokogiri
