How do I use Nokogiri::XML::Reader to parse large XML files?

生来就可爱ヽ(ⅴ<●) 提交于 2019-11-30 07:06:06

Each element in the stream comes through as two events: one to open the element and one to close it. The opening event will have

node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

and the closing event will have

node.node_type == Nokogiri::XML::Reader::TYPE_END_ELEMENT

The empty strings you're seeing are just the element closing events. Remember that with SAX parsing, you're basically walking through a tree so you need the second event to tell you when you're going back up and closing an element.

You probably want something more like this:

reader.each do |node|
  if node.name == "PMID" && node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    p << node.inner_xml
  end
end

Or perhaps:

reader.each do |node|
  next if node.name      != 'PMID'
  next if node.node_type != Nokogiri::XML::Reader::TYPE_ELEMENT
  p << node.inner_xml
end

Or some other variation on that.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!