Parsing Large XML with Nokogiri

忘了有多久 2021-01-07 10:10

So I'm attempting to parse a 400k+ line XML file using Nokogiri.

The XML file has this basic format:



        
4 Answers
  •  刺人心 (OP)
    2021-01-07 10:47

    You can also use Nokogiri::XML::Reader. It's more memory intensive than the Nokogiri::XML::SAX parser, but it lets you keep the XML structure, e.g.:

    require 'nokogiri'

    class NodeHandler < Struct.new(:node)
      def process
        # Node processing logic goes here, e.g.:
        clinical_sign = node.at('ClinicalSign')
        sign_id = clinical_sign['id']
        name = clinical_sign.element_children.text
      end
    end

    # Stream the file and build a small DOM only for each DisorderSign element.
    # The block form of File.open closes the file when parsing finishes.
    File.open('./test/fixtures/example.xml') do |file|
      Nokogiri::XML::Reader(file).each do |node|
        next unless node.name == 'DisorderSign' &&
                    node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT

        NodeHandler.new(
          Nokogiri::XML(node.outer_xml).at('./DisorderSign')
        ).process
      end
    end
    

    Based on this blog
