NodeJS parseStream, defining a start and end point for a chunk

Posted by ε祈祈猫儿з on 2019-12-04 14:26:57

You have two options for tackling this issue.

As damphat pointed out, xml2js needs the full XML content before it can parse the data. A file stream, however, delivers the data chunk by chunk. The first solution is to collect the stream into one large Buffer and then hand that to xml2js. For this you can use the stream-to package (npm i stream-to), which converts the file stream into an array of buffers; we then concatenate them into a single buffer with Buffer.concat, like this:

var fs = require('fs')
var streamTo = require('stream-to')
var xml2js = require('xml2js')

var file = fs.createReadStream('input.xml')

streamTo.array(file, function (err, arr) {
    if (err) return console.log(err.message)

    var content = Buffer.concat(arr)
    var parser = new xml2js.Parser()
    parser.parseString(content, function (err, res) {
        if (err) return console.log(err.message)
        console.log(res.merchandiser.product)
    })
})
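
If you would rather not add a dependency for the buffering step, roughly the same thing can be sketched with Node's core stream events alone. The helper name collectStream is hypothetical, and Readable.from is used here only as a stand-in for fs.createReadStream('input.xml') so the snippet runs without an input file:

```javascript
var Readable = require('stream').Readable

// Hypothetical helper: gathers every chunk of a readable stream and
// calls back with one Buffer once the stream has ended.
function collectStream (stream, callback) {
    var chunks = []
    var done = false

    function finish (err, result) {
        if (done) return // guard against calling back twice
        done = true
        callback(err, result)
    }

    stream.on('data', function (chunk) {
        // Streams opened with an encoding emit strings, so normalise to Buffer
        chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
    })
    stream.on('error', function (err) { finish(err) })
    stream.on('end', function () { finish(null, Buffer.concat(chunks)) })
}

// Stand-in for the file stream: two chunks that split the document in the
// middle of a tag, as a real file stream may do.
var fakeFile = Readable.from([
    Buffer.from('<merchandiser><pro'),
    Buffer.from('duct/></merchandiser>')
])

collectStream(fakeFile, function (err, content) {
    if (err) return console.log(err.message)
    // content is now the complete document and could be passed to parseString
    console.log(content.toString())
})
```

Because the chunks are only joined after the end event, a tag that was split across two chunks is whole again by the time the parser sees it.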

This works quite well, but since it has to hold the whole file in memory, it won't work if your input files are really big. To handle really big files, you need a streaming XML parser such as sax. However, sax doesn't create JavaScript objects; it is an EventEmitter, and it is a bit harder to use because you have to handle all the relevant events to build your objects on the fly.

You can use, for instance, the SaXPath library, which supports a small subset of the XPath syntax. It emits a match event every time the XPath pattern matches. Here's an example:

var saxpath = require('saxpath')
var fs = require('fs')
var sax = require('sax')

var saxParser = sax.createStream(true)
var streamer = new saxpath.SaXPath(saxParser, '/merchandiser/product')

streamer.on('match', function(xml) {
    console.log(xml);
});

fs.createReadStream('input.xml').pipe(saxParser)

You then have two options:

  1. Since each match now contains the XML for exactly one product, you can use xml2js to parse one product at a time.
  2. SaXPath supports multiple recorders: the default recorder listens to sax events and re-creates the corresponding XML (which is what makes the first option possible), but you can write your own recorder that listens to sax events and builds JavaScript objects on the fly.
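
To give an idea of what the second option involves, here is a minimal sketch of a builder that turns open/text/close events into plain objects. It is not SaXPath's actual recorder interface; createObjectBuilder and the shape of the resulting objects are assumptions made for illustration. The handlers take the same arguments that sax passes to its opentag, text and closetag events, and the events are fed by hand here so the snippet runs without any parser installed:

```javascript
// Hypothetical builder (not the SaXPath recorder API): collects every
// element named targetName, with its descendants, into a plain object.
function createObjectBuilder (targetName, onObject) {
    var stack = [] // elements currently open inside a target element

    return {
        opentag: function (node) {
            // Ignore everything until a target element is opened
            if (stack.length === 0 && node.name !== targetName) return
            stack.push({ name: node.name, attributes: node.attributes, children: [], text: '' })
        },
        text: function (text) {
            if (stack.length > 0) stack[stack.length - 1].text += text
        },
        closetag: function (name) {
            if (stack.length === 0) return // closing something outside a target
            var element = stack.pop()
            if (stack.length === 0) return onObject(element) // target complete
            stack[stack.length - 1].children.push(element)
        }
    }
}

var products = []
var builder = createObjectBuilder('product', function (product) {
    products.push(product)
})

// With sax these would be wired up roughly as:
//   saxStream.on('opentag', builder.opentag)
//   saxStream.on('text', builder.text)
//   saxStream.on('closetag', builder.closetag)
// Hand-fed events standing in for:
//   <merchandiser><product product_id="1"><name>Shoe</name></product></merchandiser>
builder.opentag({ name: 'merchandiser', attributes: {} })
builder.opentag({ name: 'product', attributes: { product_id: '1' } })
builder.opentag({ name: 'name', attributes: {} })
builder.text('Shoe')
builder.closetag('name')
builder.closetag('product')
builder.closetag('merchandiser')

console.log(JSON.stringify(products, null, 2))
```

Only one product is held in memory at a time, which is the point of the streaming approach: each finished object is handed to the callback and can be processed or discarded before the next one arrives.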

xml2js needs the XML to be fully loaded.

In your case, use sax instead; it is a streaming parser:

// install

npm install sax

// this code prints every PRODUCT_ID

var fs = require('fs');
var sax = require('sax');

// No arguments means non-strict mode, in which sax upper-cases tag and attribute names
var saxStream = sax.createStream();

saxStream.onopentag = function (node) {
    if(node.name === 'PRODUCT'){
        console.log(node.attributes.PRODUCT_ID);
    }
};

fs.createReadStream('xml/bigXML.xml').pipe(saxStream);

output:

52863929
26537849
25535647