Read a file one line at a time in node.js?

前端未结

关注

 29  1601

深忆病人 2020-11-22 04:33

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject but I\'m missing some connections to make the whole thing fit to

29条回答

一向 (楼主)

2020-11-22 05:13

I wanted to tackle this same problem, basically what in Perl would be:

while (<>) {
    process_line($_);
}

My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:

The minimal synchronous code that could reuse in many projects.
No limits on file size or number of lines.
No limits on length of lines.
Able to handle full Unicode in UTF-8, including characters beyond the BMP.
Able to handle *nix and Windows line endings (old-style Mac not needed for me).
Line endings character(s) to be included in lines.
Able to handle last line with or without end-of-line characters.
Not use any external libraries not included in the node.js distribution.

This is a project for me to get a feel for low-level scripting type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.

After a surprising amount of effort and a couple of false starts this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)

var fs            = require('fs'),
    StringDecoder = require('string_decoder').StringDecoder,
    util          = require('util');

function lineByLine(fd) {
  var blob = '';
  var blobStart = 0;
  var blobEnd = 0;

  var decoder = new StringDecoder('utf8');

  var CHUNK_SIZE = 16384;
  var chunk = new Buffer(CHUNK_SIZE);

  var eolPos = -1;
  var lastChunk = false;

  var moreLines = true;
  var readMore = true;

  // each line
  while (moreLines) {

    readMore = true;
    // append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
    while (readMore) {

      // do we have a whole line? (with LF)
      eolPos = blob.indexOf('\n', blobStart);

      if (eolPos !== -1) {
        blobEnd = eolPos;
        readMore = false;

      // do we have the last line? (no LF)
      } else if (lastChunk) {
        blobEnd = blob.length;
        readMore = false;

      // otherwise read more
      } else {
        var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);

        lastChunk = bytesRead !== CHUNK_SIZE;

        blob += decoder.write(chunk.slice(0, bytesRead));
      }
    }

    if (blobStart < blob.length) {
      processLine(blob.substring(blobStart, blobEnd + 1));

      blobStart = blobEnd + 1;

      if (blobStart >= CHUNK_SIZE) {
        // blobStart is in characters, CHUNK_SIZE is in octets
        var freeable = blobStart / CHUNK_SIZE;

        // keep blob from growing indefinitely, not as deterministic as I'd like
        blob = blob.substring(CHUNK_SIZE);
        blobStart -= CHUNK_SIZE;
        blobEnd -= CHUNK_SIZE;
      }
    } else {
      moreLines = false;
    }
  }
}

It could probably be cleaned up further, it was the result of trial and error.

0 讨论(0)

查看其它29个回答