Question
The goal: Upload large files to AWS Glacier without holding the whole file in memory.
I'm currently uploading to Glacier using fs.readFileSync() and things are working. But I need to handle files larger than 4 GB, and I'd like to upload multiple chunks in parallel, which means moving to multipart uploads. I can choose the chunk size, but Glacier requires every chunk to be the same size (except the last).
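For context, this is roughly what declaring the part size up front looks like with the aws-sdk v2 Glacier client; a minimal sketch, where the region, vault name, and part size are placeholders rather than values from the question. Glacier also requires the part size to be a power-of-two multiple of 1 MiB.

var AWS = require('aws-sdk');
var glacier = new AWS.Glacier({ region: 'us-east-1' });

// Every part except the last must be exactly this many bytes.
var PART_SIZE = 16 * 1024 * 1024; // 16 MiB, a power-of-two multiple of 1 MiB

glacier.initiateMultipartUpload({
  accountId: '-',                  // '-' means the account that owns the credentials
  vaultName: 'my-vault',           // placeholder vault name
  archiveDescription: 'large file',
  partSize: String(PART_SIZE)      // Glacier expects the part size as a string
}, function(err, data) {
  if (err) throw err;
  // data.uploadId must be passed to every subsequent uploadMultipartPart call.
  console.log('uploadId:', data.uploadId);
});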
This thread suggests that I can set a chunk size on a read stream, but that I'm not actually guaranteed to get it.
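For illustration, the read-stream approach referred to here looks roughly like the sketch below (path and size are placeholders). highWaterMark controls how much the stream tries to read at a time, but the stream contract does not guarantee that every 'data' chunk is exactly that size, which is what makes it awkward for fixed-size parts.

var fs = require('fs');

var stream = fs.createReadStream('/tmp/foo', { highWaterMark: 16 * 1024 * 1024 });
stream.on('data', function(chunk) {
  // chunk.length is usually highWaterMark, but that is not guaranteed,
  // so chunks would still need re-buffering into exact part sizes.
  console.log('got chunk of', chunk.length, 'bytes');
});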
Any info on how I can get consistent parts without reading the whole file into memory and splitting it up manually?
Assuming I can get to that point, I was just going to use cluster with a few processes pulling off the stream as fast as they can upload to AWS. If that seems like the wrong way to parallelize the work, I'd love suggestions there.
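One hedged sketch of that idea, assuming each worker reads its own parts directly from disk at explicit byte offsets rather than pulling from a shared stream: uploadPart below is a hypothetical stand-in for the actual Glacier upload call, and the file path, part size, and worker count are placeholders.

var cluster = require('cluster');
var fs = require('fs');

var FILE_PATH = '/tmp/foo';              // placeholder
var PART_SIZE = 16 * 1024 * 1024;        // must match the declared part size
var NUM_WORKERS = 4;

// Hypothetical helper: upload one part (e.g. via glacier.uploadMultipartPart).
function uploadPart(buf, byteOffset, cb) { /* ... */ cb(null); }

if (cluster.isMaster) {
  var totalParts = Math.ceil(fs.statSync(FILE_PATH).size / PART_SIZE);
  for (var i = 0; i < NUM_WORKERS; i++) {
    // Each worker gets a stride of part indexes: i, i + N, i + 2N, ...
    cluster.fork({ WORKER_INDEX: i, TOTAL_PARTS: totalParts });
  }
} else {
  var workerIndex = Number(process.env.WORKER_INDEX);
  var totalParts = Number(process.env.TOTAL_PARTS);
  var fd = fs.openSync(FILE_PATH, 'r');

  (function next(partIndex) {
    if (partIndex >= totalParts) return fs.closeSync(fd);
    var buf = Buffer.alloc(PART_SIZE);
    var offset = partIndex * PART_SIZE;
    // Read this worker's part at an explicit byte offset; the last part
    // of the file may come back shorter than PART_SIZE.
    var nread = fs.readSync(fd, buf, 0, PART_SIZE, offset);
    uploadPart(buf.slice(0, nread), offset, function(err) {
      if (err) throw err;
      next(partIndex + NUM_WORKERS);
    });
  })(workerIndex);
}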
Answer 1:
If nothing else, you can just use fs.open(), fs.read(), and fs.close() manually. Example:
var fs = require('fs');

var CHUNK_SIZE = 10 * 1024 * 1024, // 10MB
    buffer = Buffer.alloc(CHUNK_SIZE),
    filePath = '/tmp/foo';

fs.open(filePath, 'r', function(err, fd) {
  if (err) throw err;
  function readNextChunk() {
    // Passing `null` as the position reads sequentially from the current file offset.
    fs.read(fd, buffer, 0, CHUNK_SIZE, null, function(err, nread) {
      if (err) throw err;
      if (nread === 0) {
        // done reading file, do any necessary finalization steps
        fs.close(fd, function(err) {
          if (err) throw err;
        });
        return;
      }
      var data;
      if (nread < CHUNK_SIZE)
        data = buffer.slice(0, nread); // last chunk may be shorter
      else
        data = buffer;
      // do something with `data`, then call `readNextChunk();`
    });
  }
  readNextChunk();
});
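One note on adapting this to the parallel-upload question above: fs.read() also accepts an explicit byte position instead of null, so any part can be read independently by its index. A minimal sketch reusing CHUNK_SIZE from the answer; readPart is a name introduced here purely for illustration.

// Read part `partIndex` of the file at fd, at byte offset partIndex * CHUNK_SIZE.
function readPart(fd, partIndex, callback) {
  var buf = Buffer.alloc(CHUNK_SIZE);
  fs.read(fd, buf, 0, CHUNK_SIZE, partIndex * CHUNK_SIZE, function(err, nread) {
    if (err) return callback(err);
    // The final part may be shorter than CHUNK_SIZE.
    callback(null, nread < CHUNK_SIZE ? buf.slice(0, nread) : buf);
  });
}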
Source: https://stackoverflow.com/questions/25110983/node-reading-file-in-specified-chunk-size