Question
The goal: Upload large files to AWS Glacier without holding the whole file in memory.
I'm currently uploading to Glacier using fs.readFileSync() and things are working. But I need to handle files larger than 4 GB, and I'd like to upload multiple chunks in parallel, which means moving to multipart uploads. I can choose the chunk size, but Glacier requires every chunk to be the same size (except the last).
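For context, this is roughly what declaring the part size up front looks like with the aws-sdk v2 Glacier client; a minimal sketch, where the region, vault name, and part size are placeholders rather than values from the question. Glacier also requires the part size to be a power-of-two multiple of 1 MiB.

var AWS = require('aws-sdk');
var glacier = new AWS.Glacier({ region: 'us-east-1' });

// Every part except the last must be exactly this many bytes.
var PART_SIZE = 16 * 1024 * 1024; // 16 MiB, a power-of-two multiple of 1 MiB

glacier.initiateMultipartUpload({
  accountId: '-',                  // '-' means the account that owns the credentials
  vaultName: 'my-vault',           // placeholder vault name
  archiveDescription: 'large file',
  partSize: String(PART_SIZE)      // Glacier expects the part size as a string
}, function(err, data) {
  if (err) throw err;
  // data.uploadId must be passed to every subsequent uploadMultipartPart call.
  console.log('uploadId:', data.uploadId);
});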
This thread suggests that I can set a chunk size on a read stream, but that I'm not actually guaranteed to get it.
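For illustration, the read-stream approach referred to here looks roughly like the sketch below (path and size are placeholders). highWaterMark controls how much the stream tries to read at a time, but the stream contract does not guarantee that every 'data' chunk is exactly that size, which is what makes it awkward for fixed-size parts.

var fs = require('fs');

var stream = fs.createReadStream('/tmp/foo', { highWaterMark: 16 * 1024 * 1024 });
stream.on('data', function(chunk) {
  // chunk.length is usually highWaterMark, but that is not guaranteed,
  // so chunks would still need re-buffering into exact part sizes.
  console.log('got chunk of', chunk.length, 'bytes');
});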
Any info on how I can get consistent parts without reading the whole file into memory and splitting it up manually?
Assuming I can get to that point, I was just going to use cluster with a few processes pulling off the stream as fast as they can upload to AWS. If that seems like the wrong way to parallelize the work, I'd love suggestions there.
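One hedged sketch of that idea, assuming each worker reads its own parts directly from disk at explicit byte offsets rather than pulling from a shared stream: uploadPart below is a hypothetical stand-in for the actual Glacier upload call, and the file path, part size, and worker count are placeholders.

var cluster = require('cluster');
var fs = require('fs');

var FILE_PATH = '/tmp/foo';              // placeholder
var PART_SIZE = 16 * 1024 * 1024;        // must match the declared part size
var NUM_WORKERS = 4;

// Hypothetical helper: upload one part (e.g. via glacier.uploadMultipartPart).
function uploadPart(buf, byteOffset, cb) { /* ... */ cb(null); }

if (cluster.isMaster) {
  var totalParts = Math.ceil(fs.statSync(FILE_PATH).size / PART_SIZE);
  for (var i = 0; i < NUM_WORKERS; i++) {
    // Each worker gets a stride of part indexes: i, i + N, i + 2N, ...
    cluster.fork({ WORKER_INDEX: i, TOTAL_PARTS: totalParts });
  }
} else {
  var workerIndex = Number(process.env.WORKER_INDEX);
  var totalParts = Number(process.env.TOTAL_PARTS);
  var fd = fs.openSync(FILE_PATH, 'r');

  (function next(partIndex) {
    if (partIndex >= totalParts) return fs.closeSync(fd);
    var buf = Buffer.alloc(PART_SIZE);
    var offset = partIndex * PART_SIZE;
    // Read this worker's part at an explicit byte offset; the last part
    // of the file may come back shorter than PART_SIZE.
    var nread = fs.readSync(fd, buf, 0, PART_SIZE, offset);
    uploadPart(buf.slice(0, nread), offset, function(err) {
      if (err) throw err;
      next(partIndex + NUM_WORKERS);
    });
  })(workerIndex);
}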
Answer 1:
If nothing else, you can just use fs.open(), fs.read(), and fs.close() manually. Example:
var fs = require('fs');

var CHUNK_SIZE = 10 * 1024 * 1024, // 10MB
    buffer = Buffer.alloc(CHUNK_SIZE),
    filePath = '/tmp/foo';

fs.open(filePath, 'r', function(err, fd) {
  if (err) throw err;
  function readNextChunk() {
    // Passing `null` as the position reads sequentially from the current file offset.
    fs.read(fd, buffer, 0, CHUNK_SIZE, null, function(err, nread) {
      if (err) throw err;
      if (nread === 0) {
        // done reading file, do any necessary finalization steps
        fs.close(fd, function(err) {
          if (err) throw err;
        });
        return;
      }
      var data;
      if (nread < CHUNK_SIZE)
        data = buffer.slice(0, nread); // last chunk may be shorter
      else
        data = buffer;
      // do something with `data`, then call `readNextChunk();`
    });
  }
  readNextChunk();
});
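One note on adapting this to the parallel-upload question above: fs.read() also accepts an explicit byte position instead of null, so any part can be read independently by its index. A minimal sketch reusing CHUNK_SIZE from the answer; readPart is a name introduced here purely for illustration.

// Read part `partIndex` of the file at fd, at byte offset partIndex * CHUNK_SIZE.
function readPart(fd, partIndex, callback) {
  var buf = Buffer.alloc(CHUNK_SIZE);
  fs.read(fd, buf, 0, CHUNK_SIZE, partIndex * CHUNK_SIZE, function(err, nread) {
    if (err) return callback(err);
    // The final part may be shorter than CHUNK_SIZE.
    callback(null, nread < CHUNK_SIZE ? buf.slice(0, nread) : buf);
  });
}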
Source: https://stackoverflow.com/questions/25110983/node-reading-file-in-specified-chunk-size