How can I use a cursor.forEach() in MongoDB using Node.js?

你的背包 2020-11-27 14:04

I have a huge collection of documents in my DB, and I'm wondering how I can iterate over all the documents and update each one with a different value.

10 Answers
  •  再見小時候
    2020-11-27 14:49

    I looked for a solution with good performance and ended up combining several approaches I found into one that I think works well:

    /**
     * Reads the documents from the cursor in batches and invokes the callback
     * for every document of a batch concurrently.
     * It is strongly recommended to create the cursor with a batchSize option that matches
     * the batchSize argument. That way the mongo instance sends into our process memory
     * the same number of documents that we handle concurrently each time, so no memory
     * is wasted and memory usage stays bounded.
     *
     * Example of usage:
     * const cursor = await collection.aggregate([
     *     {...}, ...],
     *     {
     *         cursor: {batchSize: BATCH_SIZE} // Limiting memory use
     *     });
     * DbUtil.concurrentCursorBatchProcessing(cursor, BATCH_SIZE, async (doc) => ...)
     *
     * @param cursor - The cursor to batch process.
     * We can get this from our collection.js API by using either aggregateCursor or findCursor.
     * @param batchSize - The batch size; should match the batchSize of the cursor option.
     * @param callback - Async callback, invoked concurrently for each document in a batch.
     * @return {Promise}
     */
    static async concurrentCursorBatchProcessing(cursor, batchSize, callback) {
        let doc;
        const docsBatch = [];
    
        while ((doc = await cursor.next())) {
            docsBatch.push(doc);
    
            if (docsBatch.length >= batchSize) {
                await PromiseUtils.concurrentPromiseAll(docsBatch, async (currDoc) => {
                    return callback(currDoc);
                });
    
                // Emptying the batch array
                docsBatch.splice(0, docsBatch.length);
            }
        }
    
        // Checking if there is a last batch remaining since it was smaller than batchSize
        if (docsBatch.length > 0) {
            await PromiseUtils.concurrentPromiseAll(docsBatch, async (currDoc) => {
                return callback(currDoc);
            });
        }
    }
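
`PromiseUtils.concurrentPromiseAll` is not shown in this answer. Below is a minimal sketch of what such a helper might look like, assuming it simply passes every item to the async callback and waits for all of them with `Promise.all`. The helper body is a guess, and `makeFakeCursor` is a made-up stand-in used only to exercise the batching loop without a database:

```javascript
// Hypothetical stand-in for PromiseUtils.concurrentPromiseAll (not from the answer):
// starts the callback for every item at once and resolves when all have finished.
async function concurrentPromiseAll(items, callback) {
    return Promise.all(items.map((item) => callback(item)));
}

// Same batching loop as above, written as a plain function.
async function concurrentCursorBatchProcessing(cursor, batchSize, callback) {
    let batch = [];
    let doc;

    // cursor.next() resolves to null once the cursor is exhausted
    while ((doc = await cursor.next())) {
        batch.push(doc);

        if (batch.length >= batchSize) {
            await concurrentPromiseAll(batch, callback);
            batch = [];
        }
    }

    // Last batch, if it was smaller than batchSize
    if (batch.length > 0) {
        await concurrentPromiseAll(batch, callback);
    }
}

// Fake cursor for illustration only: mimics the next() contract of a driver cursor.
function makeFakeCursor(docs) {
    let i = 0;
    return { next: async () => (i < docs.length ? docs[i++] : null) };
}
```

Note that `Promise.all` rejects as soon as one callback rejects, so a single failing document aborts the whole run unless the callback handles its own errors.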
    

    An example of usage for reading many big documents and updating them:

        const cursor = await collection.aggregate([
            {
                ...
            }
        ], {
            cursor: {batchSize: BATCH_SIZE}, // Limiting memory use
            allowDiskUse: true
        });
    
        const bulkUpdates = [];
    
        await DbUtil.concurrentCursorBatchProcessing(cursor, BATCH_SIZE, async (doc: any) => {
            const update: any = {
                updateOne: {
                    filter: {
                        ...
                    },
                    update: {
                       ...
                    }
                }
            };            
    
            bulkUpdates.push(update);
    
            // Updating if we read too many docs to clear space in memory
            await this.bulkWriteIfNeeded(bulkUpdates, collection);
        });
    
        // Making sure we updated everything
        await this.bulkWriteIfNeeded(bulkUpdates, collection, true);
    

    ...

    private async bulkWriteIfNeeded(
        bulkUpdates: any[], collection: any,
        forceUpdate = false, flushBatchSize = BATCH_SIZE) { // default is an assumption; the calls above do not pass it

        if (bulkUpdates.length >= flushBatchSize || forceUpdate) {
            // Take the pending operations out of the shared array before awaiting,
            // so concurrent callbacks cannot write the same operations twice
            const pending = bulkUpdates.splice(0, bulkUpdates.length);

            // concurrentPromiseChunked is a method that loops over an array in a concurrent way using lodash.chunk and Promise.map
            await PromiseUtils.concurrentPromiseChunked(pending, (upsertChunk: any) => {
                return collection.bulkWrite(upsertChunk);
            });
        }
    }
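
`PromiseUtils.concurrentPromiseChunked` is only described in a comment here. Below is a dependency-free sketch of one possible reading, assuming it splits the array into fixed-size chunks and runs the callback on all chunks concurrently. The `chunkSize` parameter, its default, and the `chunk` helper are assumptions, and plain `Promise.all` stands in for lodash.chunk and Promise.map:

```javascript
// Hypothetical replacement for lodash.chunk:
// splits an array into slices of at most `size` items.
function chunk(items, size) {
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}

// Hypothetical stand-in for PromiseUtils.concurrentPromiseChunked (not from the answer):
// calls the callback once per chunk, all chunks at once, and waits for every call.
async function concurrentPromiseChunked(items, callback, chunkSize = 1000) {
    return Promise.all(chunk(items, chunkSize).map((c) => callback(c)));
}
```

With a real collection the callback would be something like `(ops) => collection.bulkWrite(ops)`. An empty input produces no chunks, so the callback is never called with an empty operations list.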
    
