MongoDB - what is the fastest way to update all records in a collection?

野性不改 2020-12-07 18:43

I have a collection with 9 million records. I am currently using the following script to update the entire collection:

simple_update.js

db.my
5 Answers
  •  轮回少年
    2020-12-07 19:25

    I wouldn't recommend using {multi: true} for larger data sets, because it is less configurable.
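    For reference, the {multi: true} form being discussed would look roughly like this (a sketch only; the collection name myCol and the field test_value are illustrative, borrowed from the examples below):

    // legacy update with {multi: true}: one command applied to every matching document
    db.myCol.update(
        {},                                        // empty filter matches every document
        { $set: { test_value: "just testing" } },  // the modification to apply
        { multi: true }                            // update all matches, not just the first
    );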

    A better way is to use bulk operations.

    Bulk operations are really helpful for scheduled tasks. Say you have to delete data older than 6 months every day: use a bulk operation. It's fast and won't slow down the server; CPU and memory usage stay unnoticeable even when you insert, delete, or update over a billion documents. I found {multi: true} slowing down the server when dealing with a million+ documents (this needs more research).
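    A minimal sketch of such a scheduled cleanup, assuming a collection myCol with a date field createdAt (both names are illustrative):

    var sixMonthsAgo = new Date();
    sixMonthsAgo.setMonth(sixMonthsAgo.getMonth() - 6);        // cutoff: 6 months before now
    var bulk = db.myCol.initializeUnorderedBulkOp();
    bulk.find({ createdAt: { $lt: sixMonthsAgo } }).remove();  // queue removal of everything older than the cutoff
    bulk.execute();                                            // run it as one bulk operation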

    See a sample below. It's a JS shell script; you can also run it on the server as a Node program (use an npm module such as shelljs or similar to achieve this).

    Update MongoDB to 3.2+ (a bulkWrite() sketch for the 3.2+ API appears near the end of this answer).

    The normal way of updating multiple unique documents is:

    // iterate the first million documents in natural order and save each one back individually
    let counter = 0;
    db.myCol.find({}).sort({$natural:1}).limit(1000000).forEach(function(document){
        counter++;
        document.test_value = "just testing" + counter;
        db.myCol.save(document);   // one round trip per document
    });
    

    It took 310-315 seconds when I tried; that's more than 5 minutes to update a million documents.

    My collection includes 100 million+ documents, so speed may differ for others.

    The same update using the bulk API is:

    let counter = 0;
    // Magic number - depends on your hardware and document size. My documents are around 1.5kb-2kb.
    // Performance drops when this limit is outside the 1500-2500 range.
    // Try different ranges and find the fastest bulk limit for your document size, or take an average.
    let limitNo = 2222;
    let bulk = db.myCol.initializeUnorderedBulkOp();
    let noOfDocsToProcess = 1000000;
    db.myCol.find({}).sort({$natural:1}).limit(noOfDocsToProcess).forEach(function(document){
        counter++;
        noOfDocsToProcess--;
        limitNo--;
        bulk.find({_id:document._id}).update({$set:{test_value : "just testing .. " + counter}});
        if(limitNo === 0 || noOfDocsToProcess === 0){
            // flush the current batch and start a new one
            bulk.execute();
            bulk = db.myCol.initializeUnorderedBulkOp();
            limitNo = 2222;
        }
    });
    

    The best time was 8972 ms, so on average it took only about 10 seconds to update a million documents, roughly 30 times faster than the old way.
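    On MongoDB 3.2 and newer (as suggested above), the same batching can also be written with the bulkWrite() API; a minimal sketch under the same assumptions (collection myCol, field test_value, batch size 2222):

    let ops = [];
    let counter = 0;
    db.myCol.find({}).sort({$natural:1}).limit(1000000).forEach(function(document){
        counter++;
        ops.push({
            updateOne: {
                filter: { _id: document._id },
                update: { $set: { test_value: "just testing .. " + counter } }
            }
        });
        if (ops.length === 2222) {                       // flush a full batch
            db.myCol.bulkWrite(ops, { ordered: false });
            ops = [];
        }
    });
    if (ops.length > 0) {                                // flush the final partial batch
        db.myCol.bulkWrite(ops, { ordered: false });
    }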

    Put the code in a .js file and execute it as a mongo shell script.
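    For example, assuming the script is saved as bulk_update.js and the database is named myDb (both names hypothetical):

    mongo myDb bulk_update.js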

    If someone finds a better way, please share it. Let's use MongoDB in a faster way.
