How to remove duplicates based on a key in MongoDB?

伪装坚强ぢ 2020-11-30 20:56

I have a collection in MongoDB with around 3 million records. A sample record looks like this:

 { "_id" = ObjectId("50731xxxxxxxxxxxxxxxxxx


        
8 answers
  •  陌清茗 (original poster)
    2020-11-30 21:06

    Expanding on Fernando's answer, I found that it was taking too long, so I modified it.

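    var x = 0; // total documents visited so far, used for progress output
    db.collection.distinct("field").forEach(fieldValue => {
      var i = 0; // position of the current document within this field value
      db.collection.find({ "field": fieldValue }).forEach(doc => {
        if (i) {
          // Keep the first document for each value; remove the rest by _id.
          db.collection.remove({ _id: doc._id });
        }
        i++;
        x += 1;
        if (x % 100 === 0) {
          print(x); // Print progress every 100 documents.
        }
      });
    });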
    

    The improvement is basically removing by document id, which should be faster, plus printing the progress of the operation. You can change the progress interval (100 here) to whatever value you like.

    Also, indexing the field before the operation helps.
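
    As a rough sketch of the indexing step, something like the following could be run before the removal loop. The helper name `ensureKeyIndex` and its parameters are made up for illustration; the only real shell call used is `createIndex`. It assumes a plain ascending index is enough here, since the index only needs to speed up the per-value `find()`.

```javascript
// Sketch: build an ascending index on the deduplication key so that the
// per-value find() calls do not have to scan the whole collection.
// `ensureKeyIndex`, `coll` and `fieldName` are hypothetical names.
function ensureKeyIndex(coll, fieldName) {
  var spec = {};
  spec[fieldName] = 1; // 1 = ascending order
  // Deliberately not a unique index: the duplicates still exist at this
  // point, and a unique index build would fail on them.
  return coll.createIndex(spec);
}

// In the mongo shell, before running the removal loop:
// ensureKeyIndex(db.collection, "field");
```

    Once the duplicates are gone, the index could be swapped for a unique one if the goal is to stop new duplicates from appearing, but that is a separate decision.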
