mongo 3 duplicates on unique index - dropDups

刺人心 2020-12-13 10:32

The MongoDB documentation says: "Changed in version 3.0: The dropDups option is no longer available."

Is there anything I can do (other than downgrading) to drop the duplicates when creating a unique index?

3 answers
  •  庸人自扰
    2020-12-13 10:55

    As @Maxime-Beugnet pointed out, you can write a batch script to remove duplicates from a collection. My approach, included below, is relatively fast when the number of duplicates is small compared to the collection size. For demonstration purposes, it de-duplicates the collection created by the following setup script:

    db.numbers.drop()

    // Insert every value from 0 to 100000 twice,
    // and every even value a third time.
    var counter = 0
    while (counter <= 100000) {
      db.numbers.save({ "value": counter })
      db.numbers.save({ "value": counter })
      if (counter % 2 == 0) {
        db.numbers.save({ "value": counter })
      }
      counter = counter + 1;
    }
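
    As a sanity check, the document counts produced by the setup loop can be worked out ahead of time. The sketch below is plain JavaScript; `expectedCounts` is a hypothetical helper introduced here for illustration and is not part of the mongo shell.

```javascript
// Hypothetical helper: predicts the document counts for the setup loop,
// which inserts every value twice and every even value a third time.
function expectedCounts(maxValue) {
  var distinct = maxValue + 1;                // values 0..maxValue inclusive
  var evens = Math.floor(maxValue / 2) + 1;   // 0, 2, 4, ...
  var total = 2 * distinct + evens;           // documents in the collection
  return { distinct: distinct, total: total, toRemove: total - distinct };
}

console.log(expectedCounts(100000));
// distinct: 100001, total: 250003, toRemove: 150002
```

    If the de-duplication works as intended, the collection should end up with 100001 documents, which can be compared against the document count reported by the shell.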
    

    To find the duplicates in this collection, write an aggregate query that groups the documents by value and returns every group that contains more than one document, together with the _ids of those documents.

    var cur = db.numbers.aggregate([
      { $group: { _id: { value: "$value" }, uniqueIds: { $addToSet: "$_id" }, count: { $sum: 1 } } },
      { $match: { count: { $gt: 1 } } }
    ]);
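
    To make the shape of the documents returned by the aggregate query concrete, here is a rough imitation of the two stages in plain JavaScript. It runs on a small in-memory array rather than on a collection; `groupDuplicates` and `sample` are hypothetical names used only for this illustration, and the string _ids stand in for real ObjectIds.

```javascript
// Plain-JavaScript imitation of the $group + $match stages, for illustration only.
// It works on an in-memory array, not on a MongoDB collection.
function groupDuplicates(docs) {
  const groups = new Map();
  for (const doc of docs) {
    if (!groups.has(doc.value)) {
      groups.set(doc.value, { _id: { value: doc.value }, uniqueIds: [], count: 0 });
    }
    const group = groups.get(doc.value);
    group.uniqueIds.push(doc._id); // _ids are unique, so push behaves like $addToSet here
    group.count += 1;              // mirrors { $sum: 1 }
  }
  // Mirrors { $match: { count: { $gt: 1 } } }
  return Array.from(groups.values()).filter((group) => group.count > 1);
}

// Made-up sample data: value 0 occurs three times, value 1 twice, value 2 once.
const sample = [
  { _id: "a", value: 0 },
  { _id: "b", value: 0 },
  { _id: "c", value: 0 },
  { _id: "d", value: 1 },
  { _id: "e", value: 1 },
  { _id: "f", value: 2 },
];

console.log(JSON.stringify(groupDuplicates(sample), null, 2));
```

    Each element of the result corresponds to one document the cursor would yield: the grouped value, the list of _ids that share it, and how many there are. Values that occur only once, such as 2 in the sample, are filtered out.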
    

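    Using the cursor, you can then iterate over the groups of duplicates and apply your own business logic to decide which documents to remove. In the example below I simply keep the first _id in each uniqueIds array and remove the rest. Note that $addToSet does not guarantee the order of the array, so the document that is kept is not necessarily the oldest one: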

    while (cur.hasNext()) {
        var doc = cur.next();
        // Start at 1 so that uniqueIds[0] is kept.
        for (var index = 1; index < doc.uniqueIds.length; index++) {
            db.numbers.remove({ _id: doc.uniqueIds[index] });
        }
    }
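
    The loop above issues one remove per duplicate document. When there are many duplicates, a possible variation is to collect the _ids first and delete them in batches. The sketch below shows only the pure JavaScript part; `collectIdsToRemove` and `chunk` are hypothetical helpers, the batch size of 1000 is an arbitrary assumption, and the shell usage in the trailing comment assumes a version that provides deleteMany (3.2 or later).

```javascript
// Hypothetical helper: gather every _id except the first one of each group.
function collectIdsToRemove(groups) {
  var ids = [];
  for (var g = 0; g < groups.length; g++) {
    var uniqueIds = groups[g].uniqueIds;
    // Start at 1 so that uniqueIds[0] is kept.
    for (var i = 1; i < uniqueIds.length; i++) {
      ids.push(uniqueIds[i]);
    }
  }
  return ids;
}

// Hypothetical helper: split a list into batches of at most `size` items.
function chunk(items, size) {
  var batches = [];
  for (var i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Possible use in the mongo shell, with a fresh (not yet iterated) cursor `cur`:
//   var ids = collectIdsToRemove(cur.toArray());
//   chunk(ids, 1000).forEach(function (batch) {
//     db.numbers.deleteMany({ _id: { $in: batch } });
//   });
```

    Batching keeps each delete command well below the document size limit while still cutting the number of round trips compared to removing the documents one at a time.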
    

    After removing the duplicates, you can add a unique index:

    db.numbers.createIndex( {"value":1},{unique:true})
    
