Remove duplicate records from MongoDB 4.0

Submitted by 青春壹個敷衍的年華 on 2021-02-08 06:22:29

Question


I have a collection that contains duplicate records, and I am using MongoDB 4.0. How do I remove the duplicate records from the entire collection?

The records are inserted with the following structure: { item: "journal", qty: 25, size: 15, status: "A" }

All I need is for each distinct record to be stored only once.


Answer 1:


You can group duplicate records using an aggregation pipeline:

db.theCollection.aggregate([
   {$group: {_id: {item: "$item", qty: "$qty", size: "$size", status: "$status"}}},
   {$project: {_id: 0, item: "$_id.item", qty: "$_id.qty", size: "$_id.size", status: "$_id.status"}},
   {$out: "theCollectionWithoutDuplicates"}
])

After the pipeline runs, the theCollectionWithoutDuplicates collection contains one document for each group of duplicates in the original collection, each with a new _id. Once you have verified the output, drop the original collection (db.theCollection.drop()) and rename the new one (db.theCollectionWithoutDuplicates.renameCollection('theCollection')). The drop and the rename can be combined into a single call: db.theCollectionWithoutDuplicates.renameCollection('theCollection', true).

Explanation of the aggregation pipeline:

  1. db.theCollection.aggregate([]) executes an aggregation pipeline, taking the list of stages to run in order
  2. the $group stage groups documents by the fields listed in its _id expression
  3. the $project stage renames fields, flattening the nested _id subdocument produced by $group back into top-level fields
  4. the $out stage writes the resulting documents to the given collection, replacing that collection if it already exists
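To make the $group and $project steps concrete, here is a small in-memory sketch in plain JavaScript (no MongoDB involved) of what those two stages do to the question's documents. The sample documents and the groupAndProject function are hypothetical. Unlike MongoDB's $group, whose output order is not guaranteed, a JavaScript Map keeps groups in first-seen order.

```javascript
// In-memory illustration of the $group + $project stages above:
// documents sharing item/qty/size/status collapse into one, and the
// original _id values are dropped. Sample data is hypothetical.
function groupAndProject(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Like the $group _id: a compound key built from the four fields.
    const key = JSON.stringify([doc.item, doc.qty, doc.size, doc.status]);
    if (!groups.has(key)) {
      // Like $project: the group key flattened back into top-level fields.
      groups.set(key, { item: doc.item, qty: doc.qty, size: doc.size, status: doc.status });
    }
  }
  return [...groups.values()];
}

const sample = [
  { _id: 1, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 2, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 3, item: "notebook", qty: 50, size: 10, status: "A" },
];

// Logs two documents: one "journal" and one "notebook", neither with an _id.
console.log(groupAndProject(sample));
```

In the real pipeline, $out then writes these documents to a new collection, and MongoDB assigns each one a fresh _id.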



Answer 2:


You can also remove duplicate records in place with a forEach loop:

// Delete every later copy (larger _id) of each document; the earliest copy is kept.
db.collection.find({}, { item: 1, qty: 1, size: 1, status: 1 }).forEach(function (doc) {
    db.collection.deleteMany({ _id: { $gt: doc._id }, item: doc.item, qty: doc.qty, size: doc.size, status: doc.status });
});
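The following in-memory sketch in plain JavaScript (no MongoDB involved) walks through the forEach logic above on an array. Numeric _id values stand in for ObjectIds, and the sample data and the removeLaterDuplicates function are hypothetical. Because only documents with a larger _id are deleted, the copy with the smallest _id in each group survives. Revisiting a document that was already deleted does no harm, since it only matches copies that are duplicates anyway.

```javascript
// In-memory illustration of the forEach approach: for each document,
// drop every document with a larger _id and the same four field values.
// Numeric _ids stand in for ObjectIds; data is hypothetical.
function removeLaterDuplicates(docs) {
  let remaining = docs.slice();
  // Iterate over the original list, as a cursor may still yield
  // documents that were deleted in an earlier iteration.
  for (const doc of docs) {
    remaining = remaining.filter(
      (other) =>
        !(
          other._id > doc._id && // like { _id: { $gt: doc._id } }
          other.item === doc.item &&
          other.qty === doc.qty &&
          other.size === doc.size &&
          other.status === doc.status
        )
    );
  }
  return remaining;
}

const sample = [
  { _id: 1, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 2, item: "notebook", qty: 50, size: 10, status: "A" },
  { _id: 3, item: "journal", qty: 25, size: 15, status: "A" },
];

console.log(removeLaterDuplicates(sample).map((doc) => doc._id)); // [ 1, 2 ]
```

This approach issues one delete per document, so it is simpler but slower than the aggregation approach on large collections.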



Answer 3:


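I recently wrote code to delete duplicate documents from MongoDB; this should work:

// Uses the Mongoose API: `collection` is a Mongoose model.
const query = [
  {
    $group: {
      // Replace "$field" with the field(s) that define a duplicate,
      // e.g. item, qty, size and status from the question.
      _id: {
        field: "$field",
      },
      dups: {
        $addToSet: "$_id", // _id of every document in the group
      },
      count: {
        $sum: 1, // number of documents in the group
      },
    },
  },
  {
    $match: {
      count: {
        $gt: 1, // keep only groups that actually contain duplicates
      },
    },
  },
];

// Mongoose 5 syntax; in Mongoose 6 and later, cursor() returns the cursor directly, so drop .exec().
const cursor = collection.aggregate(query).cursor({ batchSize: 10 }).exec();

cursor.eachAsync(async (doc) => {
  // Keep one _id (an arbitrary one, since $addToSet does not guarantee order) and delete the rest.
  doc.dups.shift();
  // Wait for every delete so eachAsync does not move on before they finish.
  await Promise.all(doc.dups.map((dupId) => collection.findByIdAndDelete(dupId)));
});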
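As an illustration of answer 3's logic, here is an in-memory sketch in plain JavaScript (no MongoDB or Mongoose involved). findDuplicateGroups mirrors the $group/$match pipeline, and deleteDuplicates mirrors the eachAsync callback. Both function names, the keyFields parameter and the sample data are hypothetical. In MongoDB the element kept by shift() is arbitrary, while here it is the first _id seen.

```javascript
// In-memory illustration of answer 3; names and data are hypothetical.

// Mirrors the pipeline: $group (with $addToSet and $sum), then $match count > 1.
function findDuplicateGroups(docs, keyFields) {
  const groups = new Map();
  for (const doc of docs) {
    const key = JSON.stringify(keyFields.map((f) => doc[f]));
    if (!groups.has(key)) groups.set(key, { dups: [], count: 0 });
    const group = groups.get(key);
    if (!group.dups.includes(doc._id)) group.dups.push(doc._id); // like $addToSet
    group.count += 1; // like $sum: 1
  }
  return [...groups.values()].filter((g) => g.count > 1); // like $match
}

// Mirrors the eachAsync callback: keep one _id per group, delete the rest.
function deleteDuplicates(docs, keyFields) {
  const toDelete = new Set();
  for (const group of findDuplicateGroups(docs, keyFields)) {
    group.dups.shift(); // this _id is kept
    for (const id of group.dups) toDelete.add(id);
  }
  return docs.filter((doc) => !toDelete.has(doc._id));
}

const fields = ["item", "qty", "size", "status"];
const sample = [
  { _id: 1, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 2, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 3, item: "notebook", qty: 50, size: 10, status: "A" },
  { _id: 4, item: "journal", qty: 25, size: 15, status: "A" },
];

console.log(deleteDuplicates(sample, fields).map((doc) => doc._id)); // [ 1, 3 ]
```

Only groups with more than one member are loaded into the cursor, so documents without duplicates are never touched.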


Source: https://stackoverflow.com/questions/54517837/remove-duplicate-records-from-mongodb-4-0
