Question
I have a collection that contains duplicate records. I am using MongoDB 4.0. How do I remove the duplicate records from the entire collection?
The records are inserted with the following structure: { item: "journal", qty: 25, size: 15, status: "A" }
All I need is a single unique document for each set of duplicates.
Answer 1:
You can group duplicated records using an aggregation pipeline:
db.theCollection.aggregate([
  {$group: {_id: {item: "$item", qty: "$qty", size: "$size", status: "$status"}}},
  {$project: {_id: 0, item: "$_id.item", qty: "$_id.qty", size: "$_id.size", status: "$_id.status"}},
  {$out: "theCollectionWithoutDuplicates"}
])
After the aggregation pipeline runs, the theCollectionWithoutDuplicates collection contains one document for each group of duplicated original documents, each with a new _id. Once you have verified the output, drop the original collection (db.theCollection.drop()) and rename the new one (db.theCollectionWithoutDuplicates.renameCollection('theCollection')). The drop and rename can be combined as db.theCollectionWithoutDuplicates.renameCollection('theCollection', true).
Explanation of the aggregation pipeline usage:
- db.theCollection.aggregate([]) executes an aggregation pipeline, receiving a list of aggregation stages to be executed
- the $group stage groups documents by the fields specified in the subsequent _id field
- the $project stage changes field names, flattening the nested _id subdocuments produced by $group
- the $out stage stores the documents resulting from the aggregation into the given collection
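To make the effect of the $group and $project stages more concrete, here is a minimal in-memory sketch in plain JavaScript. It does not talk to MongoDB; dedupeByFields and the sample data are hypothetical, and it assumes the grouped fields hold simple values such as strings and numbers.

```javascript
// In-memory illustration of $group + $project: one output document per
// distinct combination of the chosen fields, without the original _id.
// dedupeByFields is a hypothetical helper, not a MongoDB API.
function dedupeByFields(docs, fields) {
  const seen = new Map();
  for (const doc of docs) {
    // Build the group key from the selected fields (the role of $group's _id).
    const key = JSON.stringify(fields.map((f) => doc[f]));
    if (!seen.has(key)) {
      // Keep only the grouped fields (the role of $project flattening _id).
      const flat = {};
      for (const f of fields) flat[f] = doc[f];
      seen.set(key, flat);
    }
  }
  return Array.from(seen.values());
}

const sample = [
  { _id: 1, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 2, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 3, item: "notebook", qty: 50, size: 8, status: "A" },
];
const unique = dedupeByFields(sample, ["item", "qty", "size", "status"]);
console.log(unique.length); // 2
```

As in the pipeline, the surviving documents no longer carry their original _id values.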
Answer 2:
You can remove duplicated records using forEach:
db.collection.find({}, { item: 1, qty: 1, size: 1, status: 1 }).forEach(function(doc) {
  db.collection.remove({_id: { $gt: doc._id }, item: doc.item, qty: doc.qty, size: doc.size, status: doc.status })
})
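The loop above keeps, for each combination of field values, the document with the smallest _id and removes the ones with a greater _id. A minimal in-memory sketch of that rule follows, in plain JavaScript; survivorsOf is a hypothetical helper, and numeric _id values stand in for ObjectIds.

```javascript
// survivorsOf is a hypothetical helper that mirrors the rule
// "remove every document with the same fields and a greater _id".
function survivorsOf(docs, fields) {
  const sameFields = (a, b) => fields.every((f) => a[f] === b[f]);
  // A document survives only if no matching document has a smaller _id.
  return docs.filter(
    (doc) => !docs.some((other) => other._id < doc._id && sameFields(other, doc))
  );
}

const stored = [
  { _id: 3, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 1, item: "journal", qty: 25, size: 15, status: "A" },
  { _id: 2, item: "paper", qty: 100, size: 8, status: "D" },
];
const kept = survivorsOf(stored, ["item", "qty", "size", "status"]);
console.log(kept.map((d) => d._id)); // [ 1, 2 ]
```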
Answer 3:
I recently wrote some code to delete duplicated documents from MongoDB; this should work:
const query = [
  {
    $group: {
      _id: {
        field: "$field",
      },
      dups: {
        $addToSet: "$_id",
      },
      count: {
        $sum: 1,
      },
    },
  },
  {
    $match: {
      count: {
        $gt: 1,
      },
    },
  },
];
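// `collection` is assumed to be a Mongoose model here.
const cursor = collection.aggregate(query).cursor({ batchSize: 10 }).exec();
cursor.eachAsync(async (doc) => {
  doc.dups.shift(); // Skip the first element so one document per group is kept
  // Wait for every deletion in this group before moving on to the next group
  await Promise.all(doc.dups.map((dupId) => collection.findByIdAndDelete(dupId).exec()));
});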
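An alternative to deleting the duplicates one by one is to collect the surplus _id values first and remove them in a single call. The sketch below shows only the collection step, in plain JavaScript. surplusIds is a hypothetical helper, and the group objects are assumed to have the { dups: [...] } shape produced by the pipeline above. Since $addToSet does not guarantee the order of the array elements, which document of each group is kept is arbitrary.

```javascript
// surplusIds is a hypothetical helper: for every group it keeps the first
// id and returns all remaining ids as candidates for deletion.
function surplusIds(groups) {
  const ids = [];
  for (const group of groups) {
    ids.push(...group.dups.slice(1));
  }
  return ids;
}

// Sample groups with numeric ids standing in for ObjectIds.
const groups = [
  { _id: { field: "a" }, dups: [10, 11, 12], count: 3 },
  { _id: { field: "b" }, dups: [20, 21], count: 2 },
];
const toDelete = surplusIds(groups);
console.log(toDelete); // [ 11, 12, 21 ]

// Assumption: with a Mongoose model, the result could then be removed in
// one call, for example collection.deleteMany({ _id: { $in: toDelete } }).
```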
Source: https://stackoverflow.com/questions/54517837/remove-duplicate-records-from-mongodb-4-0