Get “data from collection b not in collection a” in a MongoDB shell query

执笔经年 2020-12-23 14:06

I have two MongoDB collections that share a common _id. Using the mongo shell, I want to find all documents in one collection that do not have a matching _id in the other collection.

4 answers
  • 2020-12-23 14:34

    In MongoDB 3.2 and later, the following code seems to work:

    db.collectionb.aggregate([
        {
            $lookup: {
                from: "collectiona",
                localField: "collectionb_fk",
                foreignField: "collectiona_fk",
                as: "matched_docs"
            }
        },
        {
            $match: { "matched_docs": { $eq: [] } }
        }
    ]);
    

    This is based on the example at https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/#use-lookup-with-an-array
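
    Since the question says both collections share a common _id, the same pipeline can presumably be pointed at _id on both sides. A minimal sketch, assuming a hypothetical helper named antiJoinPipeline (it is not a MongoDB function, only a convenience for building the stages shown above):

```javascript
// Hypothetical helper (not part of MongoDB): builds an aggregation pipeline
// that keeps only documents with no match in the `from` collection.
function antiJoinPipeline(from, localField, foreignField) {
  return [
    // Left outer join: every input document gets a `matched_docs` array.
    {
      $lookup: {
        from: from,
        localField: localField,
        foreignField: foreignField,
        as: "matched_docs"
      }
    },
    // Keep only the documents for which the join found nothing.
    { $match: { matched_docs: { $eq: [] } } }
  ];
}

// Assumption: both collections are keyed by the same _id values.
const pipeline = antiJoinPipeline("collectiona", "_id", "_id");
console.log(JSON.stringify(pipeline, null, 2));
```

    In the mongo shell the result would be passed along as db.collectionb.aggregate(pipeline). Each returned document still carries an empty matched_docs array.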

  • 2020-12-23 14:36

    Answering your follow-up. I'd use map().

    Given this:

    > b1 = {i: 1}
    > db.b.save(b1)
    > db.b.save({i: 2})
    > db.a.save({_id: b1._id})
    

    All you need is:

    > vals = db.a.find({}, {_id: 1}).map(function(a){return a._id;})
    > db.b.find({_id: {$nin: vals}})
    

    which returns

    { "_id" : ObjectId("4f08c60d6b5e49fa3f6b46c1"), "i" : 2 }
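
    The two steps can also be tried without a database. The sketch below is plain JavaScript on in-memory arrays, not shell code; the arrays a and b, the string ids, and the helper notIn are made up for illustration and only mirror what map() and $nin do above:

```javascript
// Plain-array stand-ins for the two collections; the documents are made up.
const a = [{ _id: "id1" }];
const b = [{ _id: "id1", i: 1 }, { _id: "id2", i: 2 }];

// Step 1: collect the ids present in a (the role of the map() call).
// Ids are compared as strings so that object-valued ids would also compare by value.
const vals = new Set(a.map(function (doc) { return String(doc._id); }));

// Step 2: keep the documents whose id is not in that set (the role of $nin).
// `notIn` is a hypothetical helper, not a MongoDB API.
function notIn(docs, ids) {
  return docs.filter(function (doc) { return !ids.has(String(doc._id)); });
}

console.log(notIn(b, vals));
```

    Only the document with i: 2 survives, matching the shell output above. Note that in the real query the whole vals array is sent to the server as part of the query document.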
    
  • 2020-12-23 14:43

    You will have to collect the _ids from collection A first and then exclude them when querying collection B, which you can do using $nin. See Advanced Queries for all of the MongoDB operators.

    Your final query, using the example you gave, would look something like:

    db.Test.find({"_id": {"$nin": [ObjectId("4f08a75f306b428fb9d8bb2e"),
     ObjectId("4f08a766306b428fb9d8bb2f")]}})
    

    Note that this approach won't scale, because the full list of _ids has to be sent with the query. If you need a solution that scales, set a flag on the documents in collections A and B indicating whether the _id exists in the other collection, and query on that flag instead.

    Updated for second part:

    The second part is impossible with a single find(): a find() query cannot join or cross-query collections. (Since MongoDB 3.2, the $lookup aggregation stage can perform a left outer join.) Otherwise, querying one collection, saving the results, and then querying the second is your only choice, unless you embed the data in the documents themselves as I mentioned earlier.

  • 2020-12-23 14:58

    I wrote a script that marks every document in the second collection that is referenced from the first collection, and then processed the unmarked documents in the second collection.

    var first = db.firstCollection.aggregate([ { $unwind: "$secondCollectionField" } ]);

    while (first.hasNext()) {
        var doc = first.next();
        db.secondCollection.update({ _id: doc.secondCollectionField }, { $set: { firstCollectionField: doc._id } });
    }
    

    Then process the documents in the second collection that have no mark:

    db.secondCollection.find({"firstCollectionField":{$exists:false}})
    