Need a distinct count on multiple fields joined from another collection using a MongoDB aggregation query

Submitted by 风流意气都作罢 on 2019-12-24 00:48:45

Question


I'm trying to use a MongoDB aggregation query to join ($lookup) two collections and then get a distinct count of all the unique values in the joined array.

My two collections look like this. events:

{
    "_id" : "1",
    "name" : "event1",
    "objectsIds" : [ "1", "2", "3" ]
}

objects:

{
    "_id" : "1",
    "name" : "object1",
    "metaDataMap" : {
        "SOURCE" : ["ABC", "DEF"],
        "DESTINATION" : ["XYZ", "PDQ"],
        "TYPE" : []
    }
},
{
    "_id" : "2",
    "name" : "object2",
    "metaDataMap" : {
        "SOURCE" : ["RST", "LNE"],
        "TYPE" : ["text"]
    }
},
{
    "_id" : "3",
    "name" : "object3",
    "metaDataMap" : {
        "SOURCE" : ["NOP"],
        "DESTINATION" : ["PHI", "NYC"],
        "TYPE" : ["video"]
    }
}

What I want: when I $match on event _id = 1, join the objects' metaDataMap and then count the distinct values under each key. Expected counts for event _id = 1:

SOURCE: 5
DESTINATION: 4
TYPE: 2
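As a sanity check on what "distinct count" means here, a minimal plain-JavaScript sketch (run in Node, outside MongoDB) that computes the expected result directly from the sample objects documents. The function name distinctCountsByKey is hypothetical; it only mirrors the intended semantics and is not a MongoDB API.

```javascript
// The sample "objects" documents from the question, trimmed to metaDataMap.
const objects = [
  { _id: "1", metaDataMap: { SOURCE: ["ABC", "DEF"], DESTINATION: ["XYZ", "PDQ"], TYPE: [] } },
  { _id: "2", metaDataMap: { SOURCE: ["RST", "LNE"], TYPE: ["text"] } },
  { _id: "3", metaDataMap: { SOURCE: ["NOP"], DESTINATION: ["PHI", "NYC"], TYPE: ["video"] } },
];

// Hypothetical helper: count distinct values per key across every object's
// metaDataMap, without knowing the key names in advance.
function distinctCountsByKey(docs) {
  const sets = new Map(); // key -> Set of values seen under that key
  for (const { metaDataMap } of docs) {
    for (const [key, values] of Object.entries(metaDataMap || {})) {
      if (!sets.has(key)) sets.set(key, new Set());
      for (const v of values) sets.get(key).add(v);
    }
  }
  const counts = {};
  for (const [key, set] of sets) counts[key] = set.size;
  return counts;
}

console.log(distinctCountsByKey(objects)); // { SOURCE: 5, DESTINATION: 4, TYPE: 2 }
```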

What I have so far is this:

db.events.aggregate([
 {$match: {"_id" : id}}
,{$lookup: {"from" : "objects",
            "localField" : "objectsIds",
            "foreignField" : "_id",
            "as" : "objectResults"}}
,{$project: {x: {$objectToArray: "$objectResults.metaDataMap"}}}
,{$unwind: "$x"}
,{$match: {"x.k": {$ne: "_id"}}}
,{$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}}
,{$addFields: {size: {"$size":"$y"}} }
]);

This fails because $objectResults.metaDataMap is not an object; it's an array. Any suggestions on how to solve this, or a different way to do what I want? Also, I don't necessarily know which fields (keys) are in metaDataMap, and I don't want to have to explicitly list fields that may or may not exist in the map.
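The error comes from how field paths resolve through arrays: because objectResults is an array of joined documents, "$objectResults.metaDataMap" evaluates to an array of maps, and $objectToArray only accepts a single document. Below is a rough plain-JavaScript model of that path resolution, contrasted with the shape after an $unwind of objectResults. The helpers resolvePath and toKeyValuePairs are hypothetical and ignore edge cases such as nested arrays.

```javascript
// Rough model (an assumption, not MongoDB's implementation) of how a dotted
// field path like "$objectResults.metaDataMap" resolves. When a step lands
// on an array, the rest of the path is applied to each element and missing
// values are dropped. Nested arrays and other edge cases are ignored.
function resolvePath(doc, path) {
  return path.split(".").reduce((value, key) => {
    if (Array.isArray(value)) {
      return value
        .map((el) => (el == null ? undefined : el[key]))
        .filter((v) => v !== undefined);
    }
    return value == null ? undefined : value[key];
  }, doc);
}

// Model of $objectToArray: a document becomes an array of {k, v} pairs.
function toKeyValuePairs(obj) {
  return Object.entries(obj).map(([k, v]) => ({ k, v }));
}

// An event after the $lookup, trimmed to two joined objects.
const event = {
  _id: "1",
  objectResults: [
    { _id: "1", metaDataMap: { SOURCE: ["ABC", "DEF"] } },
    { _id: "2", metaDataMap: { SOURCE: ["RST", "LNE"], TYPE: ["text"] } },
  ],
};

const resolved = resolvePath(event, "objectResults.metaDataMap");
console.log(Array.isArray(resolved)); // true: an array of maps, not a document

// After {$unwind: "$objectResults"} there is one joined object per document,
// so the same path yields a single map that $objectToArray can handle.
const unwound = event.objectResults.map((o) => ({ _id: event._id, objectResults: o }));
console.log(toKeyValuePairs(resolvePath(unwound[1], "objectResults.metaDataMap")));
```

This is why both answers unwind first (either the array of maps or objectResults itself) before calling $objectToArray.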


Answer 1:


This should do the trick. I tested it on your input set and deliberately added some duplicate values, like NYC showing up in more than one DESTINATION, to make sure they got de-duplicated (i.e. the distinct count you asked for). For fun, comment out all the stages, then uncomment them one at a time from the top down to see the effect of each stage of the pipeline.

var id = "1";

// foo and foo2 are the test collections holding the events and objects documents:
var c = db.foo.aggregate([
// Find a thing:
{$match: {"_id" : id}}

// Do the lookup into the objects collection:
,{$lookup: {"from" : "foo2",
            "localField" : "objectsIds",
            "foreignField" : "_id",
            "as" : "objectResults"}}

// OK, so we've got a bunch of extra material now.  Let's
// get down to just the metaDataMap:
,{$project: {x: "$objectResults.metaDataMap"}}
,{$unwind: "$x"}
,{$project: {"_id":0}}

// Use $objectToArray to get all the field names dynamically:
// Replace the old x with new x (don't need the old one):
,{$project: {x: {$objectToArray: "$x"}}}
,{$unwind: "$x"}

// Collect unique field names.  Interesting note: the values
// here are ARRAYS, not scalars, so $push is creating an
// array of arrays:
,{$group: {_id: "$x.k", tmp: {$push: "$x.v"}}}

// Almost there!  We have to turn the array of array (of string)
// into a single array which we'll subsequently dedupe.  We will
// overwrite the old tmp with a new one, too:
,{$addFields: {tmp: {$reduce:{
    input: "$tmp",
    initialValue:[],
    in:{$concatArrays: [ "$$value", "$$this"]}
        }}
    }}

// Now just unwind and regroup using the addToSet operator
// to dedupe the list:
,{$unwind: "$tmp"}
,{$group: {_id: "$_id", uniqueVals: {$addToSet: "$tmp"}}}

// Add size for good measure:
,{$addFields: {size: {"$size":"$uniqueVals"}} }
]);
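The $push / $reduce / $addToSet stages above can be modeled in plain JavaScript to show why the flatten step is needed. The input array below is hypothetical data for a single key (DESTINATION), with NYC duplicated in the spirit of the answerer's test; Array.prototype.reduce with concat stands in for $reduce with $concatArrays, and a Set stands in for $addToSet.

```javascript
// What {$group: {_id: "$x.k", tmp: {$push: "$x.v"}}} produces for one key:
// an array of arrays, because each x.v is itself an array.
// Hypothetical data: "NYC" appears in two different DESTINATION arrays.
const tmp = [["XYZ", "PDQ"], ["PHI", "NYC"], ["NYC"]];

// Equivalent of {$reduce: {input: "$tmp", initialValue: [],
//                          in: {$concatArrays: ["$$value", "$$this"]}}}
const flattened = tmp.reduce((value, current) => value.concat(current), []);

// Equivalent of {$unwind: "$tmp"} followed by {$addToSet: "$tmp"}:
// duplicates collapse, so only distinct values remain. ($addToSet does not
// guarantee any order; only the size matters here.)
const uniqueVals = [...new Set(flattened)];

console.log(flattened.length); // 5
console.log(uniqueVals.length); // 4
```

Unwinding $x.v before the $group, as the dynamic query in the second answer does, avoids building the array of arrays in the first place.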



Answer 2:


I was able to generate the required result using the following query.

db.events.aggregate(
   [  
      {$match: {"_id" : id}} ,
      {$lookup: {
          "from" : "objects",
          "localField" : "objectsIds",
          "foreignField" : "_id",
          "as" : "objectResults"
      }},
      {$unwind: "$objectResults"},
      {$project:{"A":"$objectResults.metaDataMap"}},
      {$unwind: {path: "$A.SOURCE", preserveNullAndEmptyArrays: true}},
      {$unwind:{ path: "$A.DESTINATION", preserveNullAndEmptyArrays: true}},
      {$unwind:{ path: "$A.TYPE", preserveNullAndEmptyArrays: true}},
      {$group:{"_id":"$_id","SOURCE":{$addToSet:"$A.SOURCE"},"DESTINATION":{$addToSet:"$A.DESTINATION"},"TYPE":{$addToSet:"$A.TYPE"}}},
      {$addFields: {"SOURCE":{$size:"$SOURCE"},"DESTINATION":{$size:"$DESTINATION"},"TYPE":{$size:"$TYPE"}}},
      {$project:{"_id":0}}]
).pretty()
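One thing to be aware of in this query: chaining several $unwind stages produces every combination of SOURCE, DESTINATION and TYPE values within each object, and $addToSet then collapses the repeats. Below is a plain-JavaScript model of that using the question's sample data, under the assumptions noted in the comments (values are arrays or missing; a document whose field is missing contributes nothing to $addToSet). unwindPreserve is a hypothetical helper, not a MongoDB API.

```javascript
// metaDataMap of each joined object, from the question's sample data
// (what "$A" holds after {$unwind: "$objectResults"} and the $project).
const joinedMaps = [
  { SOURCE: ["ABC", "DEF"], DESTINATION: ["XYZ", "PDQ"], TYPE: [] },
  { SOURCE: ["RST", "LNE"], TYPE: ["text"] },
  { SOURCE: ["NOP"], DESTINATION: ["PHI", "NYC"], TYPE: ["video"] },
];

// Model of {$unwind: {path, preserveNullAndEmptyArrays: true}} for one field.
// Assumes the field is either an array or missing: each element becomes its
// own document, and a missing or empty array keeps the document without it.
function unwindPreserve(docs, field) {
  const out = [];
  for (const doc of docs) {
    const arr = doc[field];
    if (arr === undefined || arr.length === 0) {
      const { [field]: _dropped, ...rest } = doc;
      out.push(rest);
    } else {
      for (const value of arr) out.push({ ...doc, [field]: value });
    }
  }
  return out;
}

const FIELDS = ["SOURCE", "DESTINATION", "TYPE"];

// Three chained $unwind stages: every combination within each object.
let unwoundDocs = joinedMaps;
for (const field of FIELDS) unwoundDocs = unwindPreserve(unwoundDocs, field);
console.log(unwoundDocs.length); // 8 (4 + 2 + 2 combinations)

// Model of the $group with $addToSet per field, then $size.
// Assumption: documents where the field is missing add nothing to the set.
const fixedFieldCounts = {};
for (const field of FIELDS) {
  const seen = new Set();
  for (const doc of unwoundDocs) if (doc[field] !== undefined) seen.add(doc[field]);
  fixedFieldCounts[field] = seen.size;
}
console.log(fixedFieldCounts); // { SOURCE: 5, DESTINATION: 4, TYPE: 2 }
```

The number of intermediate documents grows with the product of the array lengths in each object, so large arrays can inflate this pipeline considerably; the dynamic version unwinds one (key, value) pair at a time, so it grows only with the total number of values.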

Updated query for dynamic fields (keys not known in advance):

db.events.aggregate([
    {$match: {"_id" : id}},
    {$lookup: {"from" : "objects", "localField" : "objectsIds", "foreignField" : "_id", "as" : "objectResults"}},
    {$unwind: "$objectResults"},
    {$project: {"A": "$objectResults.metaDataMap"}},
    {$project: {x: {$objectToArray: "$A"}}},
    {$unwind: "$x"},
    {$match: {"x.k": {$ne: "_id"}}},
    {$unwind: "$x.v"},
    {$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}},
    {$project: {"size": {$size: "$y"}}}
]).pretty()
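A plain-JavaScript model of this dynamic pipeline, showing the two behaviors that matter for the question: keys are discovered from the data rather than named in the query, and a key whose arrays are empty in every object drops out of the result (because $unwind without preserveNullAndEmptyArrays outputs nothing for an empty array) instead of appearing with size 0. The sample maps, including the OWNER and NOTES keys, are hypothetical, and dynamicCounts is an illustrative name.

```javascript
// Model of the dynamic pipeline after {$unwind: "$objectResults"}:
//   {$project: {x: {$objectToArray: "$A"}}}, {$unwind: "$x"},
//   {$unwind: "$x.v"}, {$group: {_id: "$x.k", y: {$addToSet: "$x.v"}}},
//   {$project: {size: {$size: "$y"}}}
// The {$match: {"x.k": {$ne: "_id"}}} stage is omitted because metaDataMap
// has no _id key. Assumes every value is an array, as in the question's data.
function dynamicCounts(maps) {
  // $objectToArray + $unwind "$x": one {k, v} pair per key per object.
  const pairs = maps.flatMap((m) => Object.entries(m).map(([k, v]) => ({ k, v })));
  // $unwind "$x.v": one document per array element; empty arrays vanish.
  const flat = pairs.flatMap(({ k, v }) => v.map((value) => ({ k, v: value })));
  // $group by k with $addToSet, then $size. (MongoDB does not guarantee the
  // order of $group output; this sketch keeps first-seen order.)
  const groups = new Map();
  for (const { k, v } of flat) {
    if (!groups.has(k)) groups.set(k, new Set());
    groups.get(k).add(v);
  }
  return [...groups].map(([k, set]) => ({ _id: k, size: set.size }));
}

// Hypothetical maps: OWNER is a key the query never names, and NOTES is
// empty everywhere.
const sampleMaps = [
  { SOURCE: ["ABC", "DEF"], OWNER: ["ops"], NOTES: [] },
  { SOURCE: ["DEF"], OWNER: ["ops", "dev"] },
];
console.log(dynamicCounts(sampleMaps));
// [ { _id: 'SOURCE', size: 2 }, { _id: 'OWNER', size: 2 } ]
```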


Source: https://stackoverflow.com/questions/46591045/need-a-distinct-count-on-multiple-fields-that-were-joined-from-another-collectio
