How do I count multiple keys in the same MongoDB aggregation $group query?

Submitted by Deadly on 2019-12-21 04:21:21

Question


I have a query:

db.test.aggregate( {$group : { _id : '$key', frequency: { $sum : 1 } } } )

This returns the frequency of each distinct value of key in the test collection. In other words, it gives me the distribution of key.

Now imagine I want to get the distributions of key1, key2, and key3 (so three different distributions).

Obviously, I could run this query three times, once per key, but it seems like it should be possible to optimize this by counting all three keys in a single query. I have been experimenting and searching the web, but so far I am resigned to running three separate aggregation queries or using a map/reduce function.

Does anyone have any other ideas?


Answer 1:


There are a few different approaches you could use here:

  1. Use map/reduce: don't do this. Right now it would be much faster to run the aggregation framework three times than to use a map/reduce function for this use case.

  2. Run the aggregation three times. This is not optimal, but if you don't have time constraints it is the easiest option. If your aggregations take less than a few seconds anyway, I wouldn't worry about optimizing until they become a problem.

  3. Here's the best workaround I can think of. The $group operator allows you to build an _id from multiple fields, e.g. {"_id":{"a":"$key1", "b":"$key2", "c":"$key3"}}. Doing this creates one group for every existing combination of values of your keys. You could group your keys this way and then sum across the results manually in the client.

Let me elaborate. Let's say we have a collection of shapes. Each shape has a color (f1), a size (f2), and a kind (f3: square, circle, etc.). An aggregation on a multi-field _id could look like:

db.shapes.aggregate({$group:{_id:{"f1":"$f1", "f2":"$f2", "f3":"$f3"}, count:{"$sum":1}}})

and return:

"result" : [
        {
            "_id" : {
                "f1" : "yellow",
                "f2" : "medium",
                "f3" : "triangle"
            },
            "count" : 4086
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "small",
                "f3" : "triangle"
            },
            "count" : 4138
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "big",
                "f3" : "square"
            },
            "count" : 4113
        },
        {
            "_id" : {
                "f1" : "yellow",
                "f2" : "small",
                "f3" : "triangle"
            },
            "count" : 4145
        },
        {
            "_id" : {
                "f1" : "red",
                "f2" : "small",
                "f3" : "square"
            },
            "count" : 4062
        }

... and so on

You would then sum up the results client-side, over a drastically reduced number of entries. As long as the number of distinct value combinations (at most the product of the unique values per key) is small compared to the total number of documents, this final step takes a negligible amount of time.
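
The client-side summing step could be sketched roughly as follows. This is only an illustration, not part of the original answer: the helper name marginalDistributions and the variable sampleRows are hypothetical, and the rows are just the five sample groups shown above. It assumes you pass in the array of grouped documents (the "result" field in older shells, or the cursor's toArray() in newer ones).

```javascript
// Hypothetical helper: collapse multi-field $group output into one
// distribution per key. Each row is expected to look like
// { _id: { f1: ..., f2: ..., f3: ... }, count: N }.
function marginalDistributions(rows) {
  const distributions = {};
  for (const row of rows) {
    for (const [key, value] of Object.entries(row._id)) {
      // One Map per key, so non-string values (numbers, null) stay distinct.
      if (!distributions[key]) distributions[key] = new Map();
      const counts = distributions[key];
      counts.set(value, (counts.get(value) || 0) + row.count);
    }
  }
  return distributions;
}

// The five sample groups from the output above (a real result has more).
const sampleRows = [
  { _id: { f1: "yellow", f2: "medium", f3: "triangle" }, count: 4086 },
  { _id: { f1: "red", f2: "small", f3: "triangle" }, count: 4138 },
  { _id: { f1: "red", f2: "big", f3: "square" }, count: 4113 },
  { _id: { f1: "yellow", f2: "small", f3: "triangle" }, count: 4145 },
  { _id: { f1: "red", f2: "small", f3: "square" }, count: 4062 },
];

const dist = marginalDistributions(sampleRows);
for (const [key, counts] of Object.entries(dist)) {
  console.log(key, Array.from(counts.entries()));
}
```

For these five rows alone, f1 sums to 8231 for yellow and 12313 for red. The work is one pass over the grouped rows, so the cost depends on the number of combinations rather than the number of documents in the collection.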



Source: https://stackoverflow.com/questions/16597812/how-do-i-count-multiple-keys-in-the-same-mongodb-aggregation-group-query
