mapReduce/Aggregation: Group by a value in a nested document

Posted by 拥有回忆 on 2019-12-12 02:14:28

Question


Imagine I have a collection like this:

{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "departments": [
             {"departmentType":"01",
              "departmentHead":"Peter"},
             {"departmentType":"02",
              "departmentHead":"John"}
  ]
},
{
  "_id": "10281",
  "city": "LOS ANGELES",
  "state": "CA",
  "departments": [
             {"departmentType":"02",
              "departmentHead":"Joan"},
             {"departmentType":"03",
              "departmentHead":"Mary"}
  ]
},
{
  "_id": "10284",
  "city": "MIAMI",
  "state": "FL",
  "department": [
  "departments": [
             {"departmentType":"01",
              "departmentHead":"George"},
             {"departmentType":"02",
              "departmentHead":"Harry"}
  ]
}

I'd like to get a count per departmentType, something like:

[{"departmentType":"01", "dCount":2},
 {"departmentType":"02", "dCount":3},
 {"departmentType":"03", "dCount":1}
]

For this, I've tried almost everything already, but all examples I find online are easier ones where the group by is done over a field at the root level of the document. Instead, here I'm trying to group by departmentType, and that seems to break everything I found so far.

Any ideas on how to do this using Mongoose's aggregation implementation or mapreduce?

Ideally, I'd like to exclude all departmentTypes with count <= 1 and sort the results by departmentType.

Thank you all in advance!


Answer 1:


You need to $unwind the departments array, which creates one document for each entry in the array so that you can aggregate them in the pipeline.
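To make the effect of $unwind concrete, here is a minimal in-memory sketch in plain JavaScript. It is not how MongoDB implements the stage; the helper name `unwind` is made up for illustration, and the data is the first two documents from the question.

```javascript
// Illustration only: mimics the effect of $unwind on plain objects in memory.
const sampleDocs = [
  {
    _id: "10280",
    city: "NEW YORK",
    state: "NY",
    departments: [
      { departmentType: "01", departmentHead: "Peter" },
      { departmentType: "02", departmentHead: "John" },
    ],
  },
  {
    _id: "10281",
    city: "LOS ANGELES",
    state: "CA",
    departments: [
      { departmentType: "02", departmentHead: "Joan" },
      { departmentType: "03", departmentHead: "Mary" },
    ],
  },
];

// Hypothetical helper: emits one copy of the parent document per array entry,
// with the array field replaced by that single entry.
function unwind(documents, field) {
  return documents.flatMap((doc) =>
    (doc[field] || []).map((entry) => ({ ...doc, [field]: entry }))
  );
}

const unwound = unwind(sampleDocs, "departments");
console.log(unwound.length); // 4: two departments from each of the two documents
console.log(unwound[0].departments); // { departmentType: '01', departmentHead: 'Peter' }
```

After this step, departmentType sits at a fixed path ("departments.departmentType") in every document, which is why $group can then use it as a key.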

Unfortunately, you can't pre-filter out departmentTypes with a count <= 1, because $size only accepts an exact value, but you can filter them out of the results. It's not great, but it works. The example below pre-filters only those records with EXACTLY 2 departments, but that is for demo purposes only. You probably want to drop the first $match, because the second $match filters out counts <= 1 from the results later on:

db.runCommand({
    aggregate: "so",
    pipeline: [
        {   // keep only records with exactly 2 departments (demo only)
            $match: {
                departments: { $size: 2 }
            }
        },
        // unwind - create a doc for each department in the array
        { $unwind: "$departments" },
        {   // aggregate sum of departments by type
            $group: {
                _id: "$departments.departmentType",
                count: { $sum: 1 },
            }
        },
        {   // drop department types with a count <= 1
            $match: {
                count: { $gt: 1 },
            }
        },
        {   // rename fields as per example
            $project: {
                _id: 0,
                departmentType: "$_id",
                dCount: "$count",
            }
        }
    ]
});

Note that I've also assumed that your previous JSON sample has a typo, and that "department" doesn't actually exist. This code will work assuming all the documents have the same schema as the first two.

Feel free to drop the first $match, and the last $project if you're not bothered about the actual field names you get.
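Since the question also asks for sorted results and mentions Mongoose, here is a sketch of the same pipeline without the demo-only first $match and with a $sort stage added. The function name `buildDepartmentCountPipeline` and the model name `Office` are assumptions for illustration; they are not part of the original answer.

```javascript
// Hypothetical helper: builds the aggregation stages as plain objects.
// minCount is the threshold to exclude; 1 means "keep counts greater than 1".
function buildDepartmentCountPipeline(minCount) {
  return [
    // one document per entry in the departments array
    { $unwind: "$departments" },
    // count the documents for each departmentType
    { $group: { _id: "$departments.departmentType", count: { $sum: 1 } } },
    // drop types whose count is <= minCount
    { $match: { count: { $gt: minCount } } },
    // sort by departmentType (still stored in _id at this point), ascending
    { $sort: { _id: 1 } },
    // rename fields to match the output asked for in the question
    { $project: { _id: 0, departmentType: "$_id", dCount: "$count" } },
  ];
}

const departmentPipeline = buildDepartmentCountPipeline(1);
console.log(JSON.stringify(departmentPipeline, null, 2));

// With Mongoose, assuming a model named Office (hypothetical) for this collection:
//   const results = await Office.aggregate(departmentPipeline);
```

With the three sample documents from the question (and the stray "department" line removed), this should leave only types "01" and "02", since "03" appears once.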



Source: https://stackoverflow.com/questions/12753440/mapreduce-aggregation-group-by-a-value-in-a-nested-document
