How to get the maximum value of a field for each group with the array of corresponding documents?

佐手、 提交于 2019-12-12 08:35:04

问题


I have a collection like

{
    "_id" : ObjectId("5738cb363bb56eb8f76c2ba8"),
    "records" : [
        {
            "Name" : "Joe",
            "Salary" : 70000,
            "Department" : "IT"
        }
    ]
},
{
    "_id" : ObjectId("5738cb363bb56eb8f76c2ba9"),
    "records" : [
        {
            "Name" : "Henry",
            "Salary" : 80000,
            "Department" : "Sales"
        },
        {
            "Name" : "Jake",
            "Salary" : 40000,
            "Department" : "Sales"
        }
    ]
},
{
    "_id" : ObjectId("5738cb363bb56eb8f76c2baa"),
    "records" : [
        {
            "Name" : "Sam",
            "Salary" : 90000,
            "Department" : "IT"
        },
        {
            "Name" : "Tom",
            "Salary" : 50000,
            "Department" : "Sales"
        }
    ]
}

I want to have the results with the highest salary by each department

{"Name": "Sam", "Salary": 90000, "Department": "IT"}
{"Name": "Henry", "Salary": 80000, "Department": "Sales"}

I could get the highest salary. But I could not get the corresponding employee names.

db.HR.aggregate([

    { "$unwind": "$records" },
    { "$group": 
        {
            "_id": "$records.Department",
            "max_salary": { "$max": "$records.Salary" }
        }
    }   
])

Could somebody help me?


回答1:


You need to $sort your document after $unwind and use the $first operator in the $group stage. You can also use the $last operator in which case you will need to sort your documents in ascending order

db.HR.aggregate([
    { '$unwind': '$records' }, 
    { '$sort': { 'records.Salary': -1 } }, 
    { '$group': { 
        '_id': '$records.Department', 
        'Name': { '$first': '$records.Name' } , 
        'Salary': { '$first': '$records.Salary' }
    }}
])

which produces:

{ "_id" : "Sales", "Name" : "Henry", "Salary" : 80000 }
{ "_id" : "IT", "Name" : "Sam", "Salary" : 90000 }

To return the maximum salary and employees list for each department you need to use the $max in your group stage to return the maximum "Salary" for each group then use $push accumulator operator to return a list of "Name" and "Salary" for all employees for each group. From there you need to use the $map operator in your $project stage to return a list of names alongside the maximum salary. Of course the $cond here is used to compare each employee salary to the maximum value. The $setDifference does his work which is filter out all false and is fine as long as the data being filtered is "unique". In this case it "should" be fine, but if any two results contained the same "name" then it would skew results by considering the two to be one.

db.HR.aggregate([
    { '$unwind': '$records' },
    { '$group': {
        '_id': '$records.Department', 
        'maxSalary': { '$max': '$records.Salary' }, 
        'persons': { 
            '$push': {
                'Name': '$records.Name', 
                'Salary': '$records.Salary' 
            }
        }
    }}, 
    { '$project': { 
        'maxSalary': 1, 
        'persons': { 
            '$setDifference': [
                { '$map': {
                    'input': '$persons', 
                    'as': 'person', 
                    'in': {
                        '$cond': [
                            { '$eq': [ '$$person.Salary', '$maxSalary' ] }, 
                            '$$person.Name', 
                            false
                        ]
                    }
                }}, 
                [false]
            ]
        }
    }}
])

which yields:

{ "_id" : "Sales", "maxSalary" : 80000, "persons" : [ "Henry" ] }
{ "_id" : "IT", "maxSalary" : 90000, "persons" : [ "Sam" ] }



回答2:


Its not the most intuitive thing, but instead of $max you should be using $sort and $first:

{ "$unwind": "$records" },
{ "$sort": { "$records.Salary": -1},
{ "$group" : 
    {
        "_id": "$records.Department",
        "max_salary": { "$first": "$records.Salary" },
        "name": {$first: "$records.Name"}
    }
}

Alternatively, I think this is doable using the $$ROOT operator (fair warning: I've not actually tried this) -

{ "$unwind": "$records" },
{ "$group": 
        {
            "_id": "$records.Department",
            "max_salary": { "$max": "$records.Salary" }
            "name" : "$$ROOT.records.Name"
        }
    }
}



回答3:


Another possible solution:

db.HR.aggregate([
    {"$unwind": "$records"},
    {"$group":{
        "_id": "$records.Department",
        "arr": {"$push": {"Name":"$records.Name", "Salary":"$records.Salary"}},
        "maxSalary": {"$max":"$records.Salary"}
    }},
    {"$unwind": "$arr"},
    {"$project": {
        "_id":1,
        "arr":1,
        "isMax":{"$eq":["$arr.Salary", "$maxSalary"]}
    }},
    {"$match":{
        "isMax":true
    }}
])

This solution takes advantage of the $eq operator to compare two fields in the $project stage.

Test case:

db.HR.insert({"records": [{"Name": "Joe", "Salary": 70000, "Department": "IT"}]})
db.HR.insert({"records": [{"Name": "Henry", "Salary": 80000, "Department": "Sales"}, {"Name": "Jake", "Salary": 40000, "Department": "Sales"}, {"Name": "Santa", "Salary": 90000, "Department": "IT"}]})
db.HR.insert({"records": [{"Name": "Sam", "Salary": 90000, "Department": "IT"}, {"Name": "Tom", "Salary": 50000, "Department": "Sales"}]})

Result:

{ "_id" : "Sales", "arr" : { "Name" : "Henry", "Salary" : 80000 }, "isMax" : true }
{ "_id" : "IT", "arr" : { "Name" : "Santa", "Salary" : 90000 }, "isMax" : true }
{ "_id" : "IT", "arr" : { "Name" : "Sam", "Salary" : 90000 }, "isMax" : true }


来源:https://stackoverflow.com/questions/34007949/how-to-get-the-maximum-value-of-a-field-for-each-group-with-the-array-of-corresp

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!