MongoDB find subdocument and sort the results

一世执手 提交于 2021-01-28 02:56:13

问题


I have a collection in MongoDB with a complex structure and subdocuments. The document have an structure like this:

doc1 = {
    '_id': '12345678',
    'url': "http//myurl/...",
    'nlp':{
        "status": "OK",
        "entities": {
            "0": {
                "type" : "Person",
                "relevance": "0.877245",
                "text" : "Neelie Kroes"
            },
            "1": {
                "type": "Company",
                "relevance": "0.36242",
                "text": "ICANN"
            },
            "2": {
                "type": "Company",
                "relevance": "0.265175",
                "text": "IANA" 
            }
        }
    }
}


doc2 = {
    '_id': '987456321',
    'url': "http//myurl2/...",
    'nlp':{
        "status": "OK",
        "entities": {
            "0": {
                "type": "Company",
                "relevance": "0.96",
                "text": "ICANN"
            },
            "1": {
                "type" : "Person",
                "relevance": "0.36242",
                "text" : "Neelie Kroes"
            },
            "2": {
                "type": "Company",
                "relevance": "0.265175",
                "text": "IANA" 
            }
        }
    }
}

My task is to search for "type" AND "text" inside the subdocument, then sort by "relevance". With the $elemMatch operator I'm able to perform the query:

db.resource.find({
    'nlp.entities': {
        '$elemMatch': {'text': 'Neelie Kroes', 'type': 'Person'}
    }
});

Perfect, now I have to sort all the records with entities of type "Person" and value "Neelie Kroes" by relevance descending.

I tried with a normal "sort", but, as the manual said about the sort() in $elemMatch, the result may not reflect the sort order because the sort() was applied to the elements of the array before the $elemMatch projection.

In fact, the _id:987456321 will be the first (with a relevance of 0.96, but referenced to ICANN).

How can I do, to sort my documents by matched subdocument's relevance?

P.S.: I can't change the document structure.


回答1:


As noted I hope your documents actually do have an array, but if $elemMatch is working for you then they should.

At any rate, you cannot sort by an element in an array using find. But there is a case where you can do this using .aggregate():

db.collection.aggregate([

    // Match the documents that you want, containing the array
    { "$match": {
        "nlp.entities": {
            "$elemMatch": { 
                "text": "Neelie Kroes", 
                "type": "Person"   
            }
        }
    }},

    // Project to "store" the whole document for later, duplicating the array
    { "$project": {
        "_id": {
            "_id": "$_id",
            "url": "$url",
            "nlp": "$nlp"          
        },
        "entities": "$nlp.entities"
    }},

    // Unwind the array to de-normalize
    { "$unwind": "$entities" },

    // Match "only" the relevant entities
    { "$match": {
        "entities.text": "Neelie Kroes", 
        "entities.type": "Person"   
    }},

    // Sort on the relevance
    { "$sort": { "entities.relevance": -1 } },

    // Restore the original document form
    { "$project": {
        "_id": "$_id._id",
        "url": "$_id.url",
        "nlp": "$_id.nlp"
    }}
])

So essentially, after doing the $match condition for documents that contained the relevant match, you then use $project "store" the original document in the _id field and $unwind a "copy" of the "entities" array.

The next $match "filters" the array contents to only those ones that are relevant. Then you apply the $sort to the "matched" documents.

As the "original" document was stored under _id, you use $project to "restore" the structure that the document actually had to begin with.

That is how you "sort" on your matched element of an array.

Note that if you had multiple "matches" within an array for a parent document, then you would have to employ an additional $group stage to get the $max value for the "relevance" field in order to complete your sort.



来源:https://stackoverflow.com/questions/22751210/mongodb-find-subdocument-and-sort-the-results

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!