Mongodb distinct on a array field with regex query?

纵然是瞬间 提交于 2019-12-30 05:03:25

问题


Basically i'm trying to implement tags functionality on a model.

> db.event.distinct("tags")
[ "bar", "foo", "foobar" ]

Doing a simple distinct query retrieves me all distinct tags. However how would i go about getting all distinct tags that match a certain query? Say for example i wanted to get all tags matching foo and then expecting to get ["foo","foobar"] as a result?

The following queries is my failed attempts of achieving this:

> db.event.distinct("tags",/foo/)
[ "bar", "foo", "foobar" ]

> db.event.distinct("tags",{tags: {$regex: 'foo'}})
[ "bar", "foo", "foobar" ]

回答1:


The aggregation framework and not the .distinct() command:

db.event.aggregate([
    // De-normalize the array content to separate documents
    { "$unwind": "$tags" },

    // Filter the de-normalized content to remove non-matches
    { "$match": { "tags": /foo/ } },

    // Group the "like" terms as the "key"
    { "$group": {
        "_id": "$tags"
    }}
])

You are probably better of using an "anchor" to the beginning of the regex is you mean from the "start" of the string. And also doing this $match before you process $unwind as well:

db.event.aggregate([
    // Match the possible documents. Always the best approach
    { "$match": { "tags": /^foo/ } },

    // De-normalize the array content to separate documents
    { "$unwind": "$tags" },

    // Now "filter" the content to actual matches
    { "$match": { "tags": /^foo/ } },

    // Group the "like" terms as the "key"
    { "$group": {
        "_id": "$tags"
    }}
])

That makes sure you are not processing $unwind on every document in the collection and only those that possibly contain your "matched tags" value before you "filter" to make sure.

The really "complex" way to somewhat mitigate large arrays with possible matches takes a bit more work, and MongoDB 2.6 or greater:

db.event.aggregate([
    { "$match": { "tags": /^foo/ } },
    { "$project": {
        "tags": { "$setDifference": [
            { "$map": {
                "input": "$tags",
                "as": "el",
                "in": { "$cond": [
                    { "$eq": [ 
                        { "$substr": [ "$$el", 0, 3 ] },
                        "foo"
                    ]},
                    "$$el",
                    false
                ]}
            }},
            [false]
        ]}
    }},
    { "$unwind": "$tags" },
    { "$group": { "_id": "$tags" }}
])

So $map is a nice "in-line" processor of arrays but it can only go so far. The $setDifference operator negates the false matches, but ultimately you still need to process $unwind to do the remaining $group stage for distinct values overall.

The advantage here is that arrays are now "reduced" to only the "tags" element that matches. Just don't use this when you want a "count" of the occurrences when there are "multiple distinct" values in the same document. But again, there are other ways to handle that.



来源:https://stackoverflow.com/questions/27980100/mongodb-distinct-on-a-array-field-with-regex-query

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!