MongoDB distinct on an array field with regex query?

Happy的楠姐 2021-01-03 03:13

Basically I'm trying to implement tags functionality on a model.

> db.event.distinct("tags")
[ "bar", "foo", "foobar" ]

Doing a

1 answer
  •  刺人心 (original poster)
    2021-01-03 03:38

    Use the aggregation framework rather than the .distinct() command:

    db.event.aggregate([
        // De-normalize the array content to separate documents
        { "$unwind": "$tags" },
    
        // Filter the de-normalized content to remove non-matches
        { "$match": { "tags": /foo/ } },
    
        // Group the "like" terms as the "key"
        { "$group": {
            "_id": "$tags"
        }}
    ])
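
    To make the three stages concrete, here is a plain JavaScript sketch of what the pipeline computes, run against in-memory data rather than a server. The sample documents are hypothetical, and the array methods only imitate $unwind, $match and $group; they are not MongoDB APIs.

```javascript
// Hypothetical sample documents standing in for the "event" collection.
const events = [
  { _id: 1, tags: ["foo", "bar"] },
  { _id: 2, tags: ["foobar", "foo"] },
  { _id: 3, tags: ["bar"] },
];

// Imitates $unwind: one entry per array element.
const unwound = events.flatMap((doc) =>
  doc.tags.map((tag) => ({ _id: doc._id, tags: tag }))
);

// Imitates $match: keep only the entries whose tag matches the regex.
const matched = unwound.filter((doc) => /foo/.test(doc.tags));

// Imitates $group on "$tags": collapse to the distinct tag values.
const distinctTags = [...new Set(matched.map((doc) => doc.tags))];

console.log(distinctTags);
```

    With this sample data the result contains "foo" and "foobar". The real $group stage does not guarantee any output order, so add a $sort stage if order matters.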
    

    You are probably better off using an "anchor" to the beginning of the regex if you mean from the "start" of the string. It also helps to do this $match before you process $unwind as well:

    db.event.aggregate([
        // Match the possible documents. Always the best approach
        { "$match": { "tags": /^foo/ } },
    
        // De-normalize the array content to separate documents
        { "$unwind": "$tags" },
    
        // Now "filter" the content to actual matches
        { "$match": { "tags": /^foo/ } },
    
        // Group the "like" terms as the "key"
        { "$group": {
            "_id": "$tags"
        }}
    ])
    

    That makes sure you are not processing $unwind on every document in the collection, only on those that possibly contain your "matched tags" value, before you "filter" again to keep only the matching elements.
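
    The saving can be illustrated with a plain JavaScript sketch, again on hypothetical in-memory data. It assumes the usual behaviour that a regex condition on an array field matches a document when any element matches, and simply counts how many elements would be unwound with and without the initial filter.

```javascript
// Hypothetical sample documents; only two of the four hold a tag starting with "foo".
const sampleEvents = [
  { tags: ["foo", "bar"] },
  { tags: ["bar", "baz"] },
  { tags: ["foobar", "xfoo"] },
  { tags: ["qux"] },
];
const pattern = /^foo/;

// Without the first $match, every element of every document is unwound.
const unwoundAll = sampleEvents.flatMap((doc) => doc.tags);

// Imitates the first $match: keep documents where any tag matches.
const candidates = sampleEvents.filter((doc) =>
  doc.tags.some((tag) => pattern.test(tag))
);

// Imitates $unwind on the reduced set of documents.
const unwoundCandidates = candidates.flatMap((doc) => doc.tags);

// Imitates the second $match plus $group: distinct matching tags.
const anchoredTags = [
  ...new Set(unwoundCandidates.filter((tag) => pattern.test(tag))),
];

console.log(unwoundAll.length, unwoundCandidates.length, anchoredTags);
```

    Here seven elements would be unwound without the initial filter and four with it, while the distinct result is the same either way. Note that "xfoo" is dropped by the anchored pattern.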

    The really "complex" way to somewhat mitigate large arrays with possible matches takes a bit more work, and requires MongoDB 2.6 or greater:

    db.event.aggregate([
        { "$match": { "tags": /^foo/ } },
        { "$project": {
            "tags": { "$setDifference": [
                { "$map": {
                    "input": "$tags",
                    "as": "el",
                    "in": { "$cond": [
                        { "$eq": [ 
                            { "$substr": [ "$$el", 0, 3 ] },
                            "foo"
                        ]},
                        "$$el",
                        false
                    ]}
                }},
                [false]
            ]}
        }},
        { "$unwind": "$tags" },
        { "$group": { "_id": "$tags" }}
    ])
    

    So $map is a nice "in-line" processor of arrays, but it can only go so far. The $setDifference operator removes the false values left for non-matching elements, but ultimately you still need to process $unwind so the remaining $group stage can produce distinct values overall.

    The advantage here is that arrays are now "reduced" to only the "tags" elements that match. Just don't use this when you want a "count" of the occurrences and the same value can appear more than once in a document, since $setDifference treats the array as a set and collapses repeated values. But again, there are other ways to handle that.
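
    As a rough illustration of the $map and $setDifference step, and of why it is unsuitable for counting, here is a plain JavaScript sketch for a single hypothetical "tags" array. The array methods only imitate the operators, and the sample values are assumptions.

```javascript
// Hypothetical "tags" array from one document; note the repeated "foo".
const docTags = ["foo", "bar", "foo", "foobar"];

// Imitates $map with $cond: keep an element whose first three
// characters are "foo", otherwise emit false in its place.
const mapped = docTags.map((el) => (el.substring(0, 3) === "foo" ? el : false));

// Imitates $setDifference against [false]: treat the array as a set,
// then drop the false placeholder.
const reduced = [...new Set(mapped)].filter((el) => el !== false);

// Counting occurrences after the set step loses the repeated "foo".
const fooBefore = docTags.filter((el) => el === "foo").length;
const fooAfter = reduced.filter((el) => el === "foo").length;

console.log(reduced, fooBefore, fooAfter);
```

    The reduced array holds "foo" and "foobar" only once each, so "foo" appears twice in the input but once after the set step. The real $setDifference does not guarantee element order either.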
