Transform keymap into vector using MongoDB framework

Question

I have documents like this one in collection x in MongoDB:

{
    "_id" : ...
    "attrKeys": [ "A1", "A2" ],
    "attrs" : {
        "A1" : {
            "type" : "T1",
            "value" : "13"
        },
        "A2" : {
            "type" : "T2",
            "value" : "14"
        }
    }
}

The A1 and A2 elements above are just examples: the attrs field may hold any number of keys with any names. The key names in attrs are stored in the attrKeys field.

I would like to use the MongoDB aggregation framework to transform that document into one like this:

{
    "_id" : ...
    "attrs" : [
        {   
            "key": "A1",
            "type" : "T1",
            "value" : "13"
        },
        {   
            "key": "A2",
            "type" : "T2",
            "value" : "14"
        }
    ]
}

That is, attrs should become an array whose elements are the values of the original keys, with each key name "passed" into a new field named key inside the corresponding array element.

Is it possible to use the aggregation framework for such a transformation? I tend to think that the $project operator could be used, but I haven't figured out how.

Answer 1:

As @Philipp rightly mentioned in his comments:

"Having unknown keys is a dangerous anti-pattern in MongoDB"

However, if you know beforehand what the keys are, you can use the aggregation operators $literal, $addToSet and $setUnion to get the desired result. The aggregation pipeline would look like this:

db.collection.aggregate([
    {
        "$project": {
            "attrs.A1.key": { "$literal": "A1" },
            "attrs.A1.type": "$attrs.A1.type",
            "attrs.A1.value": "$attrs.A1.value",
            "attrs.A2.key": { "$literal": "A2" },
            "attrs.A2.type": "$attrs.A2.type",
            "attrs.A2.value": "$attrs.A2.value"
        }
    },
    {
        "$group": {
            "_id": "$_id",
            "A1": { "$addToSet": "$attrs.A1" },
            "A2": { "$addToSet": "$attrs.A2" }
        }
    },
    {
        "$project": {
            "attrs": {
                "$setUnion": [ "$A1", "$A2" ]
            }
        }
    }
])

Result:

/* 0 */
{
    "result" : [ 
        {
            "_id" : ObjectId("55361320180e849972938fea"),
            "attrs" : [ 
                {
                    "type" : "T1",
                    "value" : "13",
                    "key" : "A1"
                }, 
                {
                    "type" : "T2",
                    "value" : "14",
                    "key" : "A2"
                }
            ]
        }
    ],
    "ok" : 1
}
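
Since the pipeline above repeats the same three lines per key, it can be generated from a list of keys. The following is only a sketch under that assumption: buildKnownKeysPipeline is a hypothetical helper name, not part of MongoDB, and it assumes the key names are known in advance, contain no dots or leading "$", and do not collide with "_id". Note also that $setUnion does not guarantee the order of the resulting array.

```javascript
// Hypothetical helper: builds the same three-stage pipeline as above
// for a list of attribute keys that is known beforehand.
function buildKnownKeysPipeline(keys) {
    var project = {};
    var group = { "_id": "$_id" };
    var sets = [];

    keys.forEach(function(key) {
        // Inject the key name as a literal and carry over type/value
        project["attrs." + key + ".key"] = { "$literal": key };
        project["attrs." + key + ".type"] = "$attrs." + key + ".type";
        project["attrs." + key + ".value"] = "$attrs." + key + ".value";

        // One single-element set per key, merged in the final stage
        group[key] = { "$addToSet": "$attrs." + key };
        sets.push("$" + key);
    });

    return [
        { "$project": project },
        { "$group": group },
        { "$project": { "attrs": { "$setUnion": sets } } }
    ];
}

var pipeline = buildKnownKeysPipeline(["A1", "A2"]);
```

In the mongo shell the result could then be passed on as db.collection.aggregate(pipeline).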

Answer 2:

The aggregation framework is not how you handle the transformation here. You might have been looking to the $out operator to be of some help when re-writing your collection, but the aggregation framework cannot do what you are asking.

Basically the aggregation framework lacks the means to access "keys" dynamically by using a "data point" in any way. You can process data shaped like yours with mapReduce, but it is generally not as efficient as using the aggregation framework, which is probably why you are here in the first place: someone pointed out that the revised structure is better.

Also, trying to use mapReduce as a way to "re-shape" your collection for storage is generally not a good idea. MapReduce output is essentially "always" "key/value", which means the output you get is always going to be contained under a mandatory "value" field.

This really means changing the contents of the collection, and the only way you can really do that while using the values present in your document is by "reading" the document content and then "writing" back.

The looping nature of this is best handled using the "Bulk" operations API methods, starting from db.collection.initializeOrderedBulkOp():

var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;

db.collection.find({ "attrKeys": { "$exists": true }}).forEach(function(doc) {
   // Re-map attrs into an array, one element per key listed in attrKeys
   var attrs = doc.attrKeys.map(function(key) {
       return {
           "key": key,
           "type": doc.attrs[key].type,
           // Explicit radix so the string is always parsed as decimal
           "value": parseInt(doc.attrs[key].value, 10)
       };
   });

   // Queue update operation
   bulk.find({ "_id": doc._id, "attrKeys": { "$exists": true } })
       .updateOne({ 
           "$set": { "attrs": attrs },
           "$unset": { "attrKeys": 1 }
       });
   count++;

   // Execute every 1000 and start a fresh batch
   if ( count % 1000 == 0 ) {
       bulk.execute();
       bulk = db.collection.initializeOrderedBulkOp();
   }
});

// Drain any queued remaining
if ( count % 1000 != 0 )
    bulk.execute();
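
The re-mapping step inside the loop can be tried on its own before touching the collection. This is a minimal sketch in plain JavaScript: remapAttrs is a hypothetical helper name, the sample document is the one from the question without its _id, and it assumes every name in attrKeys is present in attrs and that each "value" is a decimal string.

```javascript
// Hypothetical helper mirroring the "Re-map attrs" step above.
// Assumes every name in doc.attrKeys exists as a key of doc.attrs.
function remapAttrs(doc) {
    return doc.attrKeys.map(function(key) {
        var attr = doc.attrs[key];
        return {
            "key": key,
            "type": attr.type,
            "value": parseInt(attr.value, 10)
        };
    });
}

// Sample document from the question (the _id is left out)
var sample = {
    "attrKeys": [ "A1", "A2" ],
    "attrs": {
        "A1": { "type": "T1", "value": "13" },
        "A2": { "type": "T2", "value": "14" }
    }
};

var remapped = remapAttrs(sample);
// remapped is an array with one element per entry in attrKeys,
// each carrying "key", "type" and a numeric "value"
```

Keeping the mapping in a separate function also makes it easy to check a few documents by hand before queuing any bulk updates.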

Once you have updated the collection content (and please note that your "value" fields there have also been changed from "string" to "integer" format), you can do useful aggregation operations on your new structure, such as:

db.collection.aggregate([
    { "$unwind": "$attrs" },
    { "$group": {
        "_id": null,
        "avgValue": { "$avg": "$attrs.value" }
    }}
])
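
To make clear what that pipeline computes, here is a rough plain-JavaScript equivalent working on documents already held in memory. It is only an illustration: averageAttrValue is a hypothetical name, and it assumes every element of attrs has a numeric "value", as produced by the update above.

```javascript
// Hypothetical in-memory equivalent of the $unwind + $group/$avg pipeline.
// Assumes every attrs element has a numeric "value".
function averageAttrValue(docs) {
    var sum = 0,
        count = 0;

    docs.forEach(function(doc) {
        // Like $unwind: treat each array element as its own record
        doc.attrs.forEach(function(attr) {
            sum += attr.value;
            count++;
        });
    });

    // Nothing to average when there are no elements at all
    return count === 0 ? null : sum / count;
}

var avgValue = averageAttrValue([
    { "attrs": [
        { "key": "A1", "type": "T1", "value": 13 },
        { "key": "A2", "type": "T2", "value": 14 }
    ]}
]);
```

For the single sample document, the average of 13 and 14 is 13.5.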

Source: https://stackoverflow.com/questions/29767598/transform-keymap-into-vector-using-mongodb-framework

Tags

mongodb

aggregation-framework