MongoDB nested object aggregation counting

Anonymous (unverified), submitted 2019-12-03 02:52:02

Question:

I have a set of deeply nested MongoDB documents, and I want to count the number of subdocuments that match a given condition. Edit: the count should be per document. For example:

{
    "_id": {"chr":"20","pos":"14371","ref":"A","alt":"G"},
    "studies": [
        {
            "study_id": "Study1",
            "samples": [
                {
                    "sample_id": "NA00001",
                    "formatdata": [
                        {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                    ]
                },
                {
                    "sample_id": "NA00002",
                    "formatdata": [
                        {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                    ]
                }
            ]
        }
    ]
}

{
    "_id": {"chr":"20","pos":"14372","ref":"T","alt":"AA"},
    "studies": [
        {
            "study_id": "Study3",
            "samples": [
                {
                    "sample_id": "SAMPLE1",
                    "formatdata": [
                        {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                    ]
                },
                {
                    "sample_id": "SAMPLE2",
                    "formatdata": [
                        {"GT":"1|0","GQ":48,"DP":8,"HQ":[51,51]}
                    ]
                }
            ]
        }
    ]
}

{
    "_id": {"chr":"20","pos":"14373","ref":"C","alt":"A"},
    "studies": [
        {
            "study_id": "Study3",
            "samples": [
                {
                    "sample_id": "SAMPLE3",
                    "formatdata": [
                        {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                    ]
                },
                {
                    "sample_id": "SAMPLE7",
                    "formatdata": [
                        {"GT":"0|0","GQ":48,"DP":8,"HQ":[51,51]}
                    ]
                }
            ]
        }
    ]
}

I want to know how many subdocuments contain GT:"1|0", which in this case would be 1 in the first document, 2 in the second, and 0 in the third. I've tried the unwind and aggregate functions, but I'm obviously doing something wrong. When I try to count the subdocuments by the "GT" field, mongo complains:

db.collection.aggregate([{$group: {"$studies.samples.formatdata.GT":1,_id:0}}]) 

because the field names in my group cannot contain ".". Yet if I leave the path out:

db.collection.aggregate([{$group: {"$GT":1,_id:0}}]) 

it complains because "$GT cannot be an operator name"

Any ideas?

Answer 1:

You need to apply $unwind when working with arrays, and here you need to do it three times, once for each level of nesting:

db.collection.aggregate([

    // Unwind each array level so the nested fields can be addressed
    { "$unwind": "$studies" },
    { "$unwind": "$studies.samples" },
    { "$unwind": "$studies.samples.formatdata" },

    // Group the results to obtain the count for each GT value
    { "$group": {
        "_id": "$studies.samples.formatdata.GT",
        "count": { "$sum": 1 }
    }}
])
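To make the effect of the three $unwind stages concrete, here is a minimal sketch that emulates them in plain JavaScript, with no MongoDB involved. The helpers `getPath`, `setPath`, `unwind` and `groupCount` are hypothetical names written for this illustration, and the sample documents are trimmed to the fields that matter (the `_id` is shortened to the position string).

```javascript
// Sample data from the question, trimmed; _id is simplified to the "pos" value.
const docs = [
  { _id: "14371", studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: "14372", studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "1|0" }] },
  ] }] },
  { _id: "14373", studies: [{ samples: [
    { formatdata: [{ GT: "0|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// Read a dotted path such as "studies.samples" from a nested object.
function getPath(obj, path) {
  return path.split(".").reduce((value, key) => value[key], obj);
}

// Return a copy of obj with the value at the dotted path replaced.
function setPath(obj, path, value) {
  const [head, ...rest] = path.split(".");
  const next = rest.length ? setPath(obj[head], rest.join("."), value) : value;
  return { ...obj, [head]: next };
}

// Rough equivalent of $unwind: one output document per array element.
function unwind(documents, path) {
  return documents.flatMap((doc) =>
    getPath(doc, path).map((element) => setPath(doc, path, element))
  );
}

// Rough equivalent of { $group: { _id: <key>, count: { $sum: 1 } } }.
function groupCount(documents, keyOf) {
  const counts = new Map();
  for (const doc of documents) {
    const key = keyOf(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Apply the three unwinds in order, outermost array first.
let flat = docs;
for (const path of ["studies", "studies.samples", "studies.samples.formatdata"]) {
  flat = unwind(flat, path);
}

const counts = groupCount(flat, (doc) => doc.studies.samples.formatdata.GT);
console.log(flat.length);        // 6
console.log(counts.get("1|0"));  // 3
console.log(counts.get("0|0"));  // 3
```

After the last unwind every document holds exactly one formatdata entry, which is why the GT value can then be addressed by a simple dotted path. Grouping by GT alone gives totals across the whole collection, not per document.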

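Ideally you want to filter your input. You can do this with a $match both before and after the $unwind stages, using a $regex to match documents where the GT value begins with a "1". The first $match discards whole documents that contain no matching array member; the second removes the non-matching members that remain after unwinding. Grouping on the document _id as well as the GT value then gives a count per document. Note that /^1/ also matches values such as "1|1"; to count only "1|0", match that exact string instead.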

db.collection.aggregate([

    // Match first to exclude documents where this is not present in any array member
    { "$match": { "studies.samples.formatdata.GT": /^1/ } },

    // Unwind each array level so the nested fields can be addressed
    { "$unwind": "$studies" },
    { "$unwind": "$studies.samples" },
    { "$unwind": "$studies.samples.formatdata" },

    // Match again to filter out the non-matching array members
    { "$match": { "studies.samples.formatdata.GT": /^1/ } },

    // Group the results to obtain the matched count per document and GT value
    { "$group": {
        "_id": {
            "_id": "$_id",
            "key": "$studies.samples.formatdata.GT"
        },
        "count": { "$sum": 1 }
    }}
])
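One consequence of filtering with $match is that a document with no matching entries never reaches the $group stage, so it produces no output row at all instead of a row with a count of 0. The plain JavaScript sketch below is only a reference calculation of the per-document counts the question expects (1, 2 and 0). It is not a MongoDB pipeline; `countGenotype` is a hypothetical helper, and the sample data is trimmed with `_id` shortened to the position string.

```javascript
// Sample data from the question, trimmed; _id is simplified to the "pos" value.
const docs = [
  { _id: "14371", studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
  { _id: "14372", studies: [{ samples: [
    { formatdata: [{ GT: "1|0" }] },
    { formatdata: [{ GT: "1|0" }] },
  ] }] },
  { _id: "14373", studies: [{ samples: [
    { formatdata: [{ GT: "0|0" }] },
    { formatdata: [{ GT: "0|0" }] },
  ] }] },
];

// Hypothetical helper: count the formatdata entries of one document
// whose GT is exactly equal to the given genotype string.
function countGenotype(doc, genotype) {
  return doc.studies
    .flatMap((study) => study.samples)       // flatten studies -> samples
    .flatMap((sample) => sample.formatdata)  // flatten samples -> formatdata
    .filter((entry) => entry.GT === genotype)
    .length;
}

// One result per document, including documents with zero matches.
const perDocument = docs.map((doc) => ({
  _id: doc._id,
  count: countGenotype(doc, "1|0"),
}));

console.log(perDocument.map((row) => row.count)); // [ 1, 2, 0 ]
```

If the zero rows matter, they have to be added back on the client side or computed without the filtering $match stages.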

Note that in all cases the "$"-prefixed entries are "variables" that refer to properties of the document. They are values, to be used as input on the right-hand side. The keys on the left-hand side must be specified as plain strings; a variable cannot be used to name a key.
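As a small illustration of that rule, the sketch below writes the $group stage as a plain JavaScript object and checks its output field names. The `invalidOutputNames` helper is hypothetical and mirrors only the two complaints described in the question (a name containing "." and a name starting with "$"); it is not MongoDB's actual validation.

```javascript
// Correct form: plain names on the left, "$"-prefixed field paths on the right.
const goodStage = {
  $group: {
    _id: "$studies.samples.formatdata.GT", // value: a field path into the document
    count: { $sum: 1 },                    // key: a plain output field name
  },
};

// The shape attempted in the question: a field path used as a key.
const badStage = {
  $group: {
    "$studies.samples.formatdata.GT": 1,
    _id: 0,
  },
};

// Hypothetical check: list output field names that start with "$" or contain ".".
function invalidOutputNames(stage) {
  return Object.keys(stage.$group).filter(
    (name) => name.startsWith("$") || name.includes(".")
  );
}

console.log(invalidOutputNames(goodStage)); // []
console.log(invalidOutputNames(badStage));  // [ '$studies.samples.formatdata.GT' ]
```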


