Mongo Query Nested Field Values with two-level unknown parent keys

随声附和 posted on 2019-12-10 18:12:32

Question


We use MongoDB to store time-series sensor data, with a schema similar to the one described in https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb

We get good performance on queries by time period. To explain our schema design: "v" is the parent key of the sensor readings, and the time is broken down into nested documents keyed by minute and second. We use "m" (minute) as the sub-parent key, then "s" (second) as the sub-key of each minute. The sensor readings are located at the "s" level, with field1, field2, ..., field10 as the sensor data values.

Now we are trying to implement some data analysis features and want to query the data by sensor reading values. Is there an efficient way to do this without using nested for loops in the query?

For example:

  1. Items that have sensor reading: "field1">2
  2. Items that have sensor reading: "field1">2 and "field3">5

Thanks a million.

The records look like the examples below.

{
   "_id": ObjectId("5a5dd49f74bbaefd1ac89fc8"),
   "c_id": "1017",
   "c_id_s": NumberInt(1017),
   "c_t": NumberInt(1516096800),
   "type": "hour",
   "v": {
     "m1": {
       "s54": {
         "field1": 7.373158,
         "entry_id": NumberInt(4635),
         "field3": 0.19,
         "field2": NumberInt(88) 
      } 
    },
     "m31": {
       "s54": {
         "field1": 5.981918,
         "entry_id": NumberInt(4637),
         "field3": 0.04 
      },
       "s55": {
         "field2": NumberInt(89),
         "entry_id": NumberInt(4639),
         "field5": NumberInt(-67) 
      } 
    } 
  },
   "entry_id": NumberInt(4639) 
}, 
{
   "_id": ObjectId("5a5dd1a174bbaefd1ac89fc1"),
   "c_id": "1024",
   "c_id_s": NumberInt(1024),
   "c_t": NumberInt(1516096800),
   "type": "hour",
   "v": {
     "m3": {
       "s22": {
         "field3": 210.479996,
         "entry_id": NumberInt(30297) 
      },
       "s23": {
         "field1": 3.271534,
         "entry_id": NumberInt(30300),
         "field8": 7.1875,
         "field2": NumberInt(94) 
      } 
    },
     "m8": {
       "s23": {
         "field3": 150.639999,
         "entry_id": NumberInt(30304),
         "field1": 2.948425,
         "field8": 7.125,
         "field2": NumberInt(94) 
      } 
    },
     "m13": {
       "s23": {
         "field3": 99.799995,
         "entry_id": NumberInt(30308),
         "field1": 2.849621,
         "field8": 7.0625,
         "field2": NumberInt(95) 
      } 
    },
     "m18": {
       "s23": {
         "field3": 59.099998,
         "entry_id": NumberInt(30312),
         "field1": 2.681393,
         "field8": 6.9375,
         "field2": NumberInt(95) 
      } 
    },
     "m19": {
       "s8": {
         "field5": NumberInt(-87),
         "entry_id": NumberInt(30313) 
      } 
    } 
  },
   "entry_id": NumberInt(30313) 
}

Answer 1:


Map-reduce lets you process named keys, but the aggregation framework is the way to go for efficient queries.

To use the aggregation framework, you have to model the data as an array of embedded documents.

I've provided two options. You can test them against your dataset and see which one works better for you.

The first option nests seconds inside minutes, something like:

"v":[
  {
    "minute":1,
    "seconds":[
      {
        "second":54,
        "data":{
         "field1":7.373158,
         "entry_id":4635,
         "field3":0.19,
         "field2":88
       }
      }
    ]
  },
  {
    "minute":2,
    "seconds":...
  }
]
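
To move existing records into this layout, the named keys have to be turned into values. The following is only a minimal sketch of that conversion in plain JavaScript, assuming every key under "v" has the form "m<minute>" and every key below it has the form "s<second>", as in the sample records. The function name toNestedArrays is hypothetical and not part of any MongoDB API.

```javascript
// Hypothetical helper (not a MongoDB API): converts the "v" object with
// "m<minute>" / "s<second>" keys into the nested-array layout.
function toNestedArrays(v) {
  return Object.entries(v).map(([minuteKey, secondsObj]) => ({
    minute: Number(minuteKey.slice(1)), // "m31" -> 31
    seconds: Object.entries(secondsObj).map(([secondKey, data]) => ({
      second: Number(secondKey.slice(1)), // "s54" -> 54
      data,
    })),
  }));
}

// The "v" value of the first sample record, with plain numbers.
const sampleV = {
  m1: { s54: { field1: 7.373158, entry_id: 4635, field3: 0.19, field2: 88 } },
  m31: {
    s54: { field1: 5.981918, entry_id: 4637, field3: 0.04 },
    s55: { field2: 89, entry_id: 4639, field5: -67 },
  },
};

const converted = toNestedArrays(sampleV);
console.log(JSON.stringify(converted, null, 2));
```

In practice you would run something like this once per document in a migration script and write the result back to "v".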

Now you can easily query for items that have sensor reading: "field1">2.

db.col.aggregate(
  [{"$match":{"v.seconds.data.field1":{"$gt":2}}},
   {"$unwind":"$v"}, 
   {"$match":{"v.seconds.data.field1":{"$gt":2}}},
   {"$unwind":"$v.seconds"}, 
   {"$match":{"v.seconds.data.field1":{"$gt":2}}},
   {"$project":{"data":"$v.seconds.data"}}]
)

Alternatively, you can split the documents by minute, something like:

"v":[
  {
    "second":1,
    "data":{
       "field1":7.373158,
       "entry_id":4635,
       "field3":0.19,
       "field2":88
     }
  },
  {
     "second":2,
     "data":...
  }
]
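
Splitting an hourly record into per-minute documents could look roughly like the sketch below. It assumes the same "m<minute>" / "s<second>" key format as the sample records. The helper name splitByMinute and the top-level "minute" field are assumptions made for illustration, since the layout above does not say where the minute is stored.

```javascript
// Hypothetical helper: turns one hourly document into one document per minute,
// each with "v" as a flat array of { second, data } entries.
function splitByMinute(doc) {
  // Drop the original _id so each new document can get its own.
  const { _id, v, ...rest } = doc;
  return Object.entries(v).map(([minuteKey, secondsObj]) => ({
    ...rest,
    minute: Number(minuteKey.slice(1)), // assumed field name, "m31" -> 31
    v: Object.entries(secondsObj).map(([secondKey, data]) => ({
      second: Number(secondKey.slice(1)), // "s55" -> 55
      data,
    })),
  }));
}

const hourDoc = {
  _id: "5a5dd49f74bbaefd1ac89fc8",
  c_id: "1017",
  c_t: 1516096800,
  v: {
    m1: { s54: { field1: 7.373158, entry_id: 4635, field3: 0.19, field2: 88 } },
    m31: {
      s54: { field1: 5.981918, entry_id: 4637, field3: 0.04 },
      s55: { field2: 89, entry_id: 4639, field5: -67 },
    },
  },
};

const minuteDocs = splitByMinute(hourDoc);
console.log(JSON.stringify(minuteDocs, null, 2));
```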

You can now query like this (with an index on v.data.field1):

db.col.aggregate(
  [{"$match":{"v.data.field1":{"$gt":2}}},
   {"$unwind":"$v"}, 
   {"$match":{"v.data.field1":{"$gt":2}}},
   {"$project":{"data":"$v.data"}}]
)

To query items that have sensor readings "field1">2 and "field3">5:

Using the first structure:

db.col.aggregate(
  [{"$match":{"v":{"$elemMatch":{"seconds":{"$elemMatch":{"data.field1":{"$gt":2},"data.field3":{"$gt":5}}}}}}},
   {"$unwind":"$v"},
   {"$match":{"v.seconds":{"$elemMatch":{"data.field1":{"$gt":2},"data.field3":{"$gt":5}}}}},
   {"$unwind":"$v.seconds"},
   {"$match":{"v.seconds.data.field1":{"$gt":2},"v.seconds.data.field3":{"$gt":5}}},
   {"$project":{"data":"$v.seconds.data"}}]
)

Using the second structure:

db.col.aggregate(
  [{"$match":{"v":{"$elemMatch":{"data.field1":{"$gt":2},"data.field3":{"$gt":5}}}}},
   {"$unwind":"$v"},
   {"$match":{"v.data.field1":{"$gt":2},"v.data.field3":{"$gt":5}}},
   {"$project":{"data":"$v.data"}}]
)

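Update for MongoDB 3.6

Use $match with $expr, which accepts aggregation expressions, so the original named-key schema can be queried as it is.

$objectToArray converts the named keys into key/value pairs. $filter keeps the seconds that meet the input criteria, and $size outputs the number of matching seconds records for each minute.

$sum adds these counts across all minutes, and the outer $gt checks that the total is greater than 0, that is, that at least one second in the document matches. The example tests field2 > 2.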

db.testcol.aggregate(
{"$match":{
  "$expr":{
    "$gt":[
      {"$sum":{
        "$map":{
          "input":{"$objectToArray":"$v"},
          "as":"secondsofminute",
          "in":{
            "$size":{
              "$filter":{
                "input":{"$objectToArray":"$$secondsofminute.v"},
                "as":"seconds",
                "cond":{"$gt":["$$seconds.v.field2",2]}
              }
            }
          }
        }
      }},
    0]
  }
}})
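
To make the expression easier to follow, here is a rough plain-JavaScript equivalent of what it computes for a single document. It is only an illustration of the logic, not how the server evaluates it, and the function name countMatchingSeconds is made up for this sketch.

```javascript
// Illustration only: mirrors $objectToArray -> $filter -> $size -> $sum.
function countMatchingSeconds(v, predicate) {
  return Object.values(v) // like $objectToArray on "$v", keeping the values
    .map((secondsObj) =>
      // like $filter on the seconds of one minute, followed by $size
      Object.values(secondsObj).filter(predicate).length
    )
    .reduce((sum, n) => sum + n, 0); // like $sum over all minutes
}

const recordV = {
  m1: { s54: { field1: 7.373158, entry_id: 4635, field3: 0.19, field2: 88 } },
  m31: {
    s54: { field1: 5.981918, entry_id: 4637, field3: 0.04 },
    s55: { field2: 89, entry_id: 4639, field5: -67 },
  },
};

// Same condition as the query: field2 > 2. A missing field2 does not match.
const matchCount = countMatchingSeconds(recordV, (r) => r.field2 > 2);
const keepDocument = matchCount > 0; // the outer {"$gt": [..., 0]}
console.log(matchCount, keepDocument);
```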

Update for MongoDB 3.4: replace $expr with $redact

db.col.aggregate(
 {"$redact":{
  "$cond":{
    "if":{
      "$gt":[
        {"$sum":{
          "$map":{
            "input":{"$objectToArray":"$v"},
            "as":"secondsofminute",
            "in":{
              "$size":{
                "$filter":{
                  "input":{"$objectToArray":"$$secondsofminute.v"},
                  "as":"seconds",
                  "cond":{"$gt":["$$seconds.v.field2",2]}
                }
              }
            }
          }
        }},
        0]
    },
   "then":"$$KEEP",
   "else":"$$PRUNE"
  }
}})
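
Both queries above test a single field. For the second case in the question ("field1">2 and "field3">5 in the same reading), the conditions can be combined with $and inside the $filter. The sketch below builds that expression from a map of thresholds. The function name buildAnySecondMatches and its input format are assumptions for illustration, and the resulting pipelines have not been run against a real collection.

```javascript
// Hypothetical builder: conditions such as { field1: 2, field3: 5 } mean
// "field1 > 2 AND field3 > 5" within the same second-level reading.
function buildAnySecondMatches(conditions) {
  const cond = {
    $and: Object.entries(conditions).map(([field, threshold]) => ({
      $gt: ["$$seconds.v." + field, threshold],
    })),
  };
  return {
    $gt: [
      {
        $sum: {
          $map: {
            input: { $objectToArray: "$v" },
            as: "secondsofminute",
            in: {
              $size: {
                $filter: {
                  input: { $objectToArray: "$$secondsofminute.v" },
                  as: "seconds",
                  cond,
                },
              },
            },
          },
        },
      },
      0,
    ],
  };
}

const anyMatch = buildAnySecondMatches({ field1: 2, field3: 5 });

// MongoDB 3.6: wrap the expression in $match / $expr.
const pipeline36 = [{ $match: { $expr: anyMatch } }];
// MongoDB 3.4: wrap the same expression in $redact.
const pipeline34 = [
  { $redact: { $cond: { if: anyMatch, then: "$$KEEP", else: "$$PRUNE" } } },
];
console.log(JSON.stringify(pipeline36));
```

Either array could then be passed to db.col.aggregate(...). Note that this still evaluates the expression for every document that reaches the stage, so it cannot use an index on the sensor fields.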


Source: https://stackoverflow.com/questions/48279637/mongo-query-nested-field-values-with-two-level-unknown-parent-keys
