Question
We use MongoDB to store time-series sensor data, following a schema similar to the design described in https://www.mongodb.com/blog/post/schema-design-for-time-series-data-in-mongodb
We get good query performance over time periods. An explanation of our schema design: "v" is the parent key for the sensor readings, and the timestamp is broken down into nested sub-documents by minute and second. We use "m" (minute) as the sub-parent key and "s" (second) as the sub-key under each minute. The sensor readings sit at the "s" level, with field1, field2, ..., field10 holding the sensor data values.
Now we are implementing some data-analysis facilities and want to query the data by sensor reading values. Is there an efficient way to query this without using nested for loops?
For example:
- Items that have sensor reading: "field1">2
- Items that have sensor reading: "field1">2 and "field3">5
Thanks a million.
The records look like the examples below.
{
  "_id": ObjectId("5a5dd49f74bbaefd1ac89fc8"),
  "c_id": "1017",
  "c_id_s": NumberInt(1017),
  "c_t": NumberInt(1516096800),
  "type": "hour",
  "v": {
    "m1": {
      "s54": {
        "field1": 7.373158,
        "entry_id": NumberInt(4635),
        "field3": 0.19,
        "field2": NumberInt(88)
      }
    },
    "m31": {
      "s54": {
        "field1": 5.981918,
        "entry_id": NumberInt(4637),
        "field3": 0.04
      },
      "s55": {
        "field2": NumberInt(89),
        "entry_id": NumberInt(4639),
        "field5": NumberInt(-67)
      }
    }
  },
  "entry_id": NumberInt(4639)
},
{
  "_id": ObjectId("5a5dd1a174bbaefd1ac89fc1"),
  "c_id": "1024",
  "c_id_s": NumberInt(1024),
  "c_t": NumberInt(1516096800),
  "type": "hour",
  "v": {
    "m3": {
      "s22": {
        "field3": 210.479996,
        "entry_id": NumberInt(30297)
      },
      "s23": {
        "field1": 3.271534,
        "entry_id": NumberInt(30300),
        "field8": 7.1875,
        "field2": NumberInt(94)
      }
    },
    "m8": {
      "s23": {
        "field3": 150.639999,
        "entry_id": NumberInt(30304),
        "field1": 2.948425,
        "field8": 7.125,
        "field2": NumberInt(94)
      }
    },
    "m13": {
      "s23": {
        "field3": 99.799995,
        "entry_id": NumberInt(30308),
        "field1": 2.849621,
        "field8": 7.0625,
        "field2": NumberInt(95)
      }
    },
    "m18": {
      "s23": {
        "field3": 59.099998,
        "entry_id": NumberInt(30312),
        "field1": 2.681393,
        "field8": 6.9375,
        "field2": NumberInt(95)
      }
    },
    "m19": {
      "s8": {
        "field5": NumberInt(-87),
        "entry_id": NumberInt(30313)
      }
    }
  },
  "entry_id": NumberInt(30313)
}
Answer 1:
Map-reduce allows you to process the named keys, but the aggregation framework is the way to go for efficient queries.
You have to model the data as an array of embedded documents to use the aggregation framework.
I've provided two options below; test them against your dataset and see which one works better for you.
Something like
"v":[
{
"minute":1,
"seconds":[
{
"second":54,
"data":{
"field1":7.373158,
"entry_id":4635,
"field3":0.19,
"field2":88
}
}
]
},
{
"minute":2,
"seconds":...
}
]
Now you can easily query for items that have sensor reading "field1" > 2:
db.col.aggregate([
  {"$match": {"v.seconds.data.field1": {"$gt": 2}}},   // filter documents (can use an index)
  {"$unwind": "$v"},
  {"$match": {"v.seconds.data.field1": {"$gt": 2}}},   // keep only the minutes that contain a match
  {"$unwind": "$v.seconds"},
  {"$match": {"v.seconds.data.field1": {"$gt": 2}}},   // keep only the matching seconds
  {"$project": {"data": "$v.seconds.data"}}
])
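The leading $match in this pipeline can be supported by an index on the nested path. A minimal sketch for the first structure (this index is my assumption, not part of the original answer):

// Single-field multikey index so the initial $match can filter documents before any $unwind
db.col.createIndex({ "v.seconds.data.field1": 1 })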
Alternatively, you can split the documents by minute. Something like
"v":[
{
"second":1,
"data":{
"field1":7.373158,
"entry_id":4635,
"field3":0.19,
"field2":88
}
},
{
"second":2,
"data":...
}
]
You can now query as follows (with an index on v.data.field1):
db.col.aggregate([
  {"$match": {"v.data.field1": {"$gt": 2}}},   // filter documents using the index
  {"$unwind": "$v"},
  {"$match": {"v.data.field1": {"$gt": 2}}},   // keep only the matching seconds
  {"$project": {"data": "$v.data"}}
])
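The index mentioned above can be created like this (collection name as in the examples):

// Index on the reading value used by the initial $match
db.col.createIndex({ "v.data.field1": 1 })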
You can query items that have sensor readings "field1" > 2 and "field3" > 5.
Using the first structure:
db.col.aggregate([
  {"$match": {"v": {"$elemMatch": {"seconds": {"$elemMatch": {"data.field1": {"$gt": 2}, "data.field3": {"$gt": 5}}}}}}},
  {"$unwind": "$v"},
  {"$match": {"v.seconds": {"$elemMatch": {"data.field1": {"$gt": 2}, "data.field3": {"$gt": 5}}}}},
  {"$unwind": "$v.seconds"},
  {"$match": {"v.seconds.data.field1": {"$gt": 2}, "v.seconds.data.field3": {"$gt": 5}}},
  {"$project": {"data": "$v.seconds.data"}}
])
Using the second structure:
db.col.aggregate([
  {"$match": {"v": {"$elemMatch": {"data.field1": {"$gt": 2}, "data.field3": {"$gt": 5}}}}},
  {"$unwind": "$v"},
  {"$match": {"v.data.field1": {"$gt": 2}, "v.data.field3": {"$gt": 5}}},
  {"$project": {"data": "$v.data"}}
])
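A note on the $elemMatch stages: they require both conditions to hold on the same array element. A plain dotted query (shown here only for contrast) would also match documents where "field1" > 2 and "field3" > 5 come from different readings:

// Matches even when the two conditions are satisfied by different seconds entries
db.col.find({ "v.data.field1": { "$gt": 2 }, "v.data.field3": { "$gt": 5 } })

// Requires both conditions to hold within the same embedded document
db.col.find({ "v": { "$elemMatch": { "data.field1": { "$gt": 2 }, "data.field3": { "$gt": 5 } } } })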
Mongo 3.6 Update
Use $match with $expr, which accepts an aggregation expression.
$gt ... 0 - the aggregation expression checks whether the sum, across all minutes, of the number of seconds matching the criteria is greater than 0.
$objectToArray converts the named keys into key-value pairs; $filter then keeps the seconds that satisfy the input criteria, so $size yields the number of matching seconds records per minute.
db.testcol.aggregate([
  {"$match": {
    "$expr": {
      "$gt": [
        {"$sum": {
          "$map": {
            "input": {"$objectToArray": "$v"},                           // [{k: "m1", v: {s54: {...}}}, ...]
            "as": "secondsofminute",
            "in": {
              "$size": {
                "$filter": {
                  "input": {"$objectToArray": "$$secondsofminute.v"},    // [{k: "s54", v: {field1: ...}}, ...]
                  "as": "seconds",
                  "cond": {"$gt": ["$$seconds.v.field2", 2]}             // the reading criterion
                }
              }
            }
          }
        }},
        0
      ]
    }
  }}
])
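The $expr $match only filters whole documents. If you also need the individual matching readings back while keeping the original named-key schema, one possible sketch (my addition, reusing the field2 > 2 criterion and collection name from above; the $expr $match stage can be prepended to skip non-matching documents early) is to unfold the named keys with $objectToArray and $unwind:

db.testcol.aggregate([
  // turn the named minute keys into an array and unwind
  { "$project": { "minute": { "$objectToArray": "$v" } } },
  { "$unwind": "$minute" },
  // turn the named second keys of each minute into an array and unwind
  { "$project": { "minute": "$minute.k", "second": { "$objectToArray": "$minute.v" } } },
  { "$unwind": "$second" },
  // keep only the readings that satisfy the criterion
  { "$match": { "second.v.field2": { "$gt": 2 } } },
  { "$project": { "minute": 1, "second": "$second.k", "data": "$second.v" } }
])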
Mongo 3.4 Update - replace the $expr $match with $redact:
db.col.aggregate([
  {"$redact": {
    "$cond": {
      "if": {
        "$gt": [
          {"$sum": {
            "$map": {
              "input": {"$objectToArray": "$v"},
              "as": "secondsofminute",
              "in": {
                "$size": {
                  "$filter": {
                    "input": {"$objectToArray": "$$secondsofminute.v"},
                    "as": "seconds",
                    "cond": {"$gt": ["$$seconds.v.field2", 2]}
                  }
                }
              }
            }
          }},
          0
        ]
      },
      "then": "$$KEEP",    // keep documents with at least one matching second
      "else": "$$PRUNE"    // drop the rest
    }
  }}
])
Source: https://stackoverflow.com/questions/48279637/mongo-query-nested-field-values-with-two-level-unknown-parent-keys