I have a time series that grows and is (potentially) revised through time:
on "2013-01-01": First version of the data
"2013-01-01" 10
on "2013-01-02": Data of the 1st of Jan is revised from 10 to 11
"2013-01-01" 11
on "2013-02-01": First version of the data of the 1st of Feb
"2013-01-01" 11
"2013-02-01" 20
on "2013-02-02": Data of the 1st of Feb is revised from 20 to 21
"2013-01-01" 11
"2013-02-01" 21
most frequent queries:
query1: get the most recent version of all dates
"2013-01-01" 11
"2013-02-01" 21
query2: get the time series as it was known at a certain date:
For instance, querying with "2013-02-01", I need to get
"2013-01-01" 11
"2013-02-01" 20
Note that query1 is the same as query2 with date = current date.
I need help structuring my documents. As I come from a relational background, I am not sure about the implications of each structure. I have identified 2 possible structures and would be happy to get some feedback, or suggestions for other structures.
OPTION A: Each revision in a separate document
{
"id":"1",
"date":"2013-01-01",
"version_date":"2013-01-01",
"value":10
}
{
"id":"1",
"date":"2013-01-01",
"version_date":"2013-01-02",
"value":11
}
{
"id":"1",
"date":"2013-02-01",
"version_date":"2013-02-01",
"value":20
}
{
"id":"1",
"date":"2013-02-01",
"version_date":"2013-02-02",
"value":21
}
OPTION B: One document contains all the revisions of one date
{
"id":"1",
"date":"2013-01-01",
"values" : [
{ "version_date":"2013-01-01",
"value":10
},
{
"version_date":"2013-01-02",
"value":11
}
]
}
{
"id":"1",
"date":"2013-02-01",
"values" : [
{ "version_date":"2013-02-01",
"value":20
},
{
"version_date":"2013-02-02",
"value":21
}
]
}
In option B, I am also concerned that the update query might be more difficult to perform, as the document has a growing part, and I am not sure how well MongoDB supports or optimises for that.
EDIT: I am also considering option C to speed up query1 (although it might slow down writes a bit):
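To make the read side of option B concrete, here is a minimal sketch of query2 in plain Python, run against in-memory copies of the two documents above. No MongoDB is involved; the helper names `value_as_of` and `series_as_of` are made up for illustration, and it relies on ISO dates sorting correctly as strings.

```python
from typing import Optional

# The two option B documents from the example above.
docs = [
    {"id": "1", "date": "2013-01-01", "values": [
        {"version_date": "2013-01-01", "value": 10},
        {"version_date": "2013-01-02", "value": 11},
    ]},
    {"id": "1", "date": "2013-02-01", "values": [
        {"version_date": "2013-02-01", "value": 20},
        {"version_date": "2013-02-02", "value": 21},
    ]},
]


def value_as_of(doc: dict, as_of: str) -> Optional[int]:
    """Return the latest revision of one document known on `as_of`, or None."""
    # ISO dates (YYYY-MM-DD) compare correctly as plain strings.
    known = [r for r in doc["values"] if r["version_date"] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r["version_date"])["value"]


def series_as_of(docs: list, as_of: str) -> dict:
    """query2: the whole series as it was known on `as_of`."""
    result = {}
    for doc in docs:
        value = value_as_of(doc, as_of)
        if value is not None:  # skip dates that had no data yet
            result[doc["date"]] = value
    return result
```

Calling `series_as_of(docs, "2013-02-01")` gives 11 for the 1st of Jan and 20 for the 1st of Feb, which matches the expected result of query2 above. query1 is the same call with today's date.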
{
"id":"1",
"date":"2013-01-01",
"values" : [
{ "version_date":"2013-01-01",
"value":10
},
{
"version_date":"2013-01-02",
"value":11
}
],
"last_value":11
}
{
"id":"1",
"date":"2013-02-01",
"values" : [
{ "version_date":"2013-02-01",
"value":20
},
{
"version_date":"2013-02-02",
"value":21
}
],
"last_value":21
}
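The extra write cost in option C comes from keeping `last_value` in sync with the array. A minimal sketch of that bookkeeping in plain Python follows; `add_revision` and `revision_update` are hypothetical helper names. `revision_update` only builds the update document that would be sent to MongoDB with the `$push` and `$set` operators, and it assumes the new revision is always the newest one; it has not been run against a server here.

```python
def add_revision(doc: dict, version_date: str, value: int) -> dict:
    """Append a revision to an option C document and refresh last_value."""
    doc["values"].append({"version_date": version_date, "value": value})
    # last_value follows the revision with the greatest version_date,
    # so a late-arriving older revision does not overwrite it.
    newest = max(doc["values"], key=lambda r: r["version_date"])
    doc["last_value"] = newest["value"]
    return doc


def revision_update(version_date: str, value: int) -> dict:
    """Build a single MongoDB update document doing both changes at once.

    Assumes the pushed revision is the most recent one.
    """
    return {
        "$push": {"values": {"version_date": version_date, "value": value}},
        "$set": {"last_value": value},
    }


doc = {
    "id": "1",
    "date": "2013-01-01",
    "values": [{"version_date": "2013-01-01", "value": 10}],
    "last_value": 10,
}
add_revision(doc, "2013-01-02", 11)
```

After the call, `doc` has two entries in `values` and `last_value` is 11, as in the first option C document above.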
As with all questions like this, only you can answer it. If you have your data, try both approaches, benchmark them on real data with real queries, and compare the results. If you do not have data yet, simulate it.
Keep in mind that options B and C are subject to the 16 MB limit per document. If a single date accumulates a very large number of versions you could reach that limit, although it would take a great many versions to fill 16 MB. Also keep in mind that updating documents that keep growing can end up causing many document moves on disk.
Options B and C would be a good fit if you needed to select all revisions of a particular date at once, but I do not see that among your most frequent queries. With the right indexes you can achieve this with option A as well.
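For a rough sense of scale, here is a back-of-the-envelope estimate of how many revisions fit in one document. It follows the BSON layout (type byte, null-terminated key, length-prefixed string, 4-byte int32) and assumes `version_date` is stored as a 10-character string and `value` as a 32-bit integer; numbers stored as doubles would take 4 more bytes each. The other fields of the document are ignored, and the function names are made up for illustration.

```python
MAX_BSON_DOCUMENT_BYTES = 16 * 1024 * 1024  # the 16 MB per-document limit


def revision_bson_bytes(array_index: int) -> int:
    """Approximate BSON size of one revision stored at `array_index`."""
    # String element: type byte + key + NUL + int32 length + text + NUL.
    version_date_field = 1 + len("version_date") + 1 + 4 + len("2013-01-01") + 1
    # int32 element: type byte + key + NUL + 4-byte value.
    value_field = 1 + len("value") + 1 + 4
    # Embedded document: int32 size + elements + trailing NUL.
    subdocument = 4 + version_date_field + value_field + 1
    # Array entries are keyed by their index written as a string.
    array_element_overhead = 1 + len(str(array_index)) + 1
    return array_element_overhead + subdocument


def estimated_max_revisions() -> int:
    """Estimate using the size of an entry with a 6-digit array index."""
    worst_case = revision_bson_bytes(999_999)
    return MAX_BSON_DOCUMENT_BYTES // worst_case
```

Under these assumptions each revision costs about 50 bytes, so the limit is only reached after roughly 300,000 revisions of the same date.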
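As a sketch of how option A can serve the same queries, the snippet below answers query2 over flat revision documents in plain Python, and also builds an aggregation pipeline with the same logic (`$match`, `$sort`, then `$group` with `$last`). The pipeline is only constructed as data here and has not been run against a server; `series_as_of` and `as_of_pipeline` are illustrative names, not anything from the question. Which index serves it best (for example one covering `version_date` and `date`) is something to measure.

```python
# The four option A documents from the question.
revisions = [
    {"id": "1", "date": "2013-01-01", "version_date": "2013-01-01", "value": 10},
    {"id": "1", "date": "2013-01-01", "version_date": "2013-01-02", "value": 11},
    {"id": "1", "date": "2013-02-01", "version_date": "2013-02-01", "value": 20},
    {"id": "1", "date": "2013-02-01", "version_date": "2013-02-02", "value": 21},
]


def series_as_of(revisions: list, as_of: str) -> dict:
    """query2: latest value per date among revisions known on `as_of`."""
    latest = {}
    # Walk revisions oldest first so later versions overwrite earlier ones.
    for rev in sorted(revisions, key=lambda r: r["version_date"]):
        if rev["version_date"] <= as_of:
            latest[rev["date"]] = rev["value"]
    return latest


def as_of_pipeline(as_of: str) -> list:
    """The same logic expressed as a MongoDB aggregation pipeline."""
    return [
        {"$match": {"version_date": {"$lte": as_of}}},
        {"$sort": {"version_date": 1}},
        {"$group": {"_id": "$date", "value": {"$last": "$value"}}},
    ]
```

`series_as_of(revisions, "2013-02-01")` returns 11 for the 1st of Jan and 20 for the 1st of Feb, the result expected for query2 in the question.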
There's actually a very recent blog post on the official page covering this topic: http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb Take a look at that and ask any additional questions if required.
Considering the options above and your requirements, it would be best to structure your documents by date, as in option B. It would also help to index the date field. Some scenarios (easy reads, updates) that show why this seems to be the better-optimized solution are:
Source: https://stackoverflow.com/questions/19701685/structure-of-documents-for-versioning-of-a-time-series-on-mongodb