MongoDB Correct Schema for aggregated data

旧时难觅i 2020-12-22 05:31

I have a big collection that holds lots of stats. Since I want to generate reports, I run a daily cron job that aggregates data from the main collection into a smaller one.
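At its core, a daily roll-up like this groups raw stats by user and calendar day and sums them. Below is a minimal in-memory sketch in plain JavaScript. It assumes hypothetical `user_id`, `date` and `amount` fields and UTC day boundaries, and `utcDay` and `rollUpDaily` are illustrative helpers. Inside MongoDB the same step would be an aggregation with a `$group` stage, with its result written out by an output stage such as `$out` or `$merge`.

```javascript
// Hypothetical raw stats documents; field names are illustrative only.
const rawStats = [
  { user_id: "777", date: new Date("2015-04-18T08:57:42Z"), amount: 100 },
  { user_id: "777", date: new Date("2015-04-18T17:10:00Z"), amount: 50 },
  { user_id: "777", date: new Date("2015-04-19T09:00:00Z"), amount: 200 },
  { user_id: "888", date: new Date("2015-04-18T12:00:00Z"), amount: 70 },
];

// Truncate a Date to midnight UTC so every event of one day shares a key.
function utcDay(date) {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
}

// Group by (user_id, UTC day) and sum amounts, as a $group stage would.
function rollUpDaily(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const day = utcDay(doc.date);
    const key = doc.user_id + "|" + day.toISOString();
    const entry = totals.get(key) || { user_id: doc.user_id, date: day, amount: 0 };
    entry.amount += doc.amount;
    totals.set(key, entry);
  }
  return [...totals.values()]; // one summary document per user per day
}

const daily = rollUpDaily(rawStats);
console.log(daily);
```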

4 Answers
  • 2020-12-22 05:41

    I'm working with time series and MongoDB, and I use a schema based on this.

    {
      timestamp_minute: ISODate("2013-10-10T23:06:00.000Z"),
      type: "memory_used",
      values: {
        0: 999999,
        …  
        37: 1000000,
        38: 1500000,
        … 
        59: 2000000
      }
    }
    

    It's interesting to watch this video too.
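    With this schema, each sample updates one slot of an existing per-minute document instead of inserting a new document. The plain JavaScript sketch below assumes one document per (timestamp_minute, type) pair and UTC timestamps. The helper `bucketUpdate` is hypothetical. It only builds the filter and update documents that you would pass to an upserting update, e.g. `db.stats.updateOne(filter, update, { upsert: true })`, where `stats` is a hypothetical collection name.

    ```javascript
    // Map one sample onto the per-minute bucket schema: the document is keyed
    // by the timestamp truncated to the minute, and the value is stored under
    // the second within that minute (0-59).
    function bucketUpdate(type, timestamp, value) {
      const minute = new Date(timestamp.getTime());
      minute.setUTCSeconds(0, 0); // drop seconds and milliseconds
      return {
        filter: { timestamp_minute: minute, type: type },
        // The dotted path "values.<second>" sets only that slot of the sub-document.
        update: { $set: { ["values." + timestamp.getUTCSeconds()]: value } },
      };
    }

    const op = bucketUpdate("memory_used", new Date("2013-10-10T23:06:37Z"), 1000000);
    console.log(JSON.stringify(op));
    ```

    With upsert enabled, the first sample of a minute creates the bucket document from the filter fields plus the single `values` slot. Later samples only add or overwrite their own slot.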

  • 2020-12-22 05:48

    I would recommend restructuring the schema in Method 2 further, as follows:

    /* 0 */
    {
        "_id" : ObjectId("5577fd322ab13c8cacdd0e70"),
        "order_id" : "VjprK",
        "user_id" : "777",
        "data" : [ 
            {
                "order_date" : ISODate("2015-04-18T08:57:42.514Z"),
                "amount" : 100
            }, 
            {
                "order_date" : ISODate("2015-04-19T08:57:42.514Z"),
                "amount" : 200
            }, 
            {
                "order_date" : ISODate("2015-04-20T08:57:42.514Z"),
                "amount" : 300
            }, 
            {
                "order_date" : ISODate("2015-04-21T08:57:42.514Z"),
                "amount" : 400
            }
        ]
    }
    

    which you can then aggregate over a given date range, say 2015-04-18 through 2015-04-19 (end is an exclusive upper bound). Consider the following pipeline:

    var start = new Date(2015, 3, 18),
        end = new Date(2015, 3, 20);
    
    db.orders.aggregate([
        {
            "$match": {
                "user_id": "777",
                "data.order_date": {
                    "$gte": start,
                    "$lt": end
                }
            }
        },
        {
            "$unwind": "$data"
        },
        {
            "$match": {
                "data.order_date": {
                    "$gte": start,
                    "$lt": end
                }
            }
        },
        {
            "$group": {
                "_id": "$user_id",
                "total": {
                    "$sum": "$data.amount"
                }
            }
        }    
    ])
    

    Sample Output

    /* 0 */
    {
        "result" : [ 
            {
                "_id" : "777",
                "total" : 300
            }
        ],
        "ok" : 1
    }
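    The pipeline's stages can be traced by hand on the sample document. The plain JavaScript sketch below does not talk to a server; it only emulates $match, $unwind, the second $match and $group in memory. One deliberate change: it builds the range with Date.UTC. By contrast, new Date(2015, 3, 18) is interpreted in the shell's local time zone, which can shift the boundaries. The first $match only narrows which documents reach $unwind (and can use an index). The per-element date filtering happens after $unwind, so the sketch applies only the user filter up front.

    ```javascript
    // Sample document from the answer above (order_date values are UTC).
    const orders = [
      {
        order_id: "VjprK",
        user_id: "777",
        data: [
          { order_date: new Date("2015-04-18T08:57:42.514Z"), amount: 100 },
          { order_date: new Date("2015-04-19T08:57:42.514Z"), amount: 200 },
          { order_date: new Date("2015-04-20T08:57:42.514Z"), amount: 300 },
          { order_date: new Date("2015-04-21T08:57:42.514Z"), amount: 400 },
        ],
      },
    ];

    // Half-open range [start, end), built in UTC.
    const start = new Date(Date.UTC(2015, 3, 18));
    const end = new Date(Date.UTC(2015, 3, 20));
    const inRange = (d) => d >= start && d < end;

    const totals = {};
    for (const doc of orders.filter((o) => o.user_id === "777")) { // first $match
      for (const entry of doc.data) {                               // $unwind
        if (!inRange(entry.order_date)) continue;                   // second $match
        totals[doc.user_id] = (totals[doc.user_id] || 0) + entry.amount; // $group + $sum
      }
    }
    console.log(totals["777"]); // 300 (100 + 200), matching the sample output
    ```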
    
  • 2020-12-22 05:55

    This schema greatly simplifies range queries, and an array is a common way to store this kind of data series.

    {
        'order_id': 'VjprK',
        'user_id': '777',
        'data': [
            {
                date: ISODate("2015-04-18T00:00:00Z"),
                value: 100
            },
            {
                date: ISODate("2015-04-19T00:00:00Z"),
                value: 200
            },
            ...
        ]
    }
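    One caveat applies when you query this array schema directly. With a plain dotted-path range filter such as { "data.date": { $gte: start, $lt: end } }, the lower and upper bounds may be satisfied by different array elements. $elemMatch, by contrast, requires a single element to satisfy both. The sketch below emulates both behaviours in plain JavaScript on a made-up document; the dates and values are illustrative only. The aggregation pipelines in the other answers are not affected by this, because the $match after $unwind re-checks each element on its own.

    ```javascript
    // Made-up document: no single element falls between Apr 18 and Apr 20.
    const doc = {
      user_id: "777",
      data: [
        { date: new Date("2015-04-10T00:00:00Z"), value: 100 },
        { date: new Date("2015-04-25T00:00:00Z"), value: 200 },
      ],
    };
    const start = new Date("2015-04-18T00:00:00Z");
    const end = new Date("2015-04-20T00:00:00Z");

    // { "data.date": { $gte: start, $lt: end } }:
    // each condition may be met by a different element.
    const looseMatch =
      doc.data.some((e) => e.date >= start) && doc.data.some((e) => e.date < end);

    // { data: { $elemMatch: { date: { $gte: start, $lt: end } } } }:
    // one element must meet both conditions.
    const elemMatch = doc.data.some((e) => e.date >= start && e.date < end);

    console.log(looseMatch, elemMatch); // true false
    ```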
    
  • 2020-12-22 06:06

    Personally, since these look like "delivery dates" for parts of an order, I would do this:

    {
        'order_id': 'LaOPX',
        'user_id': '777',
        'parts': [
            { "date": ISODate("2015-04-18T00:00:00Z"), "qty": 100 },
            { "date": ISODate("2015-04-19T00:00:00Z"), "qty": 20 }
        ]
    }
    

    where the dates are actual date objects in the database. If you want the total for user "777" across all of their records within a date range, you can do:

    db.collection.aggregate([
        // Match the user between dates
        { "$match": { 
            "user_id": "777", 
            "parts.date": { 
                "$gte": new Date("2015-04-18"), "$lt": new Date("2015-04-20")
            }
        }},
    
        // Unwind the array entries
        { "$unwind": "$parts" },
    
        // Filter the required dates
        { "$match": { 
            "parts.date": { 
                "$gte": new Date("2015-04-18"), "$lt": new Date("2015-04-20")
            }
        }},
    
        // Group per user
        { "$group": {
            "_id": "$user_id",
            "total": { "$sum": "$parts.qty" }
        }}
    ])
    

    It's much more flexible to store real dates in the data, because range queries will then always work as they should.
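    Comparing strings with comparing dates shows why real dates matter for range queries. The plain JavaScript illustration below assumes the dates had instead been stored as hypothetical DD/MM/YYYY strings. String comparison is character by character, so it can disagree with chronological order. Date values compare by their underlying time value. Zero-padded YYYY-MM-DD strings do happen to sort correctly, but only as long as every stored value uses exactly that format.

    ```javascript
    // Two dates stored as DD/MM/YYYY strings compare character by character.
    const a = "19/03/2015"; // 19 March 2015
    const b = "18/04/2015"; // 18 April 2015
    console.log(a < b); // false: "9" > "8" at the second character

    // Real Date objects compare chronologically.
    const dateA = new Date(Date.UTC(2015, 2, 19)); // 19 March 2015
    const dateB = new Date(Date.UTC(2015, 3, 18)); // 18 April 2015
    console.log(dateA < dateB); // true
    ```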
