How to normalize/reduce time data in MongoDB?

Submitted by 夏天 on 2019-12-07 16:31:04

Question


I'm storing per-minute performance data in MongoDB. Each collection is a type of performance report, and each document is the measurement at one point in time for one port on the array:

{
  "DateTime" : ISODate("2012-09-28T15:51:03.671Z"),
  "array_serial" : "12345",
  "Port Name" : "CL1-A",
  "metric" : 104.2
}

There can be up to 128 different "Port Name" entries per "array_serial".

As the data ages I'd like to be able to average it out over increasing time spans:

  • Up to 1 Week : minute
  • 1 Week to 1 month : 5 minute
  • 1 - 3 months: 15 minute

and so on. Here's how I'm rounding the times so that they can be reduced:

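var resolution = 5; // How many minutes to average over (pass it to mapReduce via "scope" so map can see it)
var map = function () {
    var coeff = 1000 * 60 * resolution; // bucket width in milliseconds
    // Round the timestamp to the nearest multiple of the bucket width
    var roundTime = new Date(Math.round(this.DateTime.getTime() / coeff) * coeff);
    emit(roundTime, { value: this.metric, count: 1 });
};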

I'll be summing the values and counts in the reduce function, and computing the average in the finalize function.
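
A minimal sketch of the reduce and finalize steps just described, assuming the `{ value, count }` shape emitted by the map function above. The sample numbers and the local check at the end are illustrative only:

```javascript
// Sketch of the reduce and finalize steps (plain JavaScript, no database needed).
// Each value has the shape { value: <sum of metrics>, count: <number of samples> }.
var reduce = function (key, values) {
    var total = { value: 0, count: 0 };
    values.forEach(function (v) {
        total.value += v.value; // running sum of the metric
        total.count += v.count; // running number of samples
    });
    // Same shape as the emitted values, so the output can be reduced again.
    return total;
};

var finalize = function (key, reduced) {
    // Turn the accumulated sum and count into an average.
    return { avg: reduced.value / reduced.count, count: reduced.count };
};

// Local check: three made-up samples in one time bucket, reduced in two passes.
var bucket = new Date("2012-09-28T15:50:00Z");
var partial = reduce(bucket, [{ value: 100, count: 1 }, { value: 110, count: 1 }]);
var combined = reduce(bucket, [partial, { value: 90, count: 1 }]);
var result = finalize(bucket, combined);
```

Returning the same shape from reduce matters because MongoDB may call reduce more than once for the same key. In the shell, these would be passed as the second argument and the `finalize` option of `mapReduce`, with `resolution` supplied through the `scope` option.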

As you can see this would average the data for just the time leaving out the "Port Name" value, and I need to average the values over time for each "Port Name" on each "array_serial".

So how can I include the port name in the above map function? Should the key for the emit be a compound "array_serial,PortName,DateTime" value that I split later? Or should I use the query function to query for each distinct serial, port and time? Am I storing this data in the database correctly?

Also, as far as I know this data gets saved out to its own collection. What's the standard practice for replacing the data in the collection with this averaged data?


Is this what you mean, Asya? Because it's not grouping the documents rounded down to the lower 5-minute mark (by the way, I changed 'DateTime' to 'datetime'):

    $project: {
                "year" : { $year : "$datetime" },
                "month" : { $month : "$datetime" },
                "day" : { $dayOfMonth : "$datetime" },
                "hour" : { $hour : "$datetime" },
                "minute" : { $mod : [ {$minute : "$datetime"}, 5] },
                array_serial: 1,
                port_name: 1,
                port_number: 2,
                metric: 1
}

From what I can tell the "$mod" operator will return the remainder of the minute divided by five, correct?

This would really help me if I could get the aggregation framework to do this operation rather than mapreduce.


Answer 1:


Here is how you could do it in the aggregation framework. I'm using a small simplification: I'm only grouping on year, month and date. In your case you will need to add hour and minute for the finer-grained calculations. You can also choose whether to compute a weighted average if the points are not uniformly distributed in the data sample you get.

project={"$project" : {
        "year" : {
            "$year" : "$DateTime"
        },
        "month" : {
            "$month" : "$DateTime"
        },
        "day" : {
            "$dayOfMonth" : "$DateTime"
        },
        "array_serial" : 1,
        "Port Name" : 1,
        "metric" : 1
    }
};
group={"$group" : {
        "_id" : {
            "a" : "$array_serial",
            "P" : "$Port Name",
            "y" : "$year",
            "m" : "$month",
            "d" : "$day"
        },
        "avgMetric" : {
            "$avg" : "$metric"
        }
    }
};

// MongoDB 2.2 shell; from 2.6 on, aggregate() returns a cursor, so drop ".result"
db.metrics.aggregate([project, group]).result

I ran this with some random sample data and got something of this format:

[
    {
        "_id" : {
            "a" : "12345",
            "P" : "CL1-B",
            "y" : 2012,
            "m" : 9,
            "d" : 6
        },
        "avgMetric" : 100.8
    },
    {
        "_id" : {
            "a" : "12345",
            "P" : "CL1-B",
            "y" : 2012,
            "m" : 9,
            "d" : 7
        },
        "avgMetric" : 98
    },
    {
        "_id" : {
            "a" : "12345",
            "P" : "CL1-A",
            "y" : 2012,
            "m" : 9,
            "d" : 6
        },
        "avgMetric" : 105
    }
]

As you can see this is one result per array_serial, port name, year/month/date combination. You can use $sort to get them into the order you want to process them from there.
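
As a hedged illustration of the `$sort` suggestion, a stage like the following could be appended to the pipeline. The variable name `sortStage` and the chosen ordering (array, then port, then date) are assumptions, not part of the original pipeline:

```javascript
// Hypothetical extra stage: order the groups by array, port, then date.
// The dotted paths refer to the "_id" sub-fields produced by the $group stage.
// Key order matters: earlier keys take precedence over later ones.
var sortStage = {
    "$sort": {
        "_id.a": 1, // array_serial
        "_id.P": 1, // Port Name
        "_id.y": 1, // year
        "_id.m": 1, // month
        "_id.d": 1  // day
    }
};

// Usage in the shell (not run here):
//   db.metrics.aggregate([project, group, sortStage])
```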

Here is how you would extend the project step to include hour and minute, rounding the minute down to a multiple of five so that each group averages one five-minute window:

{
    "$project" : {
        "year" : {
            "$year" : "$DateTime"
        },
        "month" : {
            "$month" : "$DateTime"
        },
        "day" : {
            "$dayOfMonth" : "$DateTime"
        },
        "hour" : {
            "$hour" : "$DateTime"
        },
        "fmin" : {
            "$subtract" : [
                {
                    "$minute" : "$DateTime"
                },
                {
                    "$mod" : [
                        {
                            "$minute" : "$DateTime"
                        },
                        5
                    ]
                }
            ]
        },
        "array_serial" : 1,
        "Port Name" : 1,
        "metric" : 1
    }
}
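
To make the "fmin" arithmetic concrete, here is a small plain-JavaScript sketch of the same calculation, followed by one way the group key might be extended. The helper `floorToFive` and the key names "h" and "f" are made up for illustration:

```javascript
// Plain-JavaScript equivalent of the "fmin" expression, for illustration only.
function floorToFive(minute) {
    // Mirrors { $subtract: [ minute, { $mod: [ minute, 5 ] } ] }
    return minute - (minute % 5);
}

// Minutes 0-4 map to 0, 5-9 map to 5, and so on.
var floored = [0, 4, 5, 51, 59].map(floorToFive);
// floored is [0, 0, 5, 50, 55]

// Assumed extension of the group stage: add hour and floored minute to the key.
var groupByFiveMinutes = {
    "$group": {
        "_id": {
            "a": "$array_serial",
            "P": "$Port Name",
            "y": "$year",
            "m": "$month",
            "d": "$day",
            "h": "$hour", // hypothetical key name
            "f": "$fmin"  // hypothetical key name
        },
        "avgMetric": { "$avg": "$metric" }
    }
};
```

Using `$mod` on its own, as in the question, yields only the remainder (0 to 4), which is why documents from different five-minute windows end up in the same group.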

Hope you will be able to extend that to your specific data and requirements.




Answer 2:


"what's the standard practice for replacing the data in the collection with this averaged data?"

The standard practice is to keep the original data and to store all derived data separately.

In your case it means:

  • Don't delete the original data
  • Use another collection (in the same MongoDB database) to store average values
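
One possible way to do this with the aggregation framework, sketched under assumptions: a `$match` stage selects only data older than one week, and an `$out` stage writes the averages to a separate collection. The collection name `metrics_5min` and the helper `olderThanOneWeek` are hypothetical, and `$out` requires MongoDB 2.6 or later:

```javascript
// Sketch: select aged data and write the averaged result to another collection.
var WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Hypothetical helper: build a $match stage for documents older than one week.
function olderThanOneWeek(now) {
    return { "$match": { "DateTime": { "$lt": new Date(now.getTime() - WEEK_MS) } } };
}

var matchStage = olderThanOneWeek(new Date("2012-10-05T00:00:00Z"));

// "metrics_5min" is a made-up name. $out replaces the contents of the target
// collection on every run, so the original collection is left untouched.
var outStage = { "$out": "metrics_5min" };

// Usage in the shell (not run here), with project and group stages in between:
//   db.metrics.aggregate([matchStage, project, group, outStage])
```

Because `$out` overwrites its target, each resolution tier would need its own output collection.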


Source: https://stackoverflow.com/questions/12662169/how-to-normalize-reduce-time-data-in-mongodb
