MongoDB group by hour

Submitted by 纵饮孤独 on 2019-12-17 14:02:58

Question


I save tweets to MongoDB:

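    var util = require('util');

    // `twit` (the Twitter streaming client) and `collectionAnimal` (the MongoDB
    // collection) are created elsewhere.
    twit.stream('statuses/filter', {'track': ['animal']}, function(stream) {
        stream.on('data', function(data) {
            console.log(util.inspect(data));

            // Store created_at as a real Date so MongoDB saves it as a BSON date,
            // which the date operators ($year, $hour, ...) require.
            data.created_at = new Date(data.created_at);
            collectionAnimal.insert(data, function(err, docs) {
                if (err) console.error(err);
            });
        });
    });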

That part works.

The tweet time is stored in MongoDB in the created_at field and looks like 2014-04-25 11:45:14 GMT. Now I need to group created_at by hour. I would like a result like this:

hour | tweets in that hour
1    | 28
2    | 26
3    | 32
4    | 42
5    | 36
...

My unsuccessful attempt:

    $keys = array('created_at' => true);
    $initial = array('count' => 0);
    $reduce = "function(doc, prev) { prev.count += 1 }";

    $tweetsGroup = $this->collectionAnimal->group( $keys, $initial, $reduce );

But I am not able to group by hour.

How can I do it?
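The attempt above groups on created_at itself, so every distinct timestamp (down to the second) becomes its own group and most counts come out as 1. A minimal in-memory sketch in plain JavaScript (no MongoDB involved; `countBy` is a hypothetical helper and the sample documents are made up) contrasts that key with one derived from the UTC hour, which gives the hour | count shape of the table above:

```javascript
// Count documents per group key, where keyFn maps a document to its key.
// countBy is a hypothetical helper used only for illustration.
function countBy(docs, keyFn) {
  const counts = new Map();
  for (const doc of docs) {
    const key = keyFn(doc);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up documents shaped like the stored tweets (created_at is a Date).
const tweets = [
  { created_at: new Date("2014-04-25T01:05:00Z") },
  { created_at: new Date("2014-04-25T01:40:12Z") },
  { created_at: new Date("2014-04-25T02:15:30Z") },
];

// Grouping on the full timestamp: one group per distinct instant.
const byTimestamp = countBy(tweets, (t) => t.created_at.getTime());

// Grouping on the UTC hour (0-23): one group per hour.
const byHour = countBy(tweets, (t) => t.created_at.getUTCHours());

console.log([...byTimestamp.values()]); // [ 1, 1, 1 ]
console.log([...byHour.entries()]);     // [ [ 1, 2 ], [ 2, 1 ] ]
```

Keying on the hour alone merges the same hour from different days; if the data spans several days, the year, month and day belong in the key as well.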


Answer 1:


I can show you how to group with the aggregation framework directly in the mongo shell:

    db.tweets.aggregate([
        { "$project": {
            "y": { "$year": "$created_at" },
            "m": { "$month": "$created_at" },
            "d": { "$dayOfMonth": "$created_at" },
            "h": { "$hour": "$created_at" },
            "tweet": 1
        } },
        { "$group": {
            "_id": { "year": "$y", "month": "$m", "day": "$d", "hour": "$h" },
            // Tweets have no "tweet" field, so this total is 0; { "$sum": 1 } counts them.
            "total": { "$sum": "$tweet" }
        } }
    ])
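To make the grouping key concrete, here is a rough in-memory equivalent of the $project + $group stages in plain JavaScript. It is a sketch, not how MongoDB executes the pipeline; it assumes created_at holds real Date values (as the question's insert code ensures) and relies on $year, $month, $dayOfMonth and $hour reading dates in UTC by default. It counts documents, which is what the question asks for. `groupByCalendarHour` is a hypothetical name:

```javascript
// Rough in-memory equivalent of the $project + $group stages:
// build a {year, month, day, hour} key in UTC and count documents per key.
// groupByCalendarHour is a hypothetical helper for illustration only.
function groupByCalendarHour(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const d = doc.created_at;
    const id = {
      year: d.getUTCFullYear(),
      month: d.getUTCMonth() + 1, // $month is 1-12, getUTCMonth() is 0-11
      day: d.getUTCDate(),
      hour: d.getUTCHours(),
    };
    const key = JSON.stringify(id); // Map compares object keys by reference
    const group = groups.get(key) || { _id: id, total: 0 };
    group.total += 1; // one per document, i.e. a count
    groups.set(key, group);
  }
  return [...groups.values()];
}

const result = groupByCalendarHour([
  { created_at: new Date("2014-04-25T11:45:14Z") },
  { created_at: new Date("2014-04-25T11:59:59Z") },
  { created_at: new Date("2014-04-26T11:10:00Z") },
]);
console.log(result.map((g) => g.total)); // [ 2, 1 ]
```

Because the key includes the calendar day, 11:00 on 25 April and 11:00 on 26 April land in separate groups.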

For the other date operators, see: http://docs.mongodb.org/manual/reference/operator/aggregation-date/

You will also need to find the appropriate way to use the aggregation framework from whichever programming language you are using.
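Since the question already uses Node.js for the inserts, here is a minimal sketch for the official `mongodb` Node.js driver. It assumes a recent driver version where `collection.aggregate(pipeline)` returns a cursor whose `toArray()` returns a promise. `hourlyPipeline` and `tweetsPerHour` are hypothetical names, the date operators are placed directly in the `$group` key, and the `$sort` stage is an added assumption that chronological output is wanted. The collection handle is passed in so the function does not depend on connection setup:

```javascript
// Pipeline that groups tweets by calendar hour (UTC) and counts them.
// hourlyPipeline is a hypothetical name.
function hourlyPipeline() {
  return [
    {
      $group: {
        _id: {
          year: { $year: "$created_at" },
          month: { $month: "$created_at" },
          day: { $dayOfMonth: "$created_at" },
          hour: { $hour: "$created_at" },
        },
        total: { $sum: 1 }, // one per document, i.e. a count
      },
    },
    // Assumption: chronological output is wanted, so sort on the grouped key.
    { $sort: { "_id.year": 1, "_id.month": 1, "_id.day": 1, "_id.hour": 1 } },
  ];
}

// `collection` is assumed to be a Collection from the official `mongodb`
// driver, obtained elsewhere. tweetsPerHour is a hypothetical name.
async function tweetsPerHour(collection) {
  return collection.aggregate(hourlyPipeline()).toArray();
}
```

With a connected client, something like `await tweetsPerHour(client.db("twitter").collection("tweets"))` (database and collection names are placeholders) should resolve to documents of the form `{ _id: { year, month, day, hour }, total }`.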




Answer 2:


There should be no need for a $project stage here, because the date operators can be used directly in the $group stage when defining the grouping _id. This avoids an extra stage that has to process every document in the collection before the grouping happens.

Also, you are only counting, so simply use { "$sum": 1 }. Summing a field that does not exist was the problem that resulted in 0:

    $this->collection->aggregate(array(
        array(
            '$group' => array(
                "_id" => array( 
                    "y" => array( '$year' => '$created_at' ),
                    "m" => array( '$month' => '$created_at' ),
                    "d" => array( '$dayOfMonth' => '$created_at' ),
                    "h" => array( '$hour' => '$created_at' ),
                ),
                "total" => array( '$sum' => 1 ),
            ),
        )
    ));
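The point about { "$sum": 1 } follows from how $sum treats its input: numeric values are added and non-numeric ones are ignored, so a field that no document has contributes nothing and the total stays 0. A tiny plain-JavaScript model of that behaviour (a hypothetical `sumAccumulator`, a simplification rather than MongoDB's implementation):

```javascript
// Simplified model of the $group accumulator $sum: numeric values are added,
// anything else (including a missing field, i.e. undefined) is ignored.
// sumAccumulator is a hypothetical helper, not MongoDB's implementation.
function sumAccumulator(docs, operand) {
  let total = 0;
  for (const doc of docs) {
    // "$field" means "read this field"; any other operand is used as-is.
    const value =
      typeof operand === "string" && operand.startsWith("$")
        ? doc[operand.slice(1)]
        : operand;
    if (typeof value === "number") total += value;
  }
  return total;
}

const docs = [{ text: "a" }, { text: "b" }, { text: "c" }];
console.log(sumAccumulator(docs, "$tweet")); // 0: no document has a "tweet" field
console.log(sumAccumulator(docs, 1));        // 3: one per document, a count
```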

If anything, add a $match stage at the start of the pipeline to filter by date. If output for a single day is acceptable, you only need $hour in the grouping key, and the smaller working set makes the query faster. That is probably what you want to do anyway.
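The single-day variant can be sketched as a plain JavaScript pipeline builder. Assumptions: the day boundaries are UTC midnights (matching how $hour reads dates by default), `singleDayHourlyPipeline` is a hypothetical name, and the trailing $sort stage is an addition so the hours come out in order. If created_at is indexed, a $match placed first in the pipeline can use that index:

```javascript
// Build a pipeline that counts tweets per hour for one UTC day.
// `day` is any Date within the wanted day; singleDayHourlyPipeline is hypothetical.
function singleDayHourlyPipeline(day) {
  const start = new Date(
    Date.UTC(day.getUTCFullYear(), day.getUTCMonth(), day.getUTCDate())
  );
  const end = new Date(start.getTime() + 24 * 60 * 60 * 1000); // next UTC midnight
  return [
    // Filter first so later stages only see one day's documents.
    { $match: { created_at: { $gte: start, $lt: end } } },
    // Within a single day the hour alone identifies the bucket.
    { $group: { _id: { $hour: "$created_at" }, total: { $sum: 1 } } },
    // Assumption: output ordered by hour, as in the question's table.
    { $sort: { _id: 1 } },
  ];
}

const pipeline = singleDayHourlyPipeline(new Date("2014-04-25T11:45:14Z"));
console.log(pipeline[0].$match.created_at.$gte.toISOString()); // 2014-04-25T00:00:00.000Z
console.log(pipeline[0].$match.created_at.$lt.toISOString());  // 2014-04-26T00:00:00.000Z
```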



Source: https://stackoverflow.com/questions/23293082/mongodb-group-by-hour
