I save tweets to MongoDB:
twit.stream('statuses/filter', {'track': ['animal']}, function(stream) {
    stream.on('data', function(data) {
c
Lalit's answer did not work for me; it kept giving me zeroes. Instead I did:
db.tweets.aggregate(
{ "$project": {
"y":{"$year":"$created_at"},
"m":{"$month":"$created_at"},
"d":{"$dayOfMonth":"$created_at"},
"h":{"$hour":"$created_at"},
"tweet":1 }
},
{ "$group":{
"_id": { "year":"$y","month":"$m","day":"$d","hour":"$h"},
'count':{$sum:1}
}
})
The 'count':{$sum:1} line is the only difference. This might help someone new to MongoDB like me.
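To make it clearer what the $group stage above computes, here is a minimal plain-JavaScript sketch (no database involved) that buckets documents by year, month, day and hour and adds 1 per document, mirroring `{ $sum: 1 }`. It assumes each document has a `created_at` field holding a JavaScript `Date`, and it uses UTC because MongoDB's date operators evaluate in UTC by default. The `countPerHour` name is hypothetical.

```javascript
// Hypothetical helper: emulates the $group stage with { $sum: 1 } in plain JS.
// Assumes each doc has a Date in `created_at`; uses UTC like MongoDB's
// $year/$month/$dayOfMonth/$hour operators do by default.
function countPerHour(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    const d = doc.created_at;
    const id = {
      year: d.getUTCFullYear(),
      month: d.getUTCMonth() + 1, // $month is 1-based; getUTCMonth is 0-based
      day: d.getUTCDate(),
      hour: d.getUTCHours(),
    };
    const key = JSON.stringify(id); // one bucket per distinct _id
    if (!buckets.has(key)) buckets.set(key, { _id: id, count: 0 });
    buckets.get(key).count += 1; // $sum: 1 adds 1 for every document
  }
  return [...buckets.values()];
}

const tweets = [
  { created_at: new Date(Date.UTC(2013, 10, 5, 14, 3)) },
  { created_at: new Date(Date.UTC(2013, 10, 5, 14, 59)) },
  { created_at: new Date(Date.UTC(2013, 10, 5, 15, 0)) },
];
console.log(countPerHour(tweets));
```

Here the first two tweets fall into the 14:00 UTC bucket (count 2) and the third into 15:00 (count 1).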
There should be no need to use a $project stage here, as the date operators can be used directly in the $group stage when defining the grouping _id. This avoids an extra pass over the entire collection just to reshape the documents before grouping.
Also, you are just counting, so simply use { "$sum" : 1 }. Referencing a field that didn't exist was the problem that resulted in 0. In PHP that looks like:
$this->collection->aggregate(array(
array(
'$group' => array(
"_id" => array(
"y" => array( '$year' => '$created_at' ),
"m" => array( '$month' => '$created_at' ),
"d" => array( '$dayOfMonth' => '$created_at' ),
"h" => array( '$hour' => '$created_at' ),
),
"total" => array( '$sum' => 1 ),
),
)
));
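The zeros come from how $sum treats its operand inside $group: numeric values are added and anything non-numeric, including a missing field, is ignored. So { "$sum": "$tweet" } over documents with no numeric tweet field totals 0, while the constant { "$sum": 1 } adds 1 per document. Below is a simplified plain-JavaScript emulation of that rule; the `sumAccumulator` name is hypothetical, and it only resolves top-level field paths.

```javascript
// Simplified emulation of the $sum accumulator in $group:
// numeric values are added; anything else (missing, string, object) is ignored.
function sumAccumulator(docs, operand) {
  let total = 0;
  for (const doc of docs) {
    // A string starting with "$" is a (top-level) field path; anything else is a constant.
    const value =
      typeof operand === "string" && operand.startsWith("$")
        ? doc[operand.slice(1)]
        : operand;
    if (typeof value === "number") total += value;
  }
  return total;
}

const docs = [{ text: "cat" }, { text: "dog" }, { text: "cow" }];
console.log(sumAccumulator(docs, "$tweet")); // 0: no numeric `tweet` field
console.log(sumAccumulator(docs, 1)); // 3: one per document
```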
If anything, add a $match stage at the start of the pipeline to filter by date. If one day of output is acceptable, you only need $hour in the grouping _id, and filtering first reduces the working set size, which makes the query faster. That is probably what you want to do anyway.
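Putting that advice together, here is a sketch of a function that builds such a pipeline as a plain JavaScript array: a $match on a created_at range first, then a $group on $hour only. The `hourlyCountPipeline` name and the half-open [start, end) range are assumptions; the returned array is what you would pass to db.tweets.aggregate(...) in the shell or to collection.aggregate(...) in the Node.js driver.

```javascript
// Hypothetical builder: filter to one time window first, then count per hour.
// Assumes `created_at` is stored as a BSON date (a JS Date in the driver).
function hourlyCountPipeline(start, end) {
  return [
    // $match first so $group only sees documents inside [start, end)
    { $match: { created_at: { $gte: start, $lt: end } } },
    {
      $group: {
        _id: { $hour: "$created_at" }, // hour 0-23, evaluated in UTC by default
        count: { $sum: 1 },
      },
    },
  ];
}

const day = new Date(Date.UTC(2013, 10, 5));
const nextDay = new Date(Date.UTC(2013, 10, 6));
console.log(JSON.stringify(hourlyCountPipeline(day, nextDay), null, 2));
```

Because the window is a single day, the hour alone identifies each bucket, so the year, month and day parts of the _id can be dropped.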
Here is how you can group using the aggregation framework directly in the mongo console:
db.tweets.aggregate(
{ "$project": {
"y":{"$year":"$created_at"},
"m":{"$month":"$created_at"},
"d":{"$dayOfMonth":"$created_at"},
"h":{"$hour":"$created_at"},
"tweet":1 }
},
{ "$group":{
"_id": { "year":"$y","month":"$m","day":"$d","hour":"$h"},
"total":{ "$sum": "$tweet"}
}
})
For more options you can look here: http://docs.mongodb.org/manual/reference/operator/aggregation-date/
You will also need to find the appropriate way of using the aggregation framework from whichever programming language you are using.