问题
Situation
I'm having trouble coming up with a good way to do a certain MongoDb query. First, here's what kind of query I want to do. Assume a simple database which logs entry and exit events (and possibly other actions, doesn't matter) by electronic card swipe. So there's a collection called swipelog
with simple documents which look like this:
{
_id: ObjectId("524ab4790a4c0e402200052c")
name: "John Doe",
action: "entry",
timestamp: ISODate("2013-10-01T1:32:12.112Z")
}
Now I want to list names and their last entry times (and any other fields I may want, but example below uses just these two fields).
Current solution
Here is what I have now, as a "one-liner" for MongoDb JavaScript console:
db.swipelog.distinct('name')
.forEach( function(name) {
db.swipelog.find( { name: name, action:"entry" } )
.sort( { $natural:-1 } )
.limit(1)
.forEach( function(entry) {
printjson( [ entry.name, entry.timestamp ] )
})
})
Which prints something like:
[ "John Doe", ISODate("2013-10-01T1:32:12.112Z")]
[ "Jane Deo", ISODate("2013-10-01T1:36:12.112Z")]
...
Question
I think above has the obvious scaling problem. If there are a hundred names, then 1+100 queries will be made to the database. So what is a good/correct way to get "last timestamp
of every distinct name
" ? Changing database structure or adding some collections is ok, if it makes this easier.
回答1:
You can use aggregation framework to achieve this:
db.collection.aggregate(
[
{$match:
{action:'entry'}
},
{$group:
{_id:'$name',
first:
{$max:'$timestamp'}
}
}
])
If you likely to include other fields in the results, you can use the $first operator
db.collection.aggregate(
[
{$match:
{action:'entry'}
},
{$sort:
{name:1, timestamp:-1}
},
{$group:
{_id:'$name',
timestamp: {$first:'$timestamp'},
otherField: {$first:'$otherField'},
}
}
])
回答2:
This answer should be a comment on attish's answer above, but I don't have sufficient rep here to comment
Keep in mind that the aggregation framework cannot return more than 16MB of data. If you have a very large number of users, you may run into this limitation on your production system.
MongoDB 2.6 adds new features to the aggregation framework to deal with this:
db.collection.aggregateCursor()
(temporary name) is identical todb.collection.aggregate()
except that it returns a cursor instead of a document. This avoids the 16MB limitation$out
is a new pipeline phase that directs the pipeline's output to a collection. This allows you to run aggregation jobs$sort
has been improved to remove its RAM limitations and increase speed
If query performance is more important than data age, you could schedule a regular aggregate
command that stores its results in collection like db.last_swipe
, then have your application simply query db.last_swipe
for the relevant user.
Conclusion: I agree that attish has the right approach. However, you may run into trouble scaling it on the current MongoDB release and should look into Mongo 2.6.
来源:https://stackoverflow.com/questions/19114541/get-last-documents-with-a-distinct-criteria