Get last documents with a distinct criteria

旧城冷巷雨未停 提交于 2020-01-11 10:15:31

问题


Situation

I'm having trouble coming up with a good way to do a certain MongoDb query. First, here's what kind of query I want to do. Assume a simple database which logs entry and exit events (and possibly other actions, doesn't matter) by electronic card swipe. So there's a collection called swipelog with simple documents which look like this:

{
    _id: ObjectId("524ab4790a4c0e402200052c")
    name: "John Doe", 
    action: "entry",
    timestamp: ISODate("2013-10-01T1:32:12.112Z") 
}

Now I want to list names and their last entry times (and any other fields I may want, but example below uses just these two fields).

Current solution

Here is what I have now, as a "one-liner" for MongoDb JavaScript console:

db.swipelog.distinct('name')
           .forEach( function(name) {

    db.swipelog.find( { name: name, action:"entry" } )
               .sort( { $natural:-1 } )
               .limit(1)
               .forEach( function(entry) {

        printjson( [ entry.name, entry.timestamp ] )
    }) 
})

Which prints something like:

[ "John Doe", ISODate("2013-10-01T1:32:12.112Z")]
[ "Jane Deo", ISODate("2013-10-01T1:36:12.112Z")]
...

Question

I think above has the obvious scaling problem. If there are a hundred names, then 1+100 queries will be made to the database. So what is a good/correct way to get "last timestamp of every distinct name" ? Changing database structure or adding some collections is ok, if it makes this easier.


回答1:


You can use aggregation framework to achieve this:

 db.collection.aggregate(
          [
             {$match:
                  {action:'entry'}
             },
             {$group:
                  {_id:'$name',
                   first:
                         {$max:'$timestamp'}
                  }
             }
          ])

If you likely to include other fields in the results, you can use the $first operator

 db.collection.aggregate(
          [
             {$match:
                  {action:'entry'}
             },
             {$sort:
                  {name:1, timestamp:-1}
             },
             {$group:
                  {_id:'$name',
                   timestamp: {$first:'$timestamp'},
                   otherField: {$first:'$otherField'},
                  }
             }
          ])



回答2:


This answer should be a comment on attish's answer above, but I don't have sufficient rep here to comment

Keep in mind that the aggregation framework cannot return more than 16MB of data. If you have a very large number of users, you may run into this limitation on your production system.

MongoDB 2.6 adds new features to the aggregation framework to deal with this:

  • db.collection.aggregateCursor() (temporary name) is identical to db.collection.aggregate() except that it returns a cursor instead of a document. This avoids the 16MB limitation
  • $out is a new pipeline phase that directs the pipeline's output to a collection. This allows you to run aggregation jobs
  • $sort has been improved to remove its RAM limitations and increase speed

If query performance is more important than data age, you could schedule a regular aggregate command that stores its results in collection like db.last_swipe, then have your application simply query db.last_swipe for the relevant user.

Conclusion: I agree that attish has the right approach. However, you may run into trouble scaling it on the current MongoDB release and should look into Mongo 2.6.



来源:https://stackoverflow.com/questions/19114541/get-last-documents-with-a-distinct-criteria

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!