MongoDB's performance on aggregation queries

悲哀的现实 2020-12-01 05:42

After hearing so many good things about MongoDB's performance, we decided to give MongoDB a try to solve a problem we have. I started by moving all the records we have in se

3 Answers
  • 2020-12-01 05:54

    Aggregation (map-reduce or otherwise) is very slow in MongoDB because it is executed by the JavaScript VM, not by the database engine. This continues to be a limitation of this (very good, IMO) database for time-series data.

  • 2020-12-01 06:07

    A couple of things.

    1) Your group query is processing a lot of data. Although your result set is small, it looks like the query does a full table scan of all the data in your collection to generate that small result. This is probably the root cause of the slowness. To confirm, look at the disk performance of your server with iostat while the query is running, as disk I/O is the likely bottleneck.

    2) As has been pointed out in other answers, the group command uses the JavaScript interpreter, which limits performance. You might try the new aggregation framework that is released as beta in 2.1 (note: this is an unstable release as of Feb 24 2012). See http://blog.mongodb.org/post/16015854270/operations-in-the-new-aggregation-framework for a good introduction. This won't overcome the data volume problem in (1), but it is implemented in C++, so if JavaScript time is the bottleneck it should be much faster.

    3) Another approach would be to use incremental map-reduce to generate a second collection with your grouped results. The idea is that you'd run a map-reduce job to aggregate your results once, and then periodically run another map-reduce job that re-reduces new data into the existing collection. Then you can query this second collection from your app rather than running a group command every time.
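
    Points 2 and 3 above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original poster's schema: the field names `category` and `amount` are hypothetical, the pipeline is shown as plain data rather than run against a server, and a dict stands in for the second collection of grouped results.

```python
from collections import defaultdict

# Point 2: a $group stage for the aggregation framework, written as plain data.
# The field names passed in ("category", "amount") are hypothetical.
def build_group_pipeline(group_field, sum_field):
    """Return a pipeline that sums `sum_field` for each distinct `group_field`."""
    return [
        {"$group": {
            "_id": "$" + group_field,
            "total": {"$sum": "$" + sum_field},
            "count": {"$sum": 1},
        }},
    ]

# Point 3: incremental aggregation. A dict stands in for the second
# collection that stores the grouped results.
def map_reduce(docs, group_field, sum_field):
    """Map each document to (key, value) and reduce by summing per key."""
    totals = defaultdict(int)
    for doc in docs:
        totals[doc[group_field]] += doc[sum_field]
    return dict(totals)

def re_reduce(existing, new_docs, group_field, sum_field):
    """Fold only the new documents into the previously stored totals."""
    merged = dict(existing)
    for key, value in map_reduce(new_docs, group_field, sum_field).items():
        merged[key] = merged.get(key, 0) + value
    return merged

pipeline = build_group_pipeline("category", "amount")

initial_docs = [
    {"category": "a", "amount": 10},
    {"category": "b", "amount": 5},
    {"category": "a", "amount": 7},
]
new_docs = [
    {"category": "b", "amount": 3},
    {"category": "c", "amount": 4},
]

# Aggregate everything once, then fold in only what arrived afterwards.
summary = map_reduce(initial_docs, "category", "amount")
summary = re_reduce(summary, new_docs, "category", "amount")
print(summary)
```

    With a driver such as pymongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. On the server side, the re-reduce step corresponds to running mapReduce with the `out: {reduce: "<collection>"}` option. The incremental approach only works when the reduce function can be applied to its own output, as summing can.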

  • 2020-12-01 06:08

    The idea is that you improve the performance of aggregation queries by using MapReduce on a sharded database that is distributed over multiple machines.

    I compared the performance of Mongo's MapReduce with a group-by select statement in Oracle on the same machine, using a collection/table with approximately 14 million documents/rows. I found that Mongo was approximately 25 times slower. This means that I would have to shard the data over at least 25 machines to get the same performance with Mongo as Oracle delivers on a single machine.
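
    The "at least 25 machines" figure assumes that MapReduce throughput scales perfectly linearly with the number of shards. A small sketch of that arithmetic follows; the `parallel_efficiency` parameter and the 0.8 value are hypothetical, added only to show how the estimate grows once coordination overhead is taken into account.

```python
import math

def shards_needed(slowdown, parallel_efficiency=1.0):
    """Estimate the shards needed to offset a single-machine slowdown factor.

    parallel_efficiency is the fraction of ideal linear speedup actually
    achieved per shard (1.0 means perfect scaling). It is an illustrative
    assumption, not a measured value.
    """
    if slowdown <= 0 or not 0 < parallel_efficiency <= 1:
        raise ValueError("slowdown must be > 0 and efficiency in (0, 1]")
    return math.ceil(slowdown / parallel_efficiency)

ideal = shards_needed(25)                                   # perfect scaling
with_overhead = shards_needed(25, parallel_efficiency=0.8)  # hypothetical 80%
print(ideal, with_overhead)
```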

    Exporting the data from mongo via mongoexport.exe and using the exported data as an external table in Oracle and doing a group-by in Oracle was much faster than using Mongo's own MapReduce.
