Speed up large result set processing using rmongodb

*爱你&永不变心* submitted on 2019-12-03 14:36:27

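You might want to try the mongo.find.exhaust option, which asks the server to stream all result batches back without waiting for the client to request each one: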

cursor <- mongo.find(mongo, ns, query, options = mongo.find.exhaust)  # ns is the "db.collection" namespace string

This would be the easiest fix, if it actually works for your use case.

However, the rmongodb driver seems to be missing some features that other drivers offer. For example, the JavaScript driver has a Cursor.toArray method, which dumps all the find results directly into an array. The R driver has mongo.bson.to.list, which converts a single BSON document, but what you probably want is a mongo.cursor.to.list that converts a whole cursor in one call. It's probably worth asking the driver's developers for advice.
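As a rough illustration of what such a mongo.cursor.to.list could look like, here is a minimal R sketch built only from rmongodb's mongo.cursor.next(), mongo.cursor.value() and mongo.bson.to.list(). The name cursor_to_list and its next_fn, value_fn and initial_size parameters are hypothetical; the two function hooks exist only so the loop can be exercised without a MongoDB server. It preallocates the result list and doubles it when full, which avoids the quadratic cost of appending one element at a time, but it still converts every document in R, so don't expect it to match a native bulk conversion.

```r
# Hypothetical helper (not part of rmongodb): drain a cursor into an R list.
# Defaults assume rmongodb's mongo.cursor.next(), mongo.cursor.value() and
# mongo.bson.to.list(); they are only evaluated if you don't pass your own.
cursor_to_list <- function(cursor,
                           next_fn = rmongodb::mongo.cursor.next,
                           value_fn = function(cur) {
                             rmongodb::mongo.bson.to.list(rmongodb::mongo.cursor.value(cur))
                           },
                           initial_size = 1024L) {
  out <- vector("list", initial_size)  # preallocate instead of growing by one
  n <- 0L
  while (next_fn(cursor)) {            # TRUE while another document is available
    n <- n + 1L
    if (n > length(out)) {
      # Double the capacity when full, so total copying stays linear.
      length(out) <- max(16L, 2L * length(out))
    }
    out[n] <- list(value_fn(cursor))   # list() wrapper keeps NULL values safe
  }
  out[seq_len(n)]                      # trim the unused preallocated slots
}

# Usage against a live server (not run here):
# cursor <- rmongodb::mongo.find(mongo, ns, query)
# docs <- cursor_to_list(cursor)
# rmongodb::mongo.cursor.destroy(cursor)
```

Doubling rather than growing by a fixed step is the usual way to keep the amortized cost per document constant; the per-document BSON conversion is still the part this sketch cannot speed up.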

A hacky solution could be to create a new collection whose documents are data "chunks", each holding 100000 of the original documents. Each chunk could then be read efficiently with a single mongo.bson.to.list call. Keep in mind that a single MongoDB document is limited to 16 MB, so the chunk size has to fit the size of your documents. The chunked collection could be built with the server's MapReduce functionality.

I know of no faster general-purpose way to do this. You are importing data from a foreign application into an interpreted language, and rmongodb has no way to anticipate the structure of the documents in the collection, so each document has to be converted generically. That makes the process inherently slow once you are dealing with thousands of documents.
