MongoDB map-reduce on a multicore server

Submitted by 纵饮孤独 on 2019-11-29 23:16:11

Question


I have a MongoDB database with thousands of records, each holding a very long vector. Using a certain algorithm, I am looking for correlations between an input vector and the vectors in my MongoDB data set.

Pseudocode:

function find_best_correlation(input_vector)
    max_correlation = 0
    return_vector = []
    foreach reference_vector in dataset:
        correlation = calculateCorrelation(input_vector, reference_vector)
        if correlation > max_correlation then:
            max_correlation = correlation
            return_vector = reference_vector
    return return_vector
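A runnable version of the pseudocode may help pin down the loop. This is a minimal sketch: the question does not say which correlation measure is used, so `pearson` (Pearson's coefficient) is an assumed stand-in for `calculateCorrelation`, and `dataset` is assumed to be an in-memory list of vectors rather than a MongoDB collection.

```python
import math
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; an assumed stand-in for calculateCorrelation."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("vectors must be non-empty and of equal length")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for constant vectors; treated as "no correlation"
    return cov / math.sqrt(var_a * var_b)


def find_best_correlation(input_vector: Sequence[float],
                          dataset: Sequence[Sequence[float]]) -> Sequence[float]:
    """Serial scan that keeps the reference vector with the highest correlation."""
    max_correlation = 0.0
    return_vector: Sequence[float] = []
    for reference_vector in dataset:
        correlation = pearson(input_vector, reference_vector)
        if correlation > max_correlation:
            # Update both the running maximum and the best match.
            max_correlation = correlation
            return_vector = reference_vector
    return return_vector
```

Because the scan starts from a threshold of 0, it returns an empty list when every correlation is negative; start from `-math.inf` instead if a best match should always be returned.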

This is a good candidate for the map-reduce pattern, because the order in which the calculations run does not matter.
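To see why order does not matter, the search can be split into a map step that scores each vector independently and a reduce step that keeps the maximum. Since `max` is associative and commutative, the pairs can be combined in any order or grouping. This is a sketch only: `pearson` is again an assumed correlation measure, and emitting `(correlation, index)` pairs is an illustrative choice, not MongoDB's `emit` API.

```python
import math
from functools import reduce
from typing import List, Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation (assumed measure); vectors must have equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for constant vectors; treated as "no correlation"
    return cov / math.sqrt(var_a * var_b)


def map_phase(input_vector: Sequence[float],
              dataset: Sequence[Sequence[float]]) -> List[Tuple[float, int]]:
    # Map: each reference vector independently becomes a (correlation, index) pair.
    return [(pearson(input_vector, ref), i) for i, ref in enumerate(dataset)]


def reduce_phase(pairs: Sequence[Tuple[float, int]]) -> Tuple[float, int]:
    # Reduce: keep the largest pair; (-inf, -1) means "no vectors seen".
    return reduce(max, pairs, (-math.inf, -1))
```

For example, `reduce_phase(map_phase([1, 2, 3], [[3, 2, 1], [1, 2, 3]]))` picks index 1, and reducing partial results (say, the first half and the second half separately) before combining them gives the same answer.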

The issue is that my database is on a single node, and I would like to run many mappings simultaneously (I have an 8-core machine).

From what I understand, MongoDB uses only one thread of execution per node, so in practice my data set is processed serially. Is this correct?

If so, can I configure the number of processes/threads per map-reduce run? If I run several map-reduce jobs in parallel from multiple threads and then aggregate the results, will I get a substantial performance increase (has anybody tried)? If not, can I keep multiple replicas of my database on the same node and "trick" MongoDB into running on two of them?
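The "run in parallel, then aggregate" idea can also be done outside the database: split the vectors into chunks, let each worker find its local best, then take the maximum of the local results. This is a hedged sketch that assumes the vectors have already been loaded into client memory; `best_in_chunk` and `parallel_best` are hypothetical helpers, and `pearson` is an assumed correlation measure. Unlike the pseudocode, it starts from `-inf`, so it reports a best match even when all correlations are negative.

```python
import math
from concurrent.futures import Executor
from typing import Sequence, Tuple


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation (assumed measure); vectors must have equal length."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # undefined for constant vectors; treated as "no correlation"
    return cov / math.sqrt(var_a * var_b)


def best_in_chunk(task) -> Tuple[float, int]:
    """Per-worker step: local (correlation, global index) maximum over one chunk."""
    input_vector, chunk, offset = task
    best = (-math.inf, -1)
    for i, ref in enumerate(chunk):
        best = max(best, (pearson(input_vector, ref), offset + i))
    return best


def parallel_best(input_vector: Sequence[float],
                  dataset: Sequence[Sequence[float]],
                  executor: Executor,
                  workers: int = 8) -> Tuple[float, int]:
    """Split the data into about `workers` contiguous chunks and merge local maxima."""
    size = max(1, math.ceil(len(dataset) / workers))
    tasks = [(input_vector, dataset[start:start + size], start)
             for start in range(0, len(dataset), size)]
    # Merge step: completion order of the chunks does not affect the result.
    return max(executor.map(best_in_chunk, tasks), default=(-math.inf, -1))
```

For CPU-bound pure-Python work, pass a `concurrent.futures.ProcessPoolExecutor(max_workers=8)` (created under an `if __name__ == "__main__":` guard); a `ThreadPoolExecutor` exercises the same logic but is limited by CPython's GIL. Whether this beats a single-threaded server-side job depends on how long it takes to pull the vectors out of MongoDB.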

Thanks!


Answer 1:


Map-reduce in MongoDB uses SpiderMonkey, a single-threaded JavaScript engine, so it is not possible to configure multiple processes (and there are no "tricks"). There is a JIRA ticket about using a multi-threaded JS engine, which you can follow here: https://jira.mongodb.org/browse/SERVER-2407

If possible, consider the new aggregation framework (available since MongoDB 2.2), which is written in C++ rather than JavaScript and may offer better performance: http://docs.mongodb.org/manual/applications/aggregation/



Source: https://stackoverflow.com/questions/11748872/mongodb-map-reduce-on-multicore-server
