Riak Map Reduce in JS returning limited data

Submitted by 给你一囗甜甜゛ on 2019-12-11 04:07:52

Question


I have Riak running on two clustered EC2 servers and use Python to run JavaScript MapReduce jobs. The setup is mainly a proof of concept.

There are 50 keys in the bucket. All the map/reduce function does is reformat the data; it exists only to test the MapReduce functionality in Riak.

Problem: The output shows only [{u'e': 2, u'undefined': 2, u'w': 2}], which is completely wrong. The logs show that all the keys have been processed, but only 2 are returned. Why is that happening, and am I missing something important?

Code:

import riak
client = riak.RiakClient()
query = riak.RiakMapReduce(client).add('raw_hits10')
query.map("""function(v) {
      var data = JSON.parse(v.values[0].data);
      return [[data, 1]];
}""")
query.reduce("""function(vk) {
         var res = {};
         for (var indx in vk) {
            var key_t = vk[indx][0];
            var val_t = vk[indx][1];
            ejsLog('/tmp/map_reduce.log', key_t + "--- " + val_t);

            res[key_t] = 2;
         }
         return [res]
    }
      """)


for res in query.run():
    print res

The results from printing:

[{u'e': 2, u'undefined': 2, u'w': 2}]

This makes no sense.


Answer 1:


To avoid loading all data from the preceding phase into memory on the coordinating node before running the reduce phase (which would be problematic for large MapReduce jobs), Riak runs the reduce function multiple times. Every iteration gets a batch of results from the preceding phase together with any output from earlier iterations of the reduce phase. The default batch size is 20, but this is configurable. Because the results of one reduce iteration are fed in as input to the next, reduce functions need to be designed to handle their own output as input, and some strategies are described here.
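The batching described above can be illustrated with a small pure-Python sketch. It is a simplified model, not Riak's actual implementation: run_reduce_in_batches, count_reduce, and the "page-N" keys are hypothetical names made up for this example. The point it shows is that a reduce function which accepts both raw [key, count] pairs and the dictionaries produced by its own earlier iterations gives the same answer no matter how the input is split.

```python
from collections import Counter


def run_reduce_in_batches(reduce_fn, inputs, batch_size=20):
    """Simplified model of a batched reduce phase (assumption, not Riak code).

    Each iteration receives the previous iteration's output plus the next
    batch of map results.
    """
    carried = []  # output of the previous reduce iteration
    for start in range(0, len(inputs), batch_size):
        batch = inputs[start:start + batch_size]
        carried = reduce_fn(carried + batch)
    return carried


def count_reduce(values):
    """Re-reduce-safe counter.

    Accepts [key, count] pairs from the map phase as well as dicts that an
    earlier reduce iteration returned.
    """
    totals = Counter()
    for item in values:
        if isinstance(item, dict):
            # Output of an earlier reduce iteration: merge its counts.
            for key, count in item.items():
                totals[key] += count
        else:
            # Raw map output: a [key, count] pair.
            key, count = item
            totals[key] += count
    return [dict(totals)]


# 50 hypothetical map outputs spread over 5 keys.
map_output = [["page-%d" % (i % 5), 1] for i in range(50)]

single_pass = count_reduce(map_output)
batched = run_reduce_in_batches(count_reduce, map_output, batch_size=20)
```

The reduce function in the question assumes every input element is a [key, value] pair, so it does not recognise the object returned by its own previous iteration. Checking the shape of each element, as count_reduce does here, is one way to make a reduce function tolerate being run more than once.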

It is also possible to force Riak to run the reduce phase only once for the entire input set by specifying the 'reduce_phase_only_1' parameter, but this is generally not recommended, especially for large jobs, because all input must then be held in memory on the coordinating node.
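As a rough sketch of where such a parameter might go, the snippet below builds a MapReduce job description as a plain Python dictionary and serialises it to JSON. It assumes the parameter is passed as the static "arg" of the reduce phase in a JSON job submitted over Riak's HTTP interface; build_job is a hypothetical helper, the JavaScript sources are placeholders, and the exact job layout should be checked against the documentation for the Riak version in use.

```python
import json


def build_job(bucket, map_source, reduce_source, reduce_arg):
    """Build a MapReduce job description (hypothetical helper).

    Assumption: the reduce-phase options are supplied through the phase's
    "arg" field.
    """
    return {
        "inputs": bucket,
        "query": [
            {"map": {"language": "javascript", "source": map_source}},
            {"reduce": {
                "language": "javascript",
                "source": reduce_source,
                "arg": reduce_arg,
            }},
        ],
    }


# Placeholder JavaScript phases: emit 1 per object, then sum the values.
map_js = "function(v) { return [1]; }"
reduce_js = ("function(values) { return [values.reduce("
             "function(a, b) { return a + b; }, 0)]; }")

# Ask for a single reduce pass over the whole input set.
job = build_job("raw_hits10", map_js, reduce_js, {"reduce_phase_only_1": True})
payload = json.dumps(job)
```

Under the same assumption, the batch size mentioned in the answer would be adjusted the same way, for example by passing {"reduce_phase_batch_size": 1000} as the reduce argument instead.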



Source: https://stackoverflow.com/questions/16359656/riak-map-reduce-in-js-returning-limited-data
