Related to my CouchDB question.
Can anyone explain MapReduce in terms a numbnuts could understand?
Let's take the example from the Google paper. The goal of MapReduce is to efficiently use a large number of processing units working in parallel for certain kinds of algorithms. The example is the following: you want to extract all the words and their counts from a set of documents.
Typical implementation:
for each document
    for each word in the document
        get the counter associated with the word
        increment that counter
    end for
end for
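For reference, here is a minimal sequential sketch of that loop in Python (the sample `documents` list and the whitespace tokenization are just placeholders, not part of the paper's example):

    from collections import Counter

    def word_count(documents):
        # One global counter per word, accumulated across all documents.
        counts = Counter()
        for document in documents:
            for word in document.split():
                counts[word] += 1  # get the counter for the word, increment it
        return counts

    documents = ["the quick brown fox", "the lazy dog"]
    print(word_count(documents))  # Counter({'the': 2, 'quick': 1, ...})

This works fine on one machine, but nothing in it can be handed out to other workers: the shared `counts` structure is the bottleneck, which is exactly what MapReduce removes.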
MapReduce implementation:
Map phase (input: document key, document)

    for each word in the document
        emit the pair (word, "1")
    end for

Reduce phase (input: a key (a word) and an iterator over the values emitted for that key)

    for each value in the iterator
        add the value to a counter
    end for
    emit the pair (word, counter)
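A hedged sketch of those two phases in Python; the function names and the convention of "emitting" pairs by returning lists are my own simplification, not the paper's actual API:

    def map_phase(doc_key, document):
        # Emit a (word, 1) pair for every word occurrence in the document.
        return [(word, 1) for word in document.split()]

    def reduce_phase(word, values):
        # Sum every value emitted for this word.
        counter = 0
        for value in values:
            counter += value
        return (word, counter)

Note that neither function touches shared state: map only looks at its own document, and reduce only looks at the values for its own key. That independence is what lets each phase run on many workers at once.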
Around that, you'll have a master program which partitions the set of documents into "splits" that are handled in parallel during the Map phase. Each worker writes the values it emits to its own buffer. The master program then delegates other workers to perform the Reduce phase as soon as it is notified that a buffer is ready to be handled.
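To make the master's role concrete, here is a toy driver, assuming the `map_phase` and `reduce_phase` functions sketched above. It uses a local `multiprocessing.Pool` as a stand-in for the Map workers, plain lists as the per-worker buffers, and an in-memory dict for the grouping step; a real framework would distribute the splits across machines instead:

    from collections import defaultdict
    from multiprocessing import Pool

    def run_map(split):
        # A Map worker processes its split and writes emitted pairs
        # to a buffer of its own (here, just a list).
        buffer = []
        for doc_key, document in split:
            buffer.extend(map_phase(doc_key, document))
        return buffer

    def mapreduce(documents, n_splits=2):
        # The master partitions the documents into "splits"
        # handled in parallel during the Map phase.
        keyed = list(enumerate(documents))
        splits = [keyed[i::n_splits] for i in range(n_splits)]
        with Pool(n_splits) as pool:
            buffers = pool.map(run_map, splits)
        # Shuffle: group every emitted value by its key (the word),
        # so each Reduce call sees all the values for one word.
        grouped = defaultdict(list)
        for buffer in buffers:
            for word, value in buffer:
                grouped[word].append(value)
        # Reduce workers then sum the values for each word.
        return dict(reduce_phase(word, values) for word, values in grouped.items())

    if __name__ == "__main__":
        print(mapreduce(["the quick brown fox", "the lazy dog", "the fox"]))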
Every worker's output (whether from a Map or a Reduce worker) is in fact a file stored on the distributed file system (GFS in Google's case), or in the distributed database in the case of CouchDB.