Related to my CouchDB question.
Can anyone explain MapReduce in terms a numbnuts could understand?
If you are familiar with Python, the following is about the simplest possible explanation of MapReduce:
In [1]: data = [1, 2, 3, 4, 5, 6]
In [2]: mapped_result = list(map(lambda x: x * 2, data))  # list() needed in Python 3, where map returns an iterator
In [3]: mapped_result
Out[3]: [2, 4, 6, 8, 10, 12]
In [4]: from functools import reduce  # reduce lives in functools in Python 3
In [5]: final_result = reduce(lambda x, y: x + y, mapped_result)
In [6]: final_result
Out[6]: 42
See how each piece of raw data was processed individually, in this case multiplied by 2 (the map part of MapReduce). The mapped values were then combined, by summing them, into a single result, 42 (the reduce part of MapReduce).
An important conclusion from this example is that each chunk of processing is independent of the others. For instance, if thread_1 maps [1, 2, 3] and thread_2 maps [4, 5, 6], the combined result of both threads is still [2, 4, 6, 8, 10, 12], but the processing time has been roughly halved. The same applies to the reduce operation (as long as the combining function is associative, like addition), and that independence is the essence of how MapReduce works in parallel computing. A minimal sketch of this idea follows below.
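Here is a minimal sketch of that idea: split the data into two chunks, map each chunk in its own worker process, then reduce the combined results. The chunk split, the two-worker pool, and the map_chunk helper are illustrative assumptions, not part of any particular MapReduce framework:

from functools import reduce
from multiprocessing import Pool

def map_chunk(chunk):
    # the "map" step, applied to one chunk independently of the others
    return [x * 2 for x in chunk]

if __name__ == "__main__":
    chunks = [[1, 2, 3], [4, 5, 6]]              # thread_1's and thread_2's share
    with Pool(processes=2) as pool:
        # each chunk is mapped by a separate worker; result order is preserved
        mapped_chunks = pool.map(map_chunk, chunks)   # [[2, 4, 6], [8, 10, 12]]
    mapped_result = [x for chunk in mapped_chunks for x in chunk]
    final_result = reduce(lambda x, y: x + y, mapped_result)
    print(mapped_result, final_result)           # [2, 4, 6, 8, 10, 12] 42

In a real MapReduce framework the splitting, scheduling, and combining are handled for you; the sketch just makes the independence of the chunks explicit.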