Related to my CouchDB question.
Can anyone explain MapReduce in terms a numbnuts could understand?
MapReduce is a method to process vast sums of data in parallel without requiring the developer to write any other code other than the mapper and reduce functions.
The map function takes data in and churns out a result, which is held in a barrier. This function can run in parallel with a large number of the same map task. The dataset can then be reduced to a scalar value.
So if you think of it like a SQL statement
SELECT SUM(salary)
FROM employees
WHERE salary > 1000
GROUP by deptname
We can use map to get our subset of employees with salary > 1000 which map emits to the barrier into group size buckets.
Reduce will sum each of those groups. Giving you your result set.
just plucked this from my university study notes of the google paper