I hear a lot about map/reduce, especially in the context of Google\'s massively parallel compute system. What exactly is it?
MapReduce:
To run something large, we can use computation power of different computer in our office. Difficult part is to split task among different computers.It is done by MapReduce library.
The basic idea is that you divide the job into two parts: a Map, and a Reduce. Map basically takes the problem, splits it into sub-parts, and sends the sub-parts to different machines – so all the pieces run at the same time. Reduce takes the results from the sub-parts and combines them back together to get a single answer.
Input is a list of records.Result of the map computation is a list of key/value pairs. Reduce takes each set of values that has the same key, and combines them into a single value.You can’t tell whether the job was split into 100 pieces or 2 pieces; the end result looks pretty much like the result of a single map.
Please looks at simple map and reduce program:
Map function is used to apply some function on our original list and a new list is hence generated. The map() function in Python takes in a function and a list as argument. A new list is returned by applying function to each item of list.
li = [5, 7, 4, 9]
final_list = list(map(lambda x: x*x , li))
print(final_list) #[25, 49, 16, 81]
The reduce() function in Python takes in a function and a list as argument. The function is called with a lambda function and a list and a new reduced result is returned. This performs a repetitive operation over the pairs of the list.
#reduce func to find product/sum of list
x=(1,2,3,4)
from functools import reduce
reduce(lambda a,b:a*b ,x) #24