Source: Google Interview Question
Given a large network of computers, each keeping log files of visited urls, find the top ten most visited URLs
It says you can't use map-reduce directly which is a hint the author of the question wants you to think how map reduce works, so we will just mimic the actions of map-reduce:
hash(string) % R
.(string,count)
of the top 10 strings per server. Note that the tuples where those sent in step2 to this particular server.Notes: