Mapping through two data sets with Hadoop
问题 Suppose I have two key-value data sets--Data Sets A and B, let's call them. I want to update all the data in Set A with data from Set B where the two match on keys. Because I'm dealing with such large quantities of data, I'm using Hadoop to MapReduce. My concern is that to do this key matching between A and B, I need to load all of Set A (a lot of data) into the memory of every mapper instance. That seems rather inefficient. Would there be a recommended way to do this that doesn't require