Question
In my Hadoop project, I am reading the lines of a text file, each containing a number of names. The first name on a line is the username, and the rest are that user's friends. In the map function I create (username, friend) pairs; each pair gets a key "Key[name1][name2]", where name1 and name2 are the username and the friend name in alphabetical order. Normally, after reading the line of UserA and the line of UserB, if each has the other in their friends list, I get 2 identical keys with different values, in this case KeyUserAUserB : "UserA,UserB" and KeyUserAUserB : "UserB,UserA". What I expect in the reduce function is to get, at some point, KeyUserAUserB as the key with both "UserA,UserB" and "UserB,UserA" as values, so the values iterator would have 2 elements. However, in the reduce function I get KeyUserAUserB twice, each time with a single value. This is not what I expect from Hadoop.
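For reference, the key construction described above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration, not the asker's actual mapper: the class name `FriendPairKeys`, the helpers `makeKey` and `mapLine`, and the assumption that names are separated by whitespace are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class FriendPairKeys {
    // Builds the shared key for two names in alphabetical order, so that
    // (UserA, UserB) and (UserB, UserA) both produce the same key.
    static String makeKey(String a, String b) {
        return a.compareTo(b) <= 0 ? "Key" + a + b : "Key" + b + a;
    }

    // Simulates what map() would emit for one input line.
    // Assumption: names are whitespace-separated; the first is the username.
    static List<String[]> mapLine(String line) {
        String[] names = line.trim().split("\\s+");
        List<String[]> pairs = new ArrayList<>();
        for (int i = 1; i < names.length; i++) {
            String key = makeKey(names[0], names[i]);
            String value = names[0] + "," + names[i];
            pairs.add(new String[] { key, value });
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Both lines yield the key KeyUserAUserB, with different values.
        for (String line : new String[] { "UserA UserB", "UserB UserA" }) {
            for (String[] kv : mapLine(line)) {
                System.out.println(kv[0] + " : " + kv[1]);
            }
        }
    }
}
```

With this scheme the two input lines produce the same key with two different values, which is why a single reduce call with two values is the expected outcome.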
I also noticed that in my userlogs I have 4 "m" folders, and the first 2 of them contain the logs that helped me identify the above. In both "m" logs the output (System.out) of the map function is interleaved with the output of the reduce function. I don't know if that has anything to do with my anomaly, but I expected the reduce output to stay in the "r" folder. Also, for the above example, one log entry for KeyUserAUserB is printed in one "m" log file, and the other in the other "m" log file. Although in some cases KeyUserAUserB does reach the reducer with both values, I found at least one case where it never arrives with both values (and those 2 key-value pairs with the identical key also reside in different "m" log files).
Another thing I noticed: the output collected from the reduce function isn't sent directly to the output file, but is passed again as input to the same reduce function.
What do you think about this behavior? What could be the possible causes?
Answer 1:
Finally solved. The whole unexpected behavior was caused by using the reducer class as the combiner class as well. After commenting out that line, everything worked as expected.
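Why this matches the symptoms: a combiner runs on the map side, over the output of a single map task, and whatever it emits becomes the input of the real reducer. The sketch below simulates that call pattern in plain Java, without Hadoop. It is a simplified model under stated assumptions: `CombinerDemo`, `task`, `group`, `reduce`, and `run` are hypothetical names, the reduce logic is a placeholder, and the combiner is applied exactly once per map task, whereas Hadoop may apply it zero, one, or several times.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CombinerDemo {
    // Records every call to reduce() so the call pattern can be inspected.
    static final List<String> calls = new ArrayList<>();

    // Builds the output of one simulated map task holding a single pair.
    static List<String[]> task(String key, String value) {
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] { key, value });
        return pairs;
    }

    // Groups (key, value) pairs by key, as happens before reduce() is called.
    static Map<String, List<String>> group(List<String[]> pairs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (String[] kv : pairs) {
            grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return grouped;
    }

    // Placeholder reduce logic: logs the call, then joins the values with '|'.
    static String[] reduce(String phase, String key, List<String> values) {
        calls.add(phase + " " + key + " " + values.size());
        return new String[] { key, String.join("|", values) };
    }

    // Runs one simulated job; each inner list is the output of one map task.
    static List<String[]> run(List<List<String[]>> mapTasks, boolean useCombiner) {
        calls.clear();
        List<String[]> shuffled = new ArrayList<>();
        for (List<String[]> taskOutput : mapTasks) {
            if (useCombiner) {
                // The combiner only sees the pairs of this one map task.
                for (Map.Entry<String, List<String>> e : group(taskOutput).entrySet()) {
                    shuffled.add(reduce("combine", e.getKey(), e.getValue()));
                }
            } else {
                shuffled.addAll(taskOutput);
            }
        }
        // The real reducer sees all pairs for a key, from every map task.
        List<String[]> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : group(shuffled).entrySet()) {
            output.add(reduce("reduce", e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<String[]>> tasks = new ArrayList<>();
        tasks.add(task("KeyUserAUserB", "UserA,UserB"));
        tasks.add(task("KeyUserAUserB", "UserB,UserA"));
        run(tasks, true);
        System.out.println("with combiner:    " + calls);
        run(tasks, false);
        System.out.println("without combiner: " + calls);
    }
}
```

In this model, with the combiner enabled the reduce logic is called once per map task with a single value, and then once more on the combiner's own output. That is consistent with reduce output showing up in the "m" logs and with reduce output appearing to be fed back into the same function. Since the question mentions an output collector, the job presumably uses the older `org.apache.hadoop.mapred` API, where the combiner is registered with `JobConf.setCombinerClass(...)`; in the newer API the equivalent is `Job.setCombinerClass(...)`. Reusing a reducer as a combiner is generally only safe when its logic still gives the right final result after being applied to partial groups of values.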
Source: https://stackoverflow.com/questions/26693034/hadoop-strange-behaviour-reduce-function-doesnt-get-all-values-for-a-key