Hadoop strange behaviour: reduce function doesn't get all values for a key

偶尔善良 提交于 2019-12-11 20:20:02

问题


In my Hadoop project, I am reading lines of text file with a number of names for each line. The first name represents my username, and the rest are a list of friends. Then I am creating pairs of (username, friend) , in the map function, each pair has a key "Key[name1][name2]" where name1,2 are the username and the friend name ordered alphabetically. Normally, after reading the line of userA and line of userB , and they both have each other in their friends list, I would get 2 identic keys with different values, which in this case is: KeyUserAUserB : "UserA,UserB" and KeyUserAUserB : "UserB,UserA". What I expect in the reduce function is to get, at one point, KeyUserAUserB as a key and a pair of "UserA,UserB","UserB,UserA" as values . So the values iterator would have 2 elements. However, in the reducer function, I get twice KeyUserAUserB with a single value respectively. This is not what I am expecting from Hadoop....

I also noticed in my userlogs , I have 4 "m" folders, and in the first 2 of them I have the logs which helped me identify the above. In both "m" logs the output (System.out) of the map function is intertwined with the output of reduce function . I don't know if that has anything to do with my anomaly, but I expected the reduce output to stay in the "r" folder. Also, for the above example, one log for KeyUserAUserB is printed in one "m" log file, and the other KeyUserAUserB in the other... Although for some cases it happens that a KeyUserAUserB comes to the reducer with both values, i found at least one case when it never comes with both values (and also those 2 pairs key-value with identical key reside in different "m" log files).

Another thing I noticed, the output collect from the Reduce function doesn't send the values directly to the output file, but passes them again as an input for the the same Reduce function...

What do you think about this behavior, what can be the possible causes?


回答1:


Finally. The whole unexpected behavior is because I am using a combiner class = the reducer class. After commenting that line, everything worked as expected.



来源:https://stackoverflow.com/questions/26693034/hadoop-strange-behaviour-reduce-function-doesnt-get-all-values-for-a-key

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!