Aggregation in MapReduce [closed]

Submitted by 笑着哭i on 2019-12-13 09:04:06

Question


How can we find the maximum and minimum element of a column in a .csv file?

What should we pass into context.write(key, value) in the mapper?

  1. Should it be each column of that CSV file?

Solution


Answer 1:


This is a bit broad for an SO question but I'll bite.

Your mapper is for mapping values to keys. Let's say your CSV has 4 columns with numeric values:

42, 71, 45, 22

You map a key to each value; effectively, the key is what the column header would be in the CSV. Let's say column 4 represents "Number of widgets". In your mapper you'd emit "number_of_widgets" as the key with the value of column 4.

The reducer is going to get all the values for a given key. That's where you figure out your min/max: you just iterate through all the values for the key and keep track of the min and max.
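A minimal sketch of this approach, assuming a standard Hadoop MapReduce job reading one CSV line per map call, and assuming column 4 holds integer widget counts; the class names (MinMaxJob, WidgetMapper, MinMaxReducer) are placeholders, not part of the original answer:

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class MinMaxJob {

        // Mapper: emit the header-like key with the numeric value of column 4.
        public static class WidgetMapper
                extends Mapper<LongWritable, Text, Text, IntWritable> {

            private static final Text KEY = new Text("number_of_widgets");

            @Override
            protected void map(LongWritable offset, Text line, Context context)
                    throws IOException, InterruptedException {
                String[] cols = line.toString().split(",");
                int widgets = Integer.parseInt(cols[3].trim()); // column 4 (0-based index 3)
                context.write(KEY, new IntWritable(widgets));
            }
        }

        // Reducer: iterate through all values for the key, tracking the min and max.
        public static class MinMaxReducer
                extends Reducer<Text, IntWritable, Text, Text> {

            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int min = Integer.MAX_VALUE;
                int max = Integer.MIN_VALUE;
                for (IntWritable v : values) {
                    min = Math.min(min, v.get());
                    max = Math.max(max, v.get());
                }
                context.write(key, new Text("min=" + min + ", max=" + max));
            }
        }
    }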




Answer 2:


The mapper should transpose the file: for each line read, emit the column number as the key and the column's value as the value.

The reducer should compute the min/max: for each input key, emit the minimum and maximum values found.
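A minimal sketch of that transposing mapper, assuming the same Hadoop setup as in the previous sketch and that every column is numeric; the reducer can reuse the same min/max loop shown above, just keyed by an IntWritable column number instead of a Text header:

    // Transposing mapper: emit (column number, column value) for every column of the line.
    public static class ColumnMapper
            extends Mapper<LongWritable, Text, IntWritable, IntWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] cols = line.toString().split(",");
            for (int i = 0; i < cols.length; i++) {
                int value = Integer.parseInt(cols[i].trim());
                context.write(new IntWritable(i), new IntWritable(value));
            }
        }
    }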



Source: https://stackoverflow.com/questions/21040166/aggregation-in-mapreduce
