Question
How can we find the maximum and minimum element of a column in a .csv file?
What should we pass into context.write(key, value) in the mapper?
- Should it be each column of that CSV file?
Solution
Answer 1:
This is a bit broad for an SO question, but I'll bite.
Your mapper is for mapping values to keys. Let's say your CSV has 4 columns with numeric values:
42, 71, 45, 22
You map a key to each value; effectively, the key plays the role of the column header in the CSV. Let's say column 4 represented "Number of widgets". In your mapper, you'd emit "number_of_widgets" as the key with the value of column 4.
The reducer is going to get all the values for a given key. That's where you figure out your min/max. You just iterate through all the values for the key and keep track of the min and max.
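Here is a minimal sketch of that approach, assuming Hadoop's org.apache.hadoop.mapreduce API and a headerless CSV whose 4th column holds the "number of widgets"; the class names, column index, and key string are illustrative, not taken from the original answer.

```java
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ColumnMinMax {

    // Mapper: emit the column's "header" name as the key and the numeric cell as the value.
    public static class ColumnMapper
            extends Mapper<LongWritable, Text, Text, DoubleWritable> {

        private static final Text KEY = new Text("number_of_widgets");

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split(",");
            if (fields.length >= 4) {
                // Column 4 (index 3) is the column of interest in this sketch.
                context.write(KEY, new DoubleWritable(Double.parseDouble(fields[3].trim())));
            }
        }
    }

    // Reducer: iterate through every value for the key, tracking the running min and max.
    public static class MinMaxReducer
            extends Reducer<Text, DoubleWritable, Text, Text> {

        @Override
        protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double min = Double.MAX_VALUE;
            double max = -Double.MAX_VALUE;
            for (DoubleWritable value : values) {
                min = Math.min(min, value.get());
                max = Math.max(max, value.get());
            }
            context.write(key, new Text("min=" + min + ", max=" + max));
        }
    }
}
```

Because all values for "number_of_widgets" land at the same reducer, a single pass over the iterable is enough to produce both the minimum and maximum for that column.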
Answer 2:
The mapper should transpose the file: for each line read, emit the column number as the key and the cell in that column as the value.
The reducer should compute the min/max: for each input key (column), emit the minimum and maximum value found.
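A minimal sketch of this transposing mapper, under the same Hadoop API assumptions as above; with the column index as the key, every column of the CSV gets its own reducer call, and a reducer like the MinMaxReducer above (keyed by IntWritable instead of Text) would emit one min/max pair per column.

```java
import java.io.IOException;

import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TransposeMapper
        extends Mapper<LongWritable, Text, IntWritable, DoubleWritable> {

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split(",");
        // Emit (column number, cell value) for every column of the line,
        // effectively transposing the file so each column is grouped under one key.
        for (int col = 0; col < fields.length; col++) {
            context.write(new IntWritable(col),
                          new DoubleWritable(Double.parseDouble(fields[col].trim())));
        }
    }
}
```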
Source: https://stackoverflow.com/questions/21040166/aggregation-in-mapreduce