I want to find the country with the largest area.
My data set looks like this:
Afghanistan 648
Albania 29
Algeria 2388
Andorra 0
Austria 84
Bah
The algorithm is simple: each mapper keeps track of the maximum it has seen, and once it has processed all of its input it writes that single record out in cleanup().
int max = Integer.MIN_VALUE;
String token; // country that holds the current maximum
@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // This code expects comma-separated fields: name in column 0, a filter value in column 2, area in column 3.
    String[] tokens = value.toString().split(",");
    if (Integer.parseInt(tokens[2]) == 1) {
        int val = Integer.parseInt(tokens[3]);
        if (val > max) {
            max = val;
            token = tokens[0];
        }
    }
}
@Override
public void cleanup(Context context) throws IOException, InterruptedException {
    // Emit one record per mapper; skip it if no row matched.
    if (token != null) {
        context.write(new LongWritable(max), new Text(token));
    }
}
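To see the in-mapper maximum idea without a cluster, here is a minimal plain-Java sketch of what one mapper task does. It is not Hadoop code: the class LocalMaxSketch and the method localMax are made-up names, and it assumes the two-column layout from the question (name, then area, separated by whitespace) rather than the comma-separated layout used in the mapper above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for one mapper task; no Hadoop classes are used.
class LocalMaxSketch {

    // Mirrors map() + cleanup(): scan every line of one split, keep the
    // running maximum, and return a single (area, country) pair at the end.
    // Returns null when the split contains no parsable record.
    static Map.Entry<Long, String> localMax(List<String> lines) {
        long max = Long.MIN_VALUE;
        String country = null;
        for (String line : lines) {
            String trimmed = line.trim();
            int cut = trimmed.lastIndexOf(' ');
            if (cut < 0) {
                continue; // assumption: skip lines that have no area column
            }
            long area;
            try {
                area = Long.parseLong(trimmed.substring(cut + 1));
            } catch (NumberFormatException e) {
                continue; // assumption: skip lines whose last field is not a number
            }
            if (area > max) {
                max = area;
                country = trimmed.substring(0, cut).trim();
            }
        }
        return country == null ? null : new SimpleEntry<Long, String>(max, country);
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList(
            "Afghanistan 648", "Albania 29", "Algeria 2388", "Andorra 0", "Austria 84");
        Map.Entry<Long, String> result = localMax(split);
        System.out.println(result.getKey() + "\t" + result.getValue());
    }
}
```

For the five complete sample rows this yields 2388 for Algeria. The point is that each mapper emits one record instead of one per input line, so very little data goes through the shuffle.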
Every mapper's output is now keyed on its maximum, so if the keys are sorted in descending order, the overall maximum is the first record the reducer sees. This assumes the job runs with a single reducer. To get the descending order, set this in your job:
job.setSortComparatorClass(LongWritable.DecreasingComparator.class);
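As a rough illustration of what the descending sort buys you, this plain-Java sketch orders a few (area, country) pairs the way the reducer would receive them. The class DescendingSortSketch is a made-up name, and the pairs stand for hypothetical local maxima written by three mappers; no Hadoop classes are involved.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DescendingSortSketch {

    // Sorts (area, country) pairs by key, largest first, which is the order
    // a decreasing sort comparator hands to the reducer.
    static List<Map.Entry<Long, String>> sortDescending(List<Map.Entry<Long, String>> pairs) {
        List<Map.Entry<Long, String>> sorted = new ArrayList<>(pairs);
        sorted.sort(Map.Entry.<Long, String>comparingByKey().reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Hypothetical output of three mapper tasks, one record each.
        List<Map.Entry<Long, String>> mapperOutput = new ArrayList<>();
        mapperOutput.add(new SimpleEntry<Long, String>(84L, "Austria"));
        mapperOutput.add(new SimpleEntry<Long, String>(2388L, "Algeria"));
        mapperOutput.add(new SimpleEntry<Long, String>(648L, "Afghanistan"));

        // The first element after sorting is the global maximum.
        Map.Entry<Long, String> first = sortDescending(mapperOutput).get(0);
        System.out.println(first.getValue() + "\t" + first.getKey());
    }
}
```

With the default ascending order the maximum would arrive last, and the reducer would have to read every key before it could write anything.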
The reducer is simply a found/not-found switch: it outputs every country in the first key group, which holds the maximum value, and ignores every group after it.
boolean foundMax = false;
@Override
public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
    if (!foundMax) {
        // The first key under the descending sort is the maximum; write all of its countries.
        for (Text t : values) {
            context.write(t, key);
        }
        foundMax = true;
    }
}
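The switch can be tried out locally with a small plain-Java sketch. FirstKeyOnlySketch is a made-up class that only mimics the reducer's state; its reduce method is called once per key group in descending key order, as the framework would do. "CountryX" is an invented name, added only to show that ties on the maximum are all written out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FirstKeyOnlySketch {
    private boolean foundMax = false;              // flips after the first key group
    final List<String> output = new ArrayList<>(); // stands in for context.write

    // Called once per key group, largest key first.
    void reduce(long area, List<String> countries) {
        if (foundMax) {
            return; // every later group has a smaller key, so ignore it
        }
        for (String country : countries) {
            output.add(country + "\t" + area);
        }
        foundMax = true;
    }

    public static void main(String[] args) {
        FirstKeyOnlySketch reducer = new FirstKeyOnlySketch();
        // Hypothetical groups; "CountryX" is invented to demonstrate a tie.
        reducer.reduce(2388L, Arrays.asList("Algeria", "CountryX"));
        reducer.reduce(648L, Arrays.asList("Afghanistan"));
        reducer.output.forEach(System.out::println);
    }
}
```

Only the two countries in the 2388 group end up in the output; the 648 group is skipped because the flag is already set.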