I have a MapReduce job. My Map class:

public static class MapClass extends Mapper {
    @Override
    public void map(...
After a lot of "Kung Fu", I was able to get ChainMapper/ChainReducer working. Thanks for the last comment, user864846.
/**
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package myPKG;
/*
 * Ajitsen: Sample program for ChainMapper/ChainReducer. This program is a
 * modified version of the WordCount example shipped with Hadoop 0.18.0, with
 * ChainMapper/ChainReducer added and made to work on Hadoop 1.0.2.
 */
import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.ChainMapper;
import org.apache.hadoop.mapred.lib.ChainReducer;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class ChainWordCount extends Configured implements Tool {

    // First mapper in the chain: splits each input line into (word, 1) pairs.
    public static class Tokenizer extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String line = value.toString();
            System.out.println("Line:" + line);
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Second mapper in the chain: consumes the first mapper's output
    // and upper-cases each word.
    public static class UpperCaser extends MapReduceBase
            implements Mapper<Text, IntWritable, Text, IntWritable> {

        public void map(Text key, IntWritable value,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            String word = key.toString().toUpperCase();
            System.out.println("Upper Case:" + word);
            output.collect(new Text(word), value);
        }
    }

    // Reducer: sums the counts for each (upper-cased) word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output,
                           Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            System.out.println("Word:" + key.toString() + "\tCount:" + sum);
            output.collect(key, new IntWritable(sum));
        }
    }
    static int printUsage() {
        System.out.println("wordcount [-m <maps>] [-r <reduces>] <input> <output>");
        ToolRunner.printGenericCommandUsage(System.out);
        return -1;
    }
EDIT: In recent versions (at least from Hadoop 2.6), the true (byValue) flag in addMapper is not needed; in fact the signature has changed and dropped it. So it would be just:

JobConf mapAConf = new JobConf(false);
ChainMapper.addMapper(conf, Tokenizer.class, LongWritable.class, Text.class,
        Text.class, IntWritable.class, mapAConf);
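For reference, the flag-less signature is the one on the new-API chain classes in org.apache.hadoop.mapreduce.lib.chain, which take a Job and a plain Configuration instead of JobConfs. Below is a minimal sketch of the same chain against that API, assuming Hadoop 2.6+. The class name ChainWordCountNewApi is illustrative, and the three inner classes are ports of the ones above, since the new-API chain classes only accept org.apache.hadoop.mapreduce.Mapper/Reducer subclasses:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;

public class ChainWordCountNewApi {  // illustrative name

    // Old-API Tokenizer ported to the new Mapper base class.
    public static class Tokenizer
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Old-API UpperCaser ported likewise.
    public static class UpperCaser
            extends Mapper<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void map(Text key, IntWritable value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(key.toString().toUpperCase()), value);
        }
    }

    // Old-API Reduce ported likewise.
    public static class Reduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                              Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    // Chained job: Tokenizer -> UpperCaser -> Reduce; note no byValue flag,
    // and plain Configuration objects instead of JobConfs.
    public static Job buildJob() throws IOException {
        Job job = Job.getInstance(new Configuration(), "chainwordcount");
        job.setJarByClass(ChainWordCountNewApi.class);

        ChainMapper.addMapper(job, Tokenizer.class,
                LongWritable.class, Text.class, Text.class, IntWritable.class,
                new Configuration(false));
        ChainMapper.addMapper(job, UpperCaser.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));
        ChainReducer.setReducer(job, Reduce.class,
                Text.class, IntWritable.class, Text.class, IntWritable.class,
                new Configuration(false));
        return job;
    }
}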