Question
In my MapReduce program, I have to use a partitioner:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class TweetPartitionner extends HashPartitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        // Hashtag keys go to partition 0, all other keys to partition 1
        if (a_key.toString().startsWith("#"))
            return 0;
        else
            return 1;
    }
}
And I have set the number of reduce tasks: job.setNumReduceTasks(2);
But I get the following error: java.io.IOException: Illegal partition for #rescinfo (1)
The parameter a_nbPartitions is 1.
I've read in another post, Hadoop: Number of reducer is not equal to what I have set in program, that:
"Running it in eclipse seems to use the local job runner. It only supports 0 or 1 reducers. If you try to set it to use more than one reducer, it ignores it and just uses one anyway."
I develop on Hadoop 0.20.2 installed on Cygwin, and of course I use Eclipse. How can I do this?
Answer 1:
You actually don't need a dedicated Hadoop cluster for that. You just have to tell Eclipse that you intend to run this job on your pseudo-distributed cluster instead of running it locally within itself. To do that, add these lines to your code:
Configuration conf = new Configuration();
// Point the job at the pseudo-distributed cluster; without these
// settings, Eclipse submits the job to the local job runner.
conf.set("fs.default.name", "hdfs://localhost:9000");
conf.set("mapred.job.tracker", "localhost:9001");
And after that, set the number of reducers to 2 via:
job.setNumReduceTasks(2);
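Putting those pieces together, a minimal driver sketch might look like the following. The TweetMapper and TweetReducer class names are hypothetical placeholders, and the wiring assumes the Hadoop 0.20.2 new-style API (org.apache.hadoop.mapreduce):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TweetJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Target the pseudo-distributed cluster instead of the
        // local job runner (which supports at most one reducer).
        conf.set("fs.default.name", "hdfs://localhost:9000");
        conf.set("mapred.job.tracker", "localhost:9001");

        Job job = new Job(conf, "tweet count");
        job.setJarByClass(TweetJob.class);
        job.setMapperClass(TweetMapper.class);        // hypothetical mapper
        job.setReducerClass(TweetReducer.class);      // hypothetical reducer
        job.setPartitionerClass(TweetPartitionner.class);
        job.setNumReduceTasks(2);                     // now honored by the cluster
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}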
And yes, you have to be very sure about your partitioner logic. You can visit this page which shows how to write a custom partitioner.
HTH
Answer 2:
Until you have a dedicated Hadoop cluster to run your job on, there is no way to have more than one reducer in local mode. You can configure Eclipse to submit your job to a Hadoop cluster, though, and then your configuration will be taken into account.
In any case, you should always clamp the computed partition with return Math.min(i, a_nbPartitions - 1) when writing your own partitioner.
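Applied to the partitioner from the question, that clamping might look like this minimal sketch (the class and parameter names come from the question itself):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class TweetPartitionner extends HashPartitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text a_key, IntWritable a_value, int a_nbPartitions) {
        int partition = a_key.toString().startsWith("#") ? 0 : 1;
        // Clamp to the actual number of partitions so the job does not
        // fail with "Illegal partition" when only one reducer is
        // available, as with the local job runner.
        return Math.min(partition, a_nbPartitions - 1);
    }
}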
Source: https://stackoverflow.com/questions/17298659/hadoop-and-number-of-reducers-in-eclipse