How to solve expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable in mapreduce job

Submitted by 强颜欢笑 on 2019-12-24 07:38:42

Question


I am trying to write a job that analyses some information from a YouTube data set. I believe I have correctly set the map output key classes in the driver class, but I am still getting the error below. I am posting the code and the exception here.

The Mapper

public class YouTubeDataMapper extends Mapper<LongWritable,Text,Text,IntWritable>{

private static final IntWritable one = new IntWritable(1); 
private Text category = new Text(); 
public void mapper(LongWritable key,Text value,Context context) throws IOException, InterruptedException{
    String str[] = value.toString().split("\t");
    category.set(str[3]);
    context.write(category, one);
}

}

The Reducer Class

public class YouTubeDataReducer extends Reducer<Text,IntWritable,Text,IntWritable>{

public void reduce(Text key,Iterable<IntWritable> values,Context context) throws IOException, InterruptedException{
    int sum=0;
    for(IntWritable count:values){
        sum+=count.get();
    }
    context.write(key, new IntWritable(sum));
}

}

The Driver Class

public class YouTubeDataDriver {

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "categories");
    job.setJarByClass(YouTubeDataDriver.class);

    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    // job.setNumReduceTasks(0);
    job.setOutputKeyClass(Text.class);// Here i have set the output keys
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(YouTubeDataMapper.class);
    job.setReducerClass(YouTubeDataReducer.class);

    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    Path out = new Path(args[1]);
    out.getFileSystem(conf).delete(out);
    job.waitForCompletion(true);

}

}

The exception I got

java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.LongWritable
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1069)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:712)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.hadoop.mapreduce.Mapper.map(Mapper.java:124)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

I have set the output key classes in the driver class:

    job.setOutputKeyClass(Text.class);// Here i have set the output keys
    job.setOutputValueClass(IntWritable.class);

But why am I still getting the error? Please help, I am new to MapReduce.


Answer 1:


Rename the mapper() method to map() (see official docs).

What's happening is that no data is actually being processed by your mapper code. The framework calls map(), not mapper(), so it falls back to the default identity implementation of Mapper.map(), which writes the input key/value pairs out unchanged. That is why the map output key is still a LongWritable.
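
For reference, a minimal sketch of the corrected mapper (the same logic as in the question, with only the method renamed). Adding @Override is a cheap safeguard: the compiler will refuse to build if the annotated method does not actually override Mapper.map(), which would have caught this mistake immediately.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class YouTubeDataMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable one = new IntWritable(1);
    private final Text category = new Text();

    @Override // compile error here if the signature does not match Mapper.map()
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] str = value.toString().split("\t");
        category.set(str[3]); // see the aside below about guarding this access
        context.write(category, one);
    }
}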

As an aside,

String str[] = value.toString().split("\t");
category.set(str[3]);

is very dangerous. It's risky to assume that all of your input data will contain at least 3 \t characters. When processing large amounts of data there will almost always be some that's not in the format you expect, and you don't want your entire job to die when that happens. Consider doing something like:

String valueStr = value.toString();
if (valueStr != null) {
    String str[] = valueStr.split("\t");
    if (str != null && str.length > 3) {
        category.set(str[3]);
        context.write(category, one);
    }
}
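
As an optional refinement (not part of the original answer), you could also count the records you skip so that malformed input shows up in the job counters instead of disappearing silently; context.getCounter(group, name) is available on the mapper's Context, and the group/counter names below are just examples.

String str[] = value.toString().split("\t");
if (str.length > 3) {
    category.set(str[3]);
    context.write(category, one);
} else {
    // make the number of skipped lines visible in the job's counter output
    context.getCounter("YouTubeData", "MalformedRecords").increment(1);
}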



Answer 2:


The code below (with the mapper's input key type changed from LongWritable to Object) works for me:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class YouTubeDataDriver {

    public static class YouTubeDataMapper
            extends Mapper<Object, Text, Text, IntWritable>{

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class YouTubeDataReducer
            extends Reducer<Text,IntWritable,Text,IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        @SuppressWarnings("deprecation")
        Job job = new Job(conf, "categories");
        job.setJarByClass(YouTubeDataDriver.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);// Here i have set the output keys
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(YouTubeDataMapper.class);
        job.setReducerClass(YouTubeDataReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        Path out = new Path(args[1]);
        out.getFileSystem(conf).delete(out);
        job.waitForCompletion(true);

    }

}
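
As a side note (assuming Hadoop 2.x or later), the deprecated Job constructor can be replaced with the Job.getInstance factory method, which removes the need for the @SuppressWarnings("deprecation") annotation:

    // preferred over "new Job(conf, ...)" on current Hadoop versions
    Job job = Job.getInstance(conf, "categories");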


Source: https://stackoverflow.com/questions/49912555/how-to-solve-expected-org-apache-hadoop-io-text-received-org-apache-hadoop-io-l
