【MapReduce】MapReduce Fundamentals, Part 7: Custom Output

Submitted by 匆匆过客 on 2020-01-28 11:18:34

Custom Output

Default output class hierarchy:

  • FileOutputFormat
    • TextOutputFormat
      • RecordWriter
        • LineRecordWriter

Steps for a custom output:

  • Create a class that extends FileOutputFormat and
    override getRecordWriter()
  • Create the actual file writer by extending RecordWriter and
    override write() and close()
  • Register the custom output format on the job:
    job.setOutputFormatClass(MyFileOutputFormat.class);

Case study: write each student's average score to one of two files, depending on whether the student passes (average >= 60) or fails.

computer,huangxiaoming,85
computer,xuzheng,54
computer,huangbo,86
computer,liutao,85
computer,huanglei,99
computer,huangxiaoming,85
computer,xuzheng,54
computer,huangbo,86
computer,liujialing,45
computer,liuyifei,75
computer,huangdatou,48
computer,huangjiaju,88
computer,huangzitao,85
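Before wiring this up as a MapReduce job, the end-to-end logic — group scores by student, average them, and split on the 60-point cutoff — can be sketched in plain Java with no Hadoop dependencies. This is a standalone illustration of the idea, not part of the job code; `average` and `passes` are hypothetical helpers:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class AvgSplitSketch {
    // Average the scores per student name for "course,name,score" lines,
    // mirroring what the mapper/reducer pair computes
    static Map<String, Double> average(List<String> lines) {
        return lines.stream()
                .map(l -> l.split(","))
                .filter(f -> f.length == 3)
                .collect(Collectors.groupingBy(f -> f[1],
                        Collectors.averagingInt(f -> Integer.parseInt(f[2]))));
    }

    // The same pass/fail cutoff the custom RecordWriter will apply
    static boolean passes(double avg) {
        return avg >= 60;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "computer,huangxiaoming,85",
                "computer,xuzheng,54",
                "computer,huangbo,86",
                "computer,liujialing,45");
        average(sample).forEach((name, a) ->
                System.out.println((passes(a) ? "jige: " : "bujige: ") + name + "————" + a));
    }
}
```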

MyFileOutputFormat.java

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Generic parameters: the key/value types emitted by the reducer.
 */
public class MyFileOutputFormat extends FileOutputFormat<Text, DoubleWritable> {

    /**
     * @param job the task attempt context
     */
    @Override
    public RecordWriter<Text, DoubleWritable> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        // Get the file system; the record writer writes through it
        FileSystem fs = FileSystem.get(job.getConfiguration());
        return new MyRecordWriter(fs);
    }
}

MyRecordWriter.java

import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class MyRecordWriter extends RecordWriter<Text, DoubleWritable> {
    FileSystem fs;
    FSDataOutputStream fsDataOutputStream1;
    FSDataOutputStream fsDataOutputStream2;

    public MyRecordWriter(FileSystem fs) throws IOException {
        this.fs = fs;
        // One stream per target file: passing ("jige") and failing ("bujige") averages
        fsDataOutputStream1 = fs.create(new Path("/tmpout/customOutput/jige"));
        fsDataOutputStream2 = fs.create(new Path("/tmpout/customOutput/bujige"));
    }

    @Override
    public void write(Text key, DoubleWritable value) throws IOException, InterruptedException {
        // Route each record by its average score
        double score = value.get();
        byte[] bytes = (key.toString() + "————" + score + "\n").getBytes();
        if (score >= 60) {
            fsDataOutputStream1.write(bytes);
        } else {
            fsDataOutputStream2.write(bytes);
        }
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // Close the output streams; the FileSystem instance returned by
        // FileSystem.get() is cached and shared, so it is left for the
        // framework to manage rather than closed here
        fsDataOutputStream1.close();
        fsDataOutputStream2.close();
    }
}

CustomOutput.java

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomOutput {
    static class CustomOutputMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        Text mk = new Text();
        IntWritable mv = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Input line: computer,huangbo,86
            String[] datas = value.toString().split(",");
            if (datas.length == 3) {
                mk.set(datas[1]);
                mv.set(Integer.parseInt(datas[2]));
                // Emit: huangbo,86
                context.write(mk, mv);
            }
        }
    }
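The mapper's guard (only lines with exactly three comma-separated fields are emitted) can be checked in isolation. `parseRecord` below is a hypothetical standalone helper for illustration, not part of the job code:

```java
import java.util.Map;
import java.util.Optional;

public class ParseSketch {
    // Returns (name, score) for a well-formed "course,name,score" line,
    // empty for anything else — the same filter the mapper applies
    static Optional<Map.Entry<String, Integer>> parseRecord(String line) {
        String[] f = line.split(",");
        if (f.length != 3) return Optional.empty();
        try {
            return Optional.of(Map.entry(f[1], Integer.parseInt(f[2])));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseRecord("computer,huangbo,86")); // Optional[huangbo=86]
        System.out.println(parseRecord("not a record"));        // Optional.empty
    }
}
```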

    static class CustomOutputReducer extends Reducer<Text,IntWritable,Text, DoubleWritable>{
        DoubleWritable rv=new DoubleWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum=0;
            int count=0;
            for(IntWritable  v: values){
                count++;
                sum+=v.get();
            }
            double avg=1.0*sum/count;
            rv.set(avg);
            context.write(key, rv);
        }
    }
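The `1.0 * sum / count` in the reducer is deliberate: dividing the two ints directly would truncate the average. A quick standalone illustration:

```java
public class AvgDivision {
    public static void main(String[] args) {
        int sum = 171, count = 2;
        System.out.println(sum / count);       // 85  — integer division truncates
        System.out.println(1.0 * sum / count); // 85.5 — widened to double first
    }
}
```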

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        System.setProperty("HADOOP_USER_NAME", "hdp01");
        Configuration conf = new Configuration();
        conf.set("mapreduce.framework.name", "local");
        conf.set("fs.defaultFS", "hdfs://10.211.55.20:9000");

        Job job = Job.getInstance(conf);

        job.setJarByClass(CustomOutput.class);
        job.setMapperClass(CustomOutputMapper.class);
        job.setReducerClass(CustomOutputReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        // Register the custom output format
        job.setOutputFormatClass(MyFileOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path("/tmpin/score.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/tmpout/customOutput/out1"));

        job.waitForCompletion(true);
    }
}

Output:

[hdp01@hdp01 ~]$ hdfs dfs -ls  /tmpout/customOutput
Found 3 items
-rw-r--r--   3 hdp01 supergroup         79 2019-12-25 18:28 /tmpout/customOutput/bujige
-rw-r--r--   3 hdp01 supergroup        300 2019-12-25 18:28 /tmpout/customOutput/jige
drwxr-xr-x   - hdp01 supergroup          0 2019-12-25 18:28 /tmpout/customOutput/out1
[hdp01@hdp01 ~]$ hdfs dfs -cat /tmpout/customOutput/bujige
huangdatou————48.0
xuzheng————54.0
zhaobenshan————57.0
[hdp01@hdp01 ~]$ hdfs dfs -cat /tmpout/customOutput/jige
huangbo————85.66666666666667
huangjiaju————85.75
huanglei————82.83333333333333
huangxiaoming————89.4
huangzitao————82.33333333333333
liujialing————80.0
liutao————66.5
liuyifei————77.2
wangbaoqiang————85.0
zhouqi————85.0
[hdp01@hdp01 ~]$ hdfs dfs -ls /tmpout/customOutput/out1
Found 1 items
-rw-r--r--   3 hdp01 supergroup          0 2019-12-25 18:28 /tmpout/customOutput/out1/_SUCCESS

Note the distinction between a custom output format and reduce tasks; don't confuse the two:
a custom output format writes results to files of your choosing (here, two of them),
whereas the final output files of the default format are determined by the number of reduce tasks and the partitioner.
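For contrast, with the default TextOutputFormat the number of output files is driven by the reduce task count and the partitioner. A config fragment sketching that (assumes an existing `job`, as in the main method above):

```java
// Default output: one part-r-NNNNN file per reduce task; which keys land in
// which file is decided by the partitioner (HashPartitioner by default).
job.setNumReduceTasks(2);
```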
