Custom Output
Default output:
- FileOutputFormat: the abstract base class for file-based output formats.
- TextOutputFormat: the default output format, a subclass of FileOutputFormat.
- RecordWriter: the abstract class whose write() method actually emits records.
- LineRecordWriter: the RecordWriter that TextOutputFormat returns (a nested class of TextOutputFormat); it writes each record as key, a tab, then value, one line per record.
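For comparison, relying on the default is equivalent to setting TextOutputFormat explicitly on the job. A minimal sketch, assuming a Job instance named job as in the driver code further below:

// Equivalent to the default behavior: LineRecordWriter writes "key<TAB>value"
// lines into part-r-xxxxx files under the directory given to setOutputPath().
job.setOutputFormatClass(org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.class);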
Custom output:
- Create a class that extends FileOutputFormat and override getRecordWriter().
- Create the class that does the actual file writing: extend RecordWriter and override write() and close().
- In the job, specify the custom output format class:
job.setOutputFormatClass(MyFileOutputFormat.class);
Example: compute each student's average score and write passing students and failing students to different files. Sample input (an excerpt of score.txt):
computer,huangxiaoming,85
computer,xuzheng,54
computer,huangbo,86
computer,liutao,85
computer,huanglei,99
computer,huangxiaoming,85
computer,xuzheng,54
computer,huangbo,86
computer,liujialing,45
computer,liuyifei,75
computer,huangdatou,48
computer,huangjiaju,88
computer,huangzitao,85
MyFileOutputFormat.java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * The type parameters are the key/value types emitted by the reducer.
 */
public class MyFileOutputFormat extends FileOutputFormat<Text, DoubleWritable> {
    /**
     * @param job the task attempt context
     */
    @Override
    public RecordWriter<Text, DoubleWritable> getRecordWriter(TaskAttemptContext job) throws IOException, InterruptedException {
        // Get the file system; the writer will write through it.
        FileSystem fs = FileSystem.get(job.getConfiguration());
        return new MyRecordWriter(fs);
    }
}
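One possible refinement, not part of the original example: instead of hard-coding the two target paths inside MyRecordWriter (shown next), getRecordWriter() could derive them from the job's configured output directory via FileOutputFormat.getOutputPath(). A rough sketch; the variable names are illustrative, and it assumes a hypothetical MyRecordWriter constructor that accepts the two paths, which the version below does not have:

// Sketch only: derive the target files from the job's output directory so the
// format is not tied to fixed absolute paths.
Path outDir = FileOutputFormat.getOutputPath(job);   // e.g. /tmpout/customOutput/out1
Path passFile = new Path(outDir, "jige");
Path failFile = new Path(outDir, "bujige");
return new MyRecordWriter(fs, passFile, failFile);   // hypothetical 3-argument constructor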
MyRecordWriter.java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public class MyRecordWriter extends RecordWriter<Text, DoubleWritable> {
    FileSystem fs;
    FSDataOutputStream fsDataOutputStream1;   // passing ("jige") students
    FSDataOutputStream fsDataOutputStream2;   // failing ("bujige") students

    public MyRecordWriter(FileSystem fs) throws IOException {
        this.fs = fs;
        // Open one output file per group up front; both streams stay open
        // for the lifetime of the writer.
        fsDataOutputStream1 = fs.create(new Path("/tmpout/customOutput/jige"));
        fsDataOutputStream2 = fs.create(new Path("/tmpout/customOutput/bujige"));
    }

    @Override
    public void write(Text key, DoubleWritable value) throws IOException, InterruptedException {
        // The value is the student's average score.
        double score = value.get();
        byte[] bytes = (key.toString() + "————" + score + "\n").getBytes();
        if (score >= 60) {
            fsDataOutputStream1.write(bytes);
        } else {
            fsDataOutputStream2.write(bytes);
        }
    }

    @Override
    public void close(TaskAttemptContext context) throws IOException, InterruptedException {
        // Close the two streams before the FileSystem so buffered data is flushed.
        fsDataOutputStream1.close();
        fsDataOutputStream2.close();
        fs.close();
    }
}
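Two notes on this writer: the streams are opened once in the constructor and reused for every write() call, which avoids the cost of opening a file per record; and close() must close the two streams before the FileSystem so buffered data is flushed. Also keep in mind that FileSystem.get() normally returns a shared, cached instance, so closing it here is only safe because nothing else in the task still needs it.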
CustomOutput.java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CustomOutput {

    static class CustomOutputMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        Text mk = new Text();
        IntWritable mv = new IntWritable();

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // Input line: computer,huangbo,86
            String[] datas = value.toString().split(",");
            if (datas.length == 3) {
                mk.set(datas[1]);
                mv.set(Integer.parseInt(datas[2]));
                // Emit: huangbo -> 86
                context.write(mk, mv);
            }
        }
    }

    static class CustomOutputReducer extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        DoubleWritable rv = new DoubleWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            for (IntWritable v : values) {
                count++;
                sum += v.get();
            }
            // Average score for this student.
            double avg = 1.0 * sum / count;
            rv.set(avg);
            context.write(key, rv);
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        System.setProperty("HADOOP_USER_NAME", "hdp01");
        Configuration conf = new Configuration();
        // Run the job locally but read from / write to the remote HDFS.
        conf.set("mapreduce.framework.name", "local");
        conf.set("fs.defaultFS", "hdfs://10.211.55.20:9000");
        Job job = Job.getInstance(conf);
        job.setJarByClass(CustomOutput.class);
        job.setMapperClass(CustomOutputMapper.class);
        job.setReducerClass(CustomOutputReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        // Specify the custom output format.
        job.setOutputFormatClass(MyFileOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path("/tmpin/score.txt"));
        FileOutputFormat.setOutputPath(job, new Path("/tmpout/customOutput/out1"));
        job.waitForCompletion(true);
    }
}
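The driver sets mapreduce.framework.name to local and fs.defaultFS to the cluster's NameNode, so the job runs in the local JVM while reading and writing HDFS. One small optional addition, not in the original code: use the boolean returned by waitForCompletion() as the process exit status so scripts can detect failures:

// Exit with 0 on success and 1 on failure instead of discarding the result.
boolean ok = job.waitForCompletion(true);
System.exit(ok ? 0 : 1);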
Output:
[hdp01@hdp01 ~]$ hdfs dfs -ls /tmpout/customOutput
Found 3 items
-rw-r--r-- 3 hdp01 supergroup 79 2019-12-25 18:28 /tmpout/customOutput/bujige
-rw-r--r-- 3 hdp01 supergroup 300 2019-12-25 18:28 /tmpout/customOutput/jige
drwxr-xr-x - hdp01 supergroup 0 2019-12-25 18:28 /tmpout/customOutput/out1
[hdp01@hdp01 ~]$ hdfs dfs -cat /tmpout/customOutput/bujige
huangdatou————48.0
xuzheng————54.0
zhaobenshan————57.0
[hdp01@hdp01 ~]$ hdfs dfs -cat /tmpout/customOutput/jige
huangbo————85.66666666666667
huangjiaju————85.75
huanglei————82.83333333333333
huangxiaoming————89.4
huangzitao————82.33333333333333
liujialing————80.0
liutao————66.5
liuyifei————77.2
wangbaoqiang————85.0
zhouqi————85.0
[hdp01@hdp01 ~]$ hdfs dfs -ls /tmpout/customOutput/out1
Found 1 items
-rw-r--r-- 3 hdp01 supergroup 0 2019-12-25 18:28 /tmpout/customOutput/out1/_SUCCESS
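Note that out1, the directory passed to FileOutputFormat.setOutputPath(), contains only the empty _SUCCESS marker: the actual records were written by MyRecordWriter to the hard-coded jige and bujige files one level up, not into the configured output directory.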
Note the difference between custom file output and reduce-task output; don't confuse the two:
- With a custom output format, the result files are whatever the RecordWriter chooses to create (here, two arbitrarily named files).
- The final output files of the reduce tasks depend on partitioning: one file per reduce task (see the sketch below).
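By contrast, with the default TextOutputFormat the number of part-r-xxxxx files equals the number of reduce tasks, which is set on the job (together with a custom Partitioner when specific keys must land in specific files). A minimal sketch, assuming the same Job object as above:

// Three reduce tasks -> three part-r-0000x files under the output directory.
job.setNumReduceTasks(3);
// Optionally control which keys go to which reduce task / file:
// job.setPartitionerClass(MyPartitioner.class);   // MyPartitioner is hypothetical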
Source: CSDN
Author: 霁泽Coding
Link: https://blog.csdn.net/jiajane/article/details/103702080