How to change the output file name from part-00000 in a reducer to the input file name

时光毁灭记忆、已成空白 submitted on 2020-01-14 04:16:06

Question


Currently I am able to change the output file name from part-00000 to a custom file name in the mapper. I do this by taking the name from the input split. I tried the same in the reducer, but the reducer's Context has no getInputSplit(), so there is no FileSplit to read. Is there a good way to name a reducer's output file after the input file? Below is how I achieved it in the mapper:

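```java
// FileSplit is org.apache.hadoop.mapreduce.lib.input.FileSplit (new API)
@Override
public void setup(Context con) throws IOException, InterruptedException {
    // Name of the input file behind this mapper's split, cut to 36 characters
    fileName = ((FileSplit) con.getInputSplit()).getPath().getName();
    fileName = fileName.substring(0, Math.min(36, fileName.length()));
    outputName = new Text(fileName);

    // Write to <output dir>/<fileName> instead of the default part file
    final Path baseOutputPath = FileOutputFormat.getOutputPath(con);
    final Path outputFilePath = new Path(baseOutputPath, fileName);
    TextOutputFormat<IntWritable, Text> write = new TextOutputFormat<IntWritable, Text>() {
        @Override
        public Path getDefaultWorkFile(TaskAttemptContext context, String extension) throws IOException {
            return outputFilePath;
        }
    };
    // ... (the rest of setup() was cut off in the original post)
}
```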

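The getDefaultWorkFile override in the question replaces the name that FileOutputFormat.getUniqueFile would otherwise build, which has the shape `<name>-<m|r>-<5-digit partition><extension>` (for example `part-r-00000`; the older mapred API writes `part-00000`). Carried over to a reducer unchanged, the override has two pitfalls worth checking: several reduce tasks may compute the same name, and a path in the final output directory bypasses the output committer, which normally has each task write under its work directory and moves the files into place only when the task succeeds. The plain-Java sketch below has no Hadoop dependency; `WorkFileNames`, `defaultName`, `customName`, and the base name `report` are hypothetical. It reproduces the default pattern and builds a custom name that keeps the partition number, so each reduce task would still get its own file.

```java
public class WorkFileNames {

    // Same shape as the default name FileOutputFormat.getUniqueFile produces,
    // e.g. "part-r-00000" for reduce partition 0 with an empty extension.
    static String defaultName(String base, boolean isMap, int partition, String extension) {
        return String.format("%s-%s-%05d%s", base, isMap ? "m" : "r", partition, extension);
    }

    // Hypothetical custom name for an overridden getDefaultWorkFile: swaps the
    // base name but keeps the partition so reduce tasks do not collide.
    static String customName(String customBase, int partition, String extension) {
        return String.format("%s-%05d%s", customBase, partition, extension);
    }

    public static void main(String[] args) {
        System.out.println(defaultName("part", false, 0, "")); // part-r-00000
        System.out.println(customName("report", 3, ".txt"));   // report-00003.txt
    }
}
```

Inside a real getDefaultWorkFile override, the partition number would come from `context.getTaskAttemptID().getTaskID().getId()`.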
Answer 1:


This is what the Hadoop wiki says:

You can subclass the OutputFormat.java class and write your own. You can locate and browse the code of TextOutputFormat, MultipleOutputFormat.java, etc. for reference. It might be the case that you only need to make minor changes to one of the existing OutputFormat classes. To do that, you can just subclass that class and override the methods you need to change.

If you need the output file name to depend on the key and the input file name, you could create a subclass of MultipleOutputFormat to control the output file name.
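Because the reducer has no input split to ask, the input file name has to travel with the data: the mapper can prefix each output key with the name it already reads in setup(), and the output side can cut that prefix off again to choose the file. In the old mapred API that last step belongs in `MultipleTextOutputFormat.generateFileNameForKeyValue(key, value, name)`, with `generateActualKey` stripping the prefix; in the new API the name can be passed as the `baseOutputPath` argument of `MultipleOutputs.write(key, value, baseOutputPath)`, which still appends a `-r-NNNNN` suffix. The sketch below models only the string handling in plain Java; the tab separator and the names `InputFileRouting`, `tagKey`, `fileNameFor`, `actualKey`, and `route` are assumptions, not Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InputFileRouting {

    // Assumed separator between the file-name tag and the real key.
    static final char SEP = '\t';

    // Mapper side: prefix the key with the input file name, trimmed to
    // 36 characters as in the question's setup().
    static String tagKey(String inputFileName, String key) {
        String base = inputFileName.substring(0, Math.min(36, inputFileName.length()));
        return base + SEP + key;
    }

    // Output side: the file to write to (the role generateFileNameForKeyValue
    // plays); untagged keys fall back to the default leaf name.
    static String fileNameFor(String taggedKey, String defaultName) {
        int i = taggedKey.indexOf(SEP);
        return i < 0 ? defaultName : taggedKey.substring(0, i);
    }

    // The key without its tag (the role generateActualKey plays).
    static String actualKey(String taggedKey) {
        int i = taggedKey.indexOf(SEP);
        return i < 0 ? taggedKey : taggedKey.substring(i + 1);
    }

    // Groups tagged keys by target file, the way the reducer's writes would land.
    static Map<String, List<String>> route(List<String> taggedKeys) {
        Map<String, List<String>> files = new TreeMap<>();
        for (String k : taggedKeys) {
            files.computeIfAbsent(fileNameFor(k, "part-00000"), f -> new ArrayList<>())
                 .add(actualKey(k));
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            tagKey("a.log", "k1"), tagKey("b.log", "k2"), tagKey("a.log", "k3"));
        System.out.println(route(keys)); // {a.log=[k1, k3], b.log=[k2]}
    }
}
```

One consequence of putting the tag in the key: records with the same real key but different input files reach reduce() as separate groups, which is usually what per-file output wants.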



Source: https://stackoverflow.com/questions/27488624/how-to-change-the-output-file-name-from-part-00000-in-reducer-to-inputfile-name
