Hadoop: key and value are tab-separated in the output file. How do I make them semicolon-separated?

Submitted by 依然范特西╮ on 2019-11-27 01:55:47

Question


I think the title already explains my question. I would like to change

key (tab space) value

into

key;value

in all output files that the reducers generate from the mappers' output.

I could not find good documentation on this using Google. Can anyone please share a snippet of code showing how to achieve this?


Answer 1:


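Set the configuration property mapred.textoutputformat.separator to ";". On Hadoop 2 and later that name is deprecated in favor of mapreduce.output.textoutputformat.separator.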
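
For context, here is a minimal sketch of what that property changes. It uses only the Java standard library, with `java.util.Properties` standing in for Hadoop's `Configuration`; the names `SeparatorSketch` and `formatRecord` are made up for illustration and are not Hadoop API. As far as I know, `TextOutputFormat` looks the separator up with a tab as the default and writes key, separator, value on each line, which is the behavior imitated here.

```java
import java.util.Properties;

// Stand-in for the lookup TextOutputFormat performs; not Hadoop code.
final class SeparatorSketch {
    // Property name used by Hadoop 2+; "mapred.textoutputformat.separator" is the older name.
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Builds one output line: key, configured separator (tab if unset), value.
    static String formatRecord(Properties conf, String key, String value) {
        String separator = conf.getProperty(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(formatRecord(conf, "key", "value")); // tab-separated by default
        conf.setProperty(SEPARATOR_KEY, ";");
        System.out.println(formatRecord(conf, "key", "value")); // prints "key;value"
    }
}
```

In a real job the same key would be set on the job's `Configuration` before the job is submitted, or passed on the command line as `-D mapreduce.output.textoutputformat.separator=";"` if the driver is run through `ToolRunner`.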




Answer 2:


For lack of better documentation, here's what I've collected:

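    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sets every separator property name I have come across; call before submitting the job.
    static void setTextOutputFormatSeparator(final Job job, final String separator) {
        final Configuration conf = job.getConfiguration(); // the job's own config, not a copy

        conf.set("mapred.textoutputformat.separator", separator); // prior to Hadoop 2 (YARN)
        conf.set("mapreduce.output.textoutputformat.separator", separator); // Hadoop v2+ (YARN)
        conf.set("mapreduce.textoutputformat.separator", separator); // seen elsewhere, unverified
        conf.set("mapreduce.output.key.field.separator", separator); // seen elsewhere, unverified
        conf.set("mapred.textoutputformat.separatorText", separator); // ?
    }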



Answer 3:


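You can use the KEY_VALUE_SEPERATOR property of KeyValueLineRecordReader to specify a separator of your choice. Note that this setting controls how input lines are split into key and value; it does not change the separator that TextOutputFormat writes.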
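
Since KeyValueLineRecordReader reads records rather than writing them, a sketch of its splitting rule may help tell this setting apart from the output separator. The code below is plain Java with no Hadoop dependency; `KeyValueSplitSketch` and `split` are hypothetical names, and the rule shown (split at the first separator, empty value when none is found) is my understanding of the reader's behavior rather than a copy of its source.

```java
// Stand-in for how a key/value line reader splits a line; not Hadoop code.
final class KeyValueSplitSketch {
    // Splits at the first separator; a line without one becomes the key with an empty value.
    static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String[] kv = split("key;value;more", ';');
        System.out.println(kv[0] + " -> " + kv[1]); // prints "key -> value;more"
    }
}
```

This is useful when a later job has to read the semicolon-separated files back in as key/value pairs.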



Source: https://stackoverflow.com/questions/11031785/hadoop-key-and-value-are-tab-separated-in-the-output-file-how-to-do-it-semicol
