Generating Separate Output Files in Hadoop Streaming

一向 2020-12-28 10:25

Using only a mapper (a Python script) and no reducer, how can I output a separate file with the key as the filename, for each line of output, rather than having long files o

3 answers
  •  星月不相逢
    2020-12-28 11:01

    Is it possible to replace the output format class when using streaming? In a native Java implementation you would extend the MultipleTextOutputFormat class and override the method that names the output file (generateFileNameForKeyValue). Then register your implementation as the job's output format with JobConf's setOutputFormat method.

    You should verify whether this is possible in streaming too. I don't know. :-/
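Whichever output format ends up naming the files, it can only see the key if the mapper emits one. By default, streaming splits each mapper output line at the first tab into key and value, and streaming does accept an `-outputformat` option that takes a Java class name, so a key-based output format could in principle be plugged in there. Below is a minimal sketch of the mapper side only. It assumes (hypothetically, since the question does not say how keys are derived) that the key is the first whitespace-delimited token of each input line; the `to_key_value` helper and the character whitelist are illustrative choices, not part of Hadoop.

```python
#!/usr/bin/env python
import re
import sys


def to_key_value(line):
    """Turn one input line into a 'key<TAB>value' record, or None if blank."""
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    # Assumption: the key is the first whitespace-delimited token.
    # The key is meant to become a file name, so replace anything that is
    # not a letter, digit, dot, underscore, or hyphen with an underscore.
    key = re.sub(r"[^A-Za-z0-9._-]", "_", parts[0])
    value = parts[1] if len(parts) > 1 else ""
    # Streaming treats the text before the first tab as the key.
    return key + "\t" + value


def main(stream=sys.stdin, out=sys.stdout):
    # Read raw input lines and emit one key/value record per non-blank line.
    for line in stream:
        record = to_key_value(line)
        if record is not None:
            out.write(record + "\n")


if __name__ == "__main__":
    main()
```

With zero reducers configured, these records go straight to the job's output format, which is where the per-key file naming would have to happen. The mapper itself does not create any files.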
