hadoop-streaming.jar adds x'09' at the end of each line

Submitted by 偶尔善良 on 2019-12-11 06:35:15

Question


I am trying to merge some *_0 files (part files) in an HDFS location using the hadoop-streaming.jar command below.

  hadoop jar $HDPHOME/hadoop-streaming.jar -Dmapred.reduce.tasks=1 -input $INDIR -output $OUTTMP/${OUTFILE}  -mapper cat -reducer cat

The merge itself works, but the output of the above command has x'09' (a tab character) appended to the end of each line.

We have Hive tables defined on top of the part files (which are replaced with the merged file), and the last field is defined as BIGINT. Because the merged file has x'09' appended to the last field, the same table definition now shows NULL in the last field in Hue (510408 followed by x'09' is no longer a valid number).

e.g.

Data in part file.

00000320  7c 35 31 30 34 30 38 0a                           ||510408.|

Data in merged file (result of above command)

00000320  7c 35 31 30 34 30 38 09  0a                       ||510408..|
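The symptom can also be checked without reading hex dumps, by counting the lines that end in a tab. The following is a minimal sketch, assuming a POSIX shell and a local copy of the data; the two sample files and their names are hypothetical stand-ins that mimic the dumps above.

```shell
#!/bin/sh
# Hypothetical sample files that mimic the two dumps above.
tmpdir=$(mktemp -d)
printf 'a|510408\n'   > "$tmpdir/part_sample.txt"    # line ends with 0a
printf 'a|510408\t\n' > "$tmpdir/merged_sample.txt"  # line ends with 09 0a

# Count lines whose last byte before the newline is a tab (0x09).
count_trailing_tabs() {
  grep -c "$(printf '\t')\$" "$1"
}

part_count=$(count_trailing_tabs "$tmpdir/part_sample.txt")
merged_count=$(count_trailing_tabs "$tmpdir/merged_sample.txt")
echo "part: $part_count, merged: $merged_count"

rm -r "$tmpdir"
```

A non-zero count for the merged file confirms that the extra byte is present on those lines.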

How do I prevent this? Is there an option I can set in the command?

I would appreciate any help or pointers.


Answer 1:


I found the answer in another post.

Adding the option below seems to resolve it.

-D mapred.textoutputformat.separator=<delimiter-of-input-file>
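Combined with the command from the question, the option might be applied as in the sketch below. The `|` delimiter is an assumption based on the 0x7c byte in the hex dumps, and the default values for `HDPHOME`, `INDIR`, `OUTTMP` and `OUTFILE` are hypothetical placeholders, since the post does not show the real ones. Generic `-D` options have to appear before streaming options such as `-input`. The script only prints the command unless `RUN=1` is set.

```shell
#!/bin/sh
# Placeholder values (hypothetical); replace them with the real locations.
HDPHOME="${HDPHOME:-/path/to/hadoop/lib}"
INDIR="${INDIR:-/hdfs/path/to/part-files}"
OUTTMP="${OUTTMP:-/hdfs/path/to/tmp}"
OUTFILE="${OUTFILE:-merged}"
DELIM='|'  # assumed input delimiter (0x7c in the hex dumps)

# Build the command as positional parameters so quoting is preserved.
# The generic -D options come first, the streaming options after them.
set -- hadoop jar "$HDPHOME/hadoop-streaming.jar" \
  -D mapred.reduce.tasks=1 \
  -D "mapred.textoutputformat.separator=$DELIM" \
  -input "$INDIR" \
  -output "$OUTTMP/$OUTFILE" \
  -mapper cat \
  -reducer cat

# Show the command for review; run it only when RUN=1 is set.
cmd_preview="$*"
echo "$cmd_preview"
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

After a real run, the merged output is worth re-checking (for example with a hex dump of the last few lines) before the Hive table is pointed at it.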


Source: https://stackoverflow.com/questions/45620256/hadoop-streaming-jar-adds-x09-at-the-end-of-each-line
