Change File Split size in Hadoop

陌清茗 2020-12-01 00:56

I have a bunch of small files in an HDFS directory. Although the volume of the files is relatively small, the amount of processing time per file is huge. Th

4 Answers
  •  独厮守ぢ
    2020-12-01 01:29

    Write a custom input format that extends CombineFileInputFormat (it has its own pros and cons depending on the Hadoop distribution). It combines the input files into splits no larger than the value specified in mapred.max.split.size.
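
To make the effect of the maximum split size concrete, here is a minimal plain-Java sketch of the packing idea. It is not Hadoop code and not Hadoop's actual algorithm (the real CombineFileInputFormat also takes block and node/rack locality into account). The class name SplitPackingSketch and the method pack are hypothetical, and the file sizes are made-up values for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this class is not part of Hadoop.
public class SplitPackingSketch {

    // Greedily packs file lengths (in bytes) into splits. A new split is
    // started whenever adding the next file would push the current split
    // past maxSplitSize. A single file larger than the limit still gets
    // a split of its own in this simplified version.
    public static List<List<Long>> pack(long[] fileLengths, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentSize = 0;
        for (long len : fileLengths) {
            if (!current.isEmpty() && currentSize + len > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
            current.add(len);
            currentSize += len;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Made-up file sizes, for illustration only.
        long[] files = {10 * mb, 20 * mb, 40 * mb, 5 * mb, 60 * mb};
        List<List<Long>> splits = pack(files, 64 * mb);
        System.out.println(splits.size() + " splits for " + files.length + " files");
    }
}
```

In this sketch, lowering the limit puts fewer files into each split and so yields more splits (and, in a real job, more map tasks), while raising it does the opposite.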
