I am using the Hadoop example program WordCount to process a large set of small files/web pages (ca. 2-3 kB each). Since this is far from the optimal file size for Hadoop, CombineFileInputFormat can be used in this case; it works well for a large number of small files. It packs many such files into a single split, so each mapper has more to process (1 split = 1 map task). With fewer mappers running, the overall MapReduce processing time also falls. Since there is no archive-aware InputFormat, using CombineFileInputFormat should improve performance.
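For reference, here is a minimal driver sketch of how this might be wired up. It assumes a recent Hadoop mapreduce API where `CombineTextInputFormat` (a ready-made text subclass of `CombineFileInputFormat`) is available, and reuses the standard WordCount `TokenizerMapper`/`IntSumReducer` classes; the 128 MB split cap is an arbitrary example value:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count (combined splits)");
        job.setJarByClass(WordCountDriver.class);

        // Pack many small files into each split; one split = one map task.
        job.setInputFormatClass(CombineTextInputFormat.class);
        // Cap each combined split at ~128 MB (example value; sets
        // mapreduce.input.fileinputformat.split.maxsize).
        CombineTextInputFormat.setMaxInputSplitSize(job, 128 * 1024 * 1024);

        // Standard WordCount mapper/reducer from the Hadoop examples.
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The max split size is what controls how many small files get packed per mapper; without it, CombineFileInputFormat may combine everything on a node into one huge split. On older Hadoop versions that lack `CombineTextInputFormat`, you would instead have to subclass `CombineFileInputFormat` yourself and supply a record reader.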