Hadoop's input splitting: how does it work?

长情又很酷 2021-01-16 20:15

I know a little about Hadoop.

I am curious to know how it works.

To be precise, I want to know how exactly it divides/splits the input file.

Does it

2 Answers
  •  [愿得一人]
    2021-01-16 20:42

    When you submit a MapReduce job (or a Pig/Hive job), Hadoop first calculates the input splits; the size of each input split generally equals the HDFS block size. For example, a 1 GB file yields 16 input splits if the block size is 64 MB. However, the split size can be configured to be smaller or larger than the HDFS block size. The calculation of input splits is done by FileInputFormat. For each of these input splits, a map task is started.
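    As a rough illustration of that arithmetic, here is a minimal sketch in plain Java (not Hadoop's actual code; the 10% slack factor mirrors the SPLIT_SLOP constant in FileInputFormat, which allows the last split to run slightly over the split size):

    public class SplitCountDemo {
        // Hadoop lets the final split be up to 10% larger than splitSize
        // (the SPLIT_SLOP constant in FileInputFormat).
        private static final double SPLIT_SLOP = 1.1;

        static int countSplits(long fileSize, long splitSize) {
            int splits = 0;
            long bytesRemaining = fileSize;
            while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
                splits++;
                bytesRemaining -= splitSize;
            }
            if (bytesRemaining > 0) {
                splits++; // the final, possibly smaller, split
            }
            return splits;
        }

        public static void main(String[] args) {
            long oneGB = 1024L * 1024 * 1024;
            long blockSize = 64L * 1024 * 1024; // 64 MB
            System.out.println(countSplits(oneGB, blockSize)); // prints 16
        }
    }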

    But you can change the size of the input splits by configuring the following properties:

    mapred.min.split.size: The minimum size, in bytes, of a chunk that map input should be split into.
    mapred.max.split.size: The largest valid size, in bytes, for a file split.
    dfs.block.size: The default block size for new files.


    And the formula for the input split size is:

    splitSize = Math.max(minSplitSize, Math.min(maxSplitSize, blockSize));

    where minSplitSize and maxSplitSize come from the two properties above.
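    With the newer org.apache.hadoop.mapreduce API, a minimal sketch of overriding the split size in a job driver could look like this (the 128 MB target is just an illustrative value; setMinInputSplitSize/setMaxInputSplitSize set the same min/max split-size properties listed above, under their newer names):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeConfigDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "split-size-demo");

            // Force splits of roughly 128 MB regardless of the HDFS block size.
            long targetSplit = 128L * 1024 * 1024;
            FileInputFormat.setMinInputSplitSize(job, targetSplit);
            FileInputFormat.setMaxInputSplitSize(job, targetSplit);

            // With a 64 MB block size, the formula above gives:
            // splitSize = max(128MB, min(128MB, 64MB)) = 128MB
        }
    }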

