Behavior of the parameter “mapred.min.split.size” in HDFS

轮回少年 2020-12-02 18:05

Does the parameter "mapred.min.split.size" change the size of the blocks in which the file was written earlier? Assuming a situation where I, when starting my JOB, pass the par

2 answers
  •  天涯浪人
    2020-12-02 18:35

    The split size is calculated by the formula:

    max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
    

    In your case it will be (sizes in MB):

    split size = max(128, min(Long.MAX_VALUE (default), 64)) = 128
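
A minimal Java sketch of the formula above, working in bytes. The class name `SplitSizeSketch` and the helper `computeSplitSize` are made up for illustration and are not taken from the question; the 128 MB and 64 MB values are the ones used in this answer.

```java
// Illustrative only: evaluates max(minSize, min(maxSize, blockSize)) in bytes.
final class SplitSizeSketch {
    static final long MB = 1024L * 1024L;

    // Hypothetical helper that mirrors the formula quoted in the answer.
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long minSize = 128 * MB;        // mapred.min.split.size passed to the job
        long maxSize = Long.MAX_VALUE;  // mapred.max.split.size left at its default
        long blockSize = 64 * MB;       // dfs.block.size the file was written with

        long splitSize = computeSplitSize(minSize, maxSize, blockSize);
        System.out.println(splitSize / MB + " MB"); // prints "128 MB"
    }
}
```

If the minimum were left below the block size, the same function would return the block size, so each split would cover exactly one block.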
    

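    So, regarding your two inferences:

    1. Each map will process 2 HDFS blocks (assuming each block is 64 MB): True

    2. There will be a new division of my input file (already stored in HDFS) so that it occupies 128 MB blocks in HDFS: False. The parameter only affects how the input is divided among map tasks; the blocks already written to HDFS are not changed.

    Making the minimum split size greater than the block size does increase the split size, but at the cost of data locality: the blocks that make up one split may be stored on different nodes, so a map task may have to read part of its input over the network.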
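
To make the two inferences concrete, here is a rough Java sketch that counts blocks and splits. The 1024 MB file size is a hypothetical value chosen for illustration, as are the names `SplitCountSketch` and `ceilDiv`. The split count is an approximation by plain ceiling division and ignores how the last partial split is handled in practice.

```java
// Illustrative only: compares stored HDFS blocks with the splits a job would see.
final class SplitCountSketch {
    static final long MB = 1024L * 1024L;

    // Ceiling division for positive values.
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        long fileSize = 1024 * MB;  // hypothetical input file size
        long blockSize = 64 * MB;   // block size the file was written with
        long splitSize = 128 * MB;  // result of the split-size formula

        // The stored blocks depend only on the file and block size.
        System.out.println("HDFS blocks: " + ceilDiv(fileSize, blockSize));       // 16
        // The number of splits (roughly, map tasks) depends on the split size.
        System.out.println("splits: " + ceilDiv(fileSize, splitSize));            // 8
        // Each split spans this many blocks, which may sit on different nodes.
        System.out.println("blocks per split: " + ceilDiv(splitSize, blockSize)); // 2
    }
}
```

The block count stays at 16 whatever split size is chosen; only the number of splits, and therefore how many blocks each map task reads, changes.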
