I want to ask about the Hadoop Partitioner: is it implemented within the Mappers? How do I measure the performance of the default hash partitioner, and is there a better partitioner to use?
The Partitioner is not implemented inside the Mapper class. It is a separate, pluggable component that the framework invokes on the map side for every record the Mapper emits, deciding which Reducer that record is sent to.
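As a minimal sketch (the class name and job name here are made up for illustration), the partitioner is registered on the Job in the driver; the Mapper never calls it directly:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionerConfigDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The partitioner is configured on the Job, not written inside the Mapper.
    Job job = Job.getInstance(conf, "partitioner-demo");
    job.setPartitionerClass(HashPartitioner.class); // HashPartitioner is already the default
    job.setNumReduceTasks(3);                       // one partition per reduce task
    // ... set mapper, reducer, input/output paths, then job.waitForCompletion(true)
  }
}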
Below is the process that happens in each Mapper -
- map() emits key/value pairs.
- For every emitted pair, the configured Partitioner assigns a partition number; there is one partition per Reducer.
- The output is buffered in memory and sorted by key within each partition, then spilled to disk when the buffer fills.
- The spill files are merged into a single partitioned, sorted output file for the map task.
Below is the process that happens in each Reducer -
- Each Reducer fetches its own partition of the output from every Mapper (the copy/shuffle phase).
- It then moves into the sort/merge phase (the sorting itself was already done on the map side), which merges all of the map outputs while preserving the sort order.
- During the reduce phase, the reduce() function is invoked once for each key in the sorted output, as in the sketch below.
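For illustration only, here is a hypothetical word-count-style reducer; the framework calls reduce() exactly once per key, passing all of that key's values that were shuffled to this Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // All values for this key arrive together because the partitioner sent
    // every occurrence of the key to the same Reducer.
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}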
Below is the code illustrating the actual partitioning of keys. getPartition() returns the partition number, i.e. the Reducer that a particular key is sent to, based on the key's hash code. The hash code must be consistent across the cluster: the same key must always produce the same hash code on every node, so that all values for that key end up at the same Reducer. For this purpose Hadoop's Writable key types implement their own hashCode() instead of relying on Java's default Object.hashCode().
Partition keys by their hashCode().
public class HashPartitioner<K, V> extends Partitioner<K, V> {

  public int getPartition(K key, V value, int numReduceTasks) {
    // Mask off the sign bit so the modulo result is never negative
    return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}
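As a quick, purely illustrative way to see this in action (the class name and keys below are made up), you can call the stock HashPartitioner directly and check which of, say, 3 reducers each key would be routed to; repeated keys always land on the same partition:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class HashPartitionerDemo {
  public static void main(String[] args) {
    HashPartitioner<Text, IntWritable> partitioner = new HashPartitioner<>();
    int numReduceTasks = 3; // pretend the job runs 3 reducers
    for (String k : new String[] {"apple", "banana", "apple", "cherry"}) {
      // Text.hashCode() is computed from the key's bytes, so it is the same on every node
      int partition = partitioner.getPartition(new Text(k), new IntWritable(1), numReduceTasks);
      System.out.println(k + " -> reducer " + partition);
    }
  }
}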