How to sort numerically in Hadoop's shuffle/sort phase?

醉话见心 2020-12-13 16:05

The data looks like this; the first field is a number:

3 ...
1 ...
2 ...
11 ...

I want to sort these lines numerically by the first field.

3 Answers

南方客 (original poster)

2020-12-13 16:29
For streaming with older Hadoop versions (which may use -jobconf instead of -D for configuration), you can sort by key:
```
-jobconf stream.num.map.output.key.fields=2 \
-jobconf mapreduce.partition.keycomparator.options="-k2,2nr" \
-jobconf mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
```
Setting stream.num.map.output.key.fields=2 makes the first two columns of the map output the key (key fields 1 and 2).

mapreduce.partition.keycomparator.options="-k2,2nr" sorts on the second key field only (the key runs from field 2 to field 2), comparing it as a number (n) in reverse order (r).

The syntax is much like that of the Linux sort command!
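Since the comparator options follow the key syntax of sort, you can try out a key spec locally before putting it into a job. Below is a minimal sketch, assuming a POSIX shell and a standard sort; the sample lines and the function names are made up for illustration and only mirror the data in the question. It sorts on the first field, which suggests that `-k1,1n` would be the analogous option string for the question's data, though that has not been run against a cluster here.

```shell
#!/bin/sh
# Hypothetical sample input mirroring the question: a numeric first field plus a payload.
sample_lines() {
  printf '3 c\n1 a\n2 b\n11 k\n'
}

# Ascending numeric sort on field 1 only (-k1,1 limits the key, n compares as numbers).
sort_numeric() {
  LC_ALL=C sort -t ' ' -k1,1n
}

# Descending numeric sort: the same n and r modifiers as in "-k2,2nr", applied to field 1.
sort_numeric_reverse() {
  LC_ALL=C sort -t ' ' -k1,1nr
}

# Plain text sort for contrast: "11" lands before "2".
sort_text() {
  LC_ALL=C sort -t ' ' -k1,1
}

sample_lines | sort_numeric          # first fields come out as 1, 2, 3, 11
sample_lines | sort_numeric_reverse  # first fields come out as 11, 3, 2, 1
sample_lines | sort_text             # first fields come out as 1, 11, 2, 3
```

The text sort shows why the n modifier matters: without it, 11 is compared character by character and ends up ahead of 2.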