The data looks like this; the first field is a number:
3 ...
1 ...
2 ...
11 ...
I want to sort these lines numerically by the first field.
For streaming with older Hadoop (which may use -jobconf instead of -D for configuration), you can sort by key:
-jobconf stream.num.map.output.key.fields=2 \
-jobconf mapreduce.partition.keycomparator.options="-k2,2nr" \
-jobconf mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
With stream.num.map.output.key.fields=2, the first two columns of each map output line are treated as the key (key field 1 and key field 2).
mapreduce.partition.keycomparator.options="-k2,2nr" means sorting by the 2nd key field only (-k2,2: from field 2 to field 2), treating it as a numeric value (n), in reverse order (r).
It is pretty much like the Linux sort command!
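To see the key-spec syntax in action without a cluster, here is a small local sketch using sort itself. It assumes tab-separated lines (the Hadoop streaming default separator); the second column is a made-up payload, since the question elides the rest of each line.

```shell
# Force byte-wise collation so the lexical result is the same everywhere.
export LC_ALL=C

# Key values from the question; the letters are a hypothetical payload.
data=$(printf '3\tc\n1\ta\n2\tb\n11\td\n')
tab=$(printf '\t')

# Plain text comparison on field 1: "11" sorts before "2".
lexical=$(printf '%s\n' "$data" | sort -t "$tab" -k1,1)

# Numeric comparison on field 1, the local analogue of "-k1,1n".
numeric=$(printf '%s\n' "$data" | sort -t "$tab" -k1,1n)

# Numeric and reversed, the local analogue of "-k1,1nr".
reverse=$(printf '%s\n' "$data" | sort -t "$tab" -k1,1nr)

printf '%s\n' "$numeric"
# prints the lines in key order 1, 2, 3, 11
```

For the data in the question, the corresponding streaming settings would presumably be stream.num.map.output.key.fields=1 with comparator options "-k1,1n", though that combination is an adaptation of the answer above rather than something tested here.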