AWK — How to do selective multiple column sorting?

In awk, how can I do this:

Input:

1  a  f  1  12  v  
2  b  g  2  10  w  
3  c  h  3  19  x  
4  d  i  4  15  y  
5  e  j  5  11  z

Desired output, with the right-hand columns sorted numerically on $5:

1  a  f  2  10  w  
2  b  g  5  11  z  
3  c  h  1  12  v  
4  d  i  4  15  y  
5  e  j  3  19  x

Note that the sort should only affect $4, $5, and $6 (ordered by the value of $5); the first three columns of the table must remain in their original order.

This could be done in multiple steps with the help of paste:

$ gawk '{print $1, $2, $3}' in.txt > a.txt
$ gawk '{print $4, $5, $6}' in.txt | sort -k 2 -n > b.txt
$ paste -d' ' a.txt b.txt
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x

Personally, I find it rather tricky to sort groups of columns safely in awk alone, because you often need to hold and sort duplicate keys. To sort a group of columns selectively, I would call on paste for some assistance (the <( ... ) process substitution needs bash or a similar shell, and the n flag makes the sort on $5 numeric):

paste -d ' ' <(awk '{ print $1, $2, $3 }' file.txt) <(awk '{ print $4, $5, $6 | "sort -k 2,2n" }' file.txt)

Results:

1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
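
If $5 can contain duplicate values and tied rows should keep their input order, one option is to tag each right-hand row with its line number and use that as a secondary sort key. What follows is only a sketch under a few assumptions: plain POSIX sh (so a temporary file instead of process substitution), single-space output, and the hypothetical file names in.txt, lhs.tmp and out.txt.

```shell
# Sample input from the question (in.txt is a hypothetical file name).
cat > in.txt <<'EOF'
1  a  f  1  12  v
2  b  g  2  10  w
3  c  h  3  19  x
4  d  i  4  15  y
5  e  j  5  11  z
EOF

# Left half, kept in its original order.
awk '{ print $1, $2, $3 }' in.txt > lhs.tmp

# Right half: prefix each row with NR, sort numerically on the original $5
# (now field 3), break ties on NR, then strip the NR prefix again.
awk '{ print NR, $4, $5, $6 }' in.txt |
  sort -k3,3n -k1,1n |
  cut -d' ' -f2- |
  paste -d' ' lhs.tmp - > out.txt

cat out.txt
rm -f lhs.tmp
```

For the sample data this gives the same five lines as above; the NR tie-breaker only matters once two rows share a $5 value.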

This can be done in pure awk, but as @steve said, it's not ideal. gawk's built-in sorting is limited to asort() and asorti(), and standard awk has no built-in sort at all. That said, here's a (rather hackish) solution using a compare function in gawk:

[ghoti@pc ~/tmp3]$ cat text 
1  a  f  1  12  v  
2  b  g  2  10  w  
3  c  h  3  19  x  
4  d  i  4  15  y  
5  e  j  5  11  z  
[ghoti@pc ~/tmp3]$ cat doit.gawk 
### Function to be called by asort().
function cmp(i1,v1,i2,v2) {
  split(v1,a1); split(v2,a2);
  if (a1[2]>a2[2])      { return 1; }
  else if (a1[2]<a2[2]) { return -1; }
  else                  { return 0; }
}

### The left-hand and right-hand sides are stored separately; only the right is sorted.
{
  lhs[NR]=sprintf("%s %s %s",$1,$2,$3);
  rhs[NR]=sprintf("%s %s %s",$4,$5,$6);
}

END {
  asort(rhs,sorted,"cmp");    ### This calls the function we defined, above.
  for (i=1;i<=NR;i++) {       ### Step through the arrays and reassemble.
    printf("%s %s\n",lhs[i],sorted[i]);
  }
}    
[ghoti@pc ~/tmp3]$ gawk -f doit.gawk text 
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
[ghoti@pc ~/tmp3]$

This keeps your entire input file in arrays, so that the lines can be reassembled after the sort. If your input is millions of lines, memory use may become a problem.

Note that you may want to adjust the printf and sprintf format strings to get the output field separators you need.
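
As a rough illustration of the separator point, the sketch below takes the output separator from a variable and uses it for every join, so it only has to be changed in one place. It is not the script above: it keeps the sort key in its own array and uses a hand-written insertion sort so that it also runs on awks without asort(). The names sep, key, in.txt and out.tsv are assumptions made for the example.

```shell
# Sample input from the question (in.txt is a hypothetical file name).
cat > in.txt <<'EOF'
1  a  f  1  12  v
2  b  g  2  10  w
3  c  h  3  19  x
4  d  i  4  15  y
5  e  j  5  11  z
EOF

# sep is a hypothetical variable name; awk expands the \t escape to a tab.
awk -v sep='\t' '
{
  lhs[NR] = $1 sep $2 sep $3
  rhs[NR] = $4 sep $5 sep $6
  key[NR] = $5 + 0              # numeric sort key, kept apart from the text
}
END {
  # Insertion sort of rhs[] by key[]; stable, but quadratic, so small inputs only.
  for (i = 2; i <= NR; i++) {
    k = key[i]; r = rhs[i]
    for (j = i - 1; j >= 1 && key[j] > k; j--) {
      key[j + 1] = key[j]; rhs[j + 1] = rhs[j]
    }
    key[j + 1] = k; rhs[j + 1] = r
  }
  for (i = 1; i <= NR; i++) print lhs[i] sep rhs[i]
}' in.txt > out.tsv

cat out.tsv
```

Keeping the key in a separate array means the comparison never has to split the joined string again, so the choice of separator cannot affect the sort.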

You can find documentation on using asort() with functions in the gawk man page; look for PROCINFO["sorted_in"].
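
For comparison, PROCINFO["sorted_in"] can also be used on its own, without asort() or a compare function. The sketch below assumes gawk 4.0 or later and wraps the call in a shell function with the hypothetical name sort_right_half; the array names are likewise made up for the example. Rows with equal $5 values are not necessarily kept in input order.

```shell
# Requires gawk 4.0 or later; PROCINFO["sorted_in"] is a gawk extension.
# sort_right_half is a hypothetical helper name; $1 is the input file.
sort_right_half() {
  gawk '
  {
    lhs[NR] = $1 " " $2 " " $3
    rhs[NR] = $4 " " $5 " " $6
    key[NR] = $5 + 0                         # numeric key for each input row
  }
  END {
    PROCINFO["sorted_in"] = "@val_num_asc"   # for-in now visits key[] by ascending value
    n = 0
    for (i in key) print lhs[++n], rhs[i]    # n-th left half, n-th smallest right half
  }' "$1"
}
```

Calling it as sort_right_half text on the sample file should reproduce the output shown earlier.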

Source: https://stackoverflow.com/questions/12678278/awk-how-to-do-selective-multiple-column-sorting
