AWK — How to do selective multiple column sorting?

佐手、 提交于 2019-12-04 09:10:53

This could be done in multiple steps with the help of paste:

$ gawk '{print $1, $2, $3}' in.txt > a.txt
$ gawk '{print $4, $5, $6}' in.txt | sort -k 2 -n b.txt > b.txt
$ paste -d' ' a.txt b.txt
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x

Personally, I find using awk to safely sort arrays of columns rather tricky because often you will need to hold and sort on duplicate keys. If you need to selectively sort a group of columns, I would call paste for some assistance:

paste -d ' ' <(awk '{ print $1, $2, $3 }' file.txt) <(awk '{ print $4, $5, $6 | "sort -k 2" }' file.txt)

Results:

1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x

This can be done in pure awk, but as @steve said, it's not ideal. gawk has limited sort functions, and awk has no built-in sort at all. That said, here's a (rather hackish) solution using a compare function in gawk:

[ghoti@pc ~/tmp3]$ cat text 
1  a  f  1  12  v  
2  b  g  2  10  w  
3  c  h  3  19  x  
4  d  i  4  15  y  
5  e  j  5  11  z  
[ghoti@pc ~/tmp3]$ cat doit.gawk 
### Function to be called by asort().
function cmp(i1,v1,i2,v2) {
  split(v1,a1); split(v2,a2);
  if (a1[2]>a2[2])      { return 1; }
  else if (a1[2]<a2[2]) { return -1; }
  else                  { return 0; }
}

### Left-hand-side and right-hand-side, are sorted differently.
{
  lhs[NR]=sprintf("%s %s %s",$1,$2,$3);
  rhs[NR]=sprintf("%s %s %s",$4,$5,$6);
}

END {
  asort(rhs,sorted,"cmp");    ### This calls the function we defined, above.
  for (i=1;i<=NR;i++) {       ### Step through the arrays and reassemble.
    printf("%s %s\n",lhs[i],sorted[i]);
  }
}    
[ghoti@pc ~/tmp3]$ gawk -f doit.gawk text 
1 a f 2 10 w
2 b g 5 11 z
3 c h 1 12 v
4 d i 4 15 y
5 e j 3 19 x
[ghoti@pc ~/tmp3]$ 

This keeps your entire input file in arrays, so that lines can be reassembled after the sort. If your input is millions of lines, this may be problematic.

Note that you might want to play with the printf and sprintf functions to set appropriate output field separators.

You can find documentation on using asort() with functions in the gawk man page; look for PROCINFO["sorted_in"].

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!