awk sort multidimensional array [duplicate]

放肆的年华 submitted on 2021-02-07 08:54:33

Question


GNU awk supports multidimensional arrays:

q[1][1] = "dog"
q[1][2] = 999
q[2][1] = "mouse"
q[2][2] = 777
q[3][1] = "bird"
q[3][2] = 888

I would like to sort the "second column" of q such that I am left with:

q[1][1] = "mouse"
q[1][2] = 777
q[2][1] = "bird"
q[2][2] = 888
q[3][1] = "dog"
q[3][2] = 999

As you can see, the "first column" values moved along with the second. GNU Awk offers an asort() function, but it does not appear to support multidimensional arrays. If it helps, this is a working Ruby example:

q = [["dog", 999], ["mouse", 777], ["bird", 888]]
q.sort_by{|z|z[1]}
=> [["mouse", 777], ["bird", 888], ["dog", 999]]

I ended up using a regular array, then separating duplicates with newlines:

q[777] = "mouse"
q[999] = "dog" RS "fish"
q[888] = "bird"
for (z in q) {
  print q[z]
}

Answer 1:


FWIW, here's a workaround "sort_by()" function:

$ cat tst.awk
BEGIN {
    a[1][1] = "dog"
    a[1][2] = 999
    a[2][1] = "mouse"
    a[2][2] = 777
    a[3][1] = "bird"
    a[3][2] = 888

    print "\n############################\nBefore:"
    for (i=1; i in a; i++)
        for (j=1; j in a[i]; j++)
            printf "a[%d][%d] = %s\n",i,j,a[i][j]
    print "############################"

    sort_by(a,2)

    print "\n############################\nAfter:"
    for (i=1; i in a; i++)
        for (j=1; j in a[i]; j++)
            printf "a[%d][%d] = %s\n",i,j,a[i][j]
    print "############################"

}

function sort_by(arr,key,       keys,vals,i,j)
{
    for (i=1; i in arr; i++) {
        keys[i] = arr[i][key]
        for (j=1; j in arr[i]; j++)
            vals[keys[i]] = vals[keys[i]] (j==1?"":SUBSEP) arr[i][j]
    }

    asort(keys)

    for (i=1; i in keys; i++)
        split(vals[keys[i]],arr[i],SUBSEP)

    return (i - 1)
}

$ gawk -f tst.awk

############################
Before:
a[1][1] = dog
a[1][2] = 999
a[2][1] = mouse
a[2][2] = 777
a[3][1] = bird
a[3][2] = 888
############################

############################
After:
a[1][1] = mouse
a[1][2] = 777
a[2][1] = bird
a[2][2] = 888
a[3][1] = dog
a[3][2] = 999
############################

It works by first converting this:

    a[1][1] = "dog"
    a[1][2] = 999
    a[2][1] = "mouse"
    a[2][2] = 777
    a[3][1] = "bird"
    a[3][2] = 888

to this:

    keys[1]   = 999
    vals[999] = dog SUBSEP 999

    keys[2]   = 777
    vals[777] = mouse SUBSEP 777

    keys[3]   = 888
    vals[888] = bird SUBSEP 888

then asort()ing keys[] to get:

    keys[1] = 777
    keys[2] = 888
    keys[3] = 999

and then looping through the keys array, using its elements as the indices into the vals array to re-populate the original array.

In case anyone's wondering why I didn't just use the values we want to sort on as indices and then call asorti(), which would have made the code slightly shorter, here's why:

$ cat tst.awk
BEGIN {
   a[1] = 888
   a[2] = 9
   a[3] = 777

   b[888]
   b[9]
   b[777]

   print "\n\"a[]\" sorted by content:"
   asort(a,A)
   for (i=1; i in A; i++)
      print "\t" A[i]

   print "\n\"b[]\" sorted by index:"
   asorti(b,B)
   for (i=1; i in B; i++)
      print "\t" B[i]

}
$ awk -f tst.awk

"a[]" sorted by content:
        9
        777
        888

"b[]" sorted by index:
        777
        888
        9

Notice that asorti() treats "9" as a higher value than "888". That's because asorti() sorts on array indices, and all array indices are strings (even if they look like numbers); compared as strings, "9" sorts after "888" because its first character is higher than the first character of "888". asort(), on the other hand, sorts on the contents of the array. Array contents can be strings OR numbers, so normal awk comparison rules apply: anything that looks like a number is treated like a number, and the number 9 is less than the number 888, which in this case IMHO is the desired result.




Answer 2:


"supports true multidimensional arrays"

No, it doesn't. It supports arrays of arrays, and it supports a hash indexed by a string made of two indices joined together (with SUBSEP). Your syntax is the former (arrays of arrays).

That said, I don't think you can do it with builtins, since it would require either a comparator callback or the ability to return a sort permutation, neither of which gawk provides, AFAIK.

But you can refer to this page which describes how to implement qsort for yourself, where you can change the comparison from A[i] < A[left] to A[i][2] < A[left][2].



Source: https://stackoverflow.com/questions/17692771/awk-sort-multidimensional-array
