Sort associative array with AWK

Submitted by 喜欢而已 on 2019-11-29 06:53:48

Question


Here's my array (in a gawk script):

myArray["peter"] = 32
myArray["bob"] = 5
myArray["john"] = 463
myArray["jack"] = 11

After sorting, I need the following result:

bob    5
jack   11
peter  32
john   463

When I use asort(), the indices are lost. How can I sort the array by value without losing the indices? (I need the indices ordered by their values.)

(I need to obtain this result with awk/gawk only, not a shell script, Perl, etc.)

If my post isn't clear enough, here is another post explaining the same issue: http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_26626841.html

Thanks in advance.

Update:

Thanks to you both, but I need to sort by values, not indices (I want the indices ordered according to their values).

In other words, I need this result:

bob    5
jack   11
peter  32
john   463

not:

bob 5
jack 11
john 463
peter 32

(I agree, my example is confusing; the chosen values are pretty bad.)

Starting from Catcall's code, I wrote a quick implementation that works, but it's rather ugly (I concatenate keys and values before sorting and split them again during comparison). Here's what it looks like:

function qsort(A, left, right,   i, last) {
  if (left >= right)
    return
  swap(A, left, left+int((right-left+1)*rand()))
  last = left
  for (i = left+1; i <= right; i++)
    if (getPart(A[i], "value") < getPart(A[left], "value"))
      swap(A, ++last, i)
  swap(A, left, last)
  qsort(A, left, last-1)
  qsort(A, last+1, right)
}

function swap(A, i, j,   t) {
  t = A[i]; A[i] = A[j]; A[j] = t
}

function getPart(str, part) {
  if (part == "key")
    return substr(str, 1, index(str, "#")-1)
  if (part == "value")
    return substr(str, index(str, "#")+1, length(str))+0
  return
}

BEGIN {  }
      {  }
END {

  myArray["peter"] = 32
  myArray["bob"] = 5
  myArray["john"] = 463
  myArray["jack"] = 11

  # Build a 1-based array of "key#value" strings.
  n = 0
  for (key in myArray)
    sortvalues[++n] = key "#" myArray[key]

  # Sort exactly the elements 1..n by their value part.
  qsort(sortvalues, 1, n)

  for (i = 1; i <= n; i++)
    print getPart(sortvalues[i], "key"), getPart(sortvalues[i], "value")
}

Of course, I'm interested if you have something cleaner.

Thanks for your time


Answer 1:


Edit:

Sort by values

Oh! To sort the values, it's a bit of a kludge, but you can create a temporary array using a concatenation of the values and the indices of the original array as indices in the new array. Then you can asorti() the temporary array and split the concatenated values back into indices and values. If you can't follow that convoluted description, the code is much easier to understand. It's also very short.

# right justify the integers into space-padded strings and cat the index
# to create the new index
for (i in myArray) tmpidx[sprintf("%12s", myArray[i]),i] = i
num = asorti(tmpidx)
j = 0
for (i=1; i<=num; i++) {
    split(tmpidx[i], tmp, SUBSEP)
    indices[++j] = tmp[2]  # tmp[2] is the name
}
for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]

Edit 2:

If you have GAWK 4, you can traverse the array by order of values without performing an explicit sort:

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    PROCINFO["sorted_in"] = "@val_num_asc"

    for (i in myArray) {
        print i, myArray[i]
    }
}

There are settings for traversing by index or value, ascending or descending and other options. You can also specify a custom function.

Previous answer:

Sort by indices

If you have an AWK, such as gawk 3.1.2 or greater, which supports asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    num = asorti(myArray, indices)
    for (i=1; i<=num; i++) print indices[i], myArray[indices[i]]
}

If you don't have asorti():

#!/usr/bin/awk -f
BEGIN {
    myArray["peter"] = 32
    myArray["bob"] = 5
    myArray["john"] = 463
    myArray["jack"] = 11

    for (i in myArray) indices[++j] = i
    num = asort(indices)
    for (i=1; i<=num; i++) print i, indices[i], myArray[indices[i]]
}



Answer 2:


Use the Unix sort command through a pipe; this keeps the Awk code simple and follows the Unix philosophy.
Create an input file with comma-separated values:
peter,32
jack,11
john,463
bob,5

Create a sort.awk file with this code:

BEGIN { FS=","; }
{
    myArray[$1]=$2;
}
END {
    for (name in myArray)
        printf ("%s,%d\n", name, myArray[name]) | "sort -t, -k2 -n"
}

Run the program; it should give you this output:
$ awk -f sort.awk data
bob,5
jack,11
peter,32
john,463




Answer 3:


PROCINFO["sorted_in"] = "@val_num_desc";

Set this before iterating over the array. Note that it works in gawk 4.0.1 but not in gawk 3.1.7.

I am not sure in which intermediate version it was introduced.




Answer 4:


And the simple answer...

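# Comparison function for asorti(): it should return <0, 0 or >0,
# not just a 0/1 truth value. Requires gawk 4.0 or later.
function sort_by_myArray(i1, v1, i2, v2) {
    return myArray[i1] - myArray[i2];
}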

BEGIN {
    myArray["peter"] = 32;
    myArray["bob"] = 5;
    myArray["john"] = 463;
    myArray["jack"] = 11;
    len = length(myArray);

    asorti(myArray, k, "sort_by_myArray");

    # Print result.
    for(n = 1; n <= len; ++n) {
            print k[n], myArray[k[n]]
    }
}



Answer 5:


The authors of The Awk Programming Language provide a quicksort function, which is available online.

I think you'd do something like this.

END {
  n = 0;
  for (key in myArray) {
    sortkeys[++n] = key;          # 1-based, so it lines up with the print loop
  }
  qsort(sortkeys, 1, n);          # sorts the keys themselves, elements 1..n
  for (i = 1; i <= n; i++) {
    print sortkeys[i], myArray[sortkeys[i]];
  }
}
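
To order by value rather than by key, as the update to the question asks, the comparison has to look up the value behind each key instead of comparing the keys themselves. Here is a minimal sketch of that idea using a plain insertion sort (swapped in for brevity; it is not the book's quicksort) and no gawk extensions. The names sort_by_value and sort_keys_by_value are hypothetical.

```shell
# Uses only POSIX awk features, so any awk should do.
sort_keys_by_value() {
    awk '
    # Hypothetical helper: insertion-sort keys[1..n] by the values in vals.
    # i, j and k are local variables.
    function sort_by_value(keys, n, vals,    i, j, k) {
        for (i = 2; i <= n; i++) {
            k = keys[i]
            # Shift keys with larger values one slot to the right.
            for (j = i - 1; j >= 1 && vals[keys[j]] > vals[k]; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = k
        }
    }
    BEGIN {
        myArray["peter"] = 32
        myArray["bob"] = 5
        myArray["john"] = 463
        myArray["jack"] = 11

        n = 0
        for (key in myArray)
            sortkeys[++n] = key

        sort_by_value(sortkeys, n, myArray)

        for (i = 1; i <= n; i++)
            print sortkeys[i], myArray[sortkeys[i]]
    }'
}
sort_keys_by_value
```

Insertion sort is quadratic, so this is only reasonable for small arrays; for larger ones the quicksort could be adapted in the same way by changing its comparison.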


Source: https://stackoverflow.com/questions/5342782/sort-associative-array-with-awk
