Awk array iteration for multi-dimensional arrays

不思量自难忘° 2020-12-25 13:25

Awk offers associative indexing for array processing. The elements of a one-dimensional array can be iterated:

e.g.

for(index in arr1)
  print "arr1[" inde
        
5 answers
  • 2020-12-25 14:10

    I'll provide an example of how I use this in my work processing query data. Suppose you have an extract file full of transactions by product category and customer id:

    customer_id  category  sales
    1111         parts     100.01
    1212         parts       5.20
    2211         screws      1.33
    ...etc...
    

    It's easy to use awk to count the total number of distinct customers with a purchase:

    awk 'NR>1 {a[$1]++} END {for (i in a) total++; print "customers: " total}' \
    datafile.txt
    

    However, computing the number of distinct customers with a purchase in each category suggests a two dimensional array:

    awk 'NR>1 {a[$2,$1]++} 
          END {for (i in a) {split(i,arr,SUBSEP); custs[arr[1]]++}
               for (k in custs) printf "category: %s customers:%d\n", k, custs[k]}' \
    datafile.txt
    

    The increment of custs[arr[1]]++ works because each category/customer_id pair is unique as an index to the associative array used by awk.
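
    The same per-category count can also be produced in a single pass with the common `!seen[...]++` idiom. What follows is only a minimal sketch, not the method used above: the sample rows (including the repeated purchase by customer 1111) and the array names `seen` and `custs` are made up for illustration, and the output is piped through `sort` because `for (k in custs)` visits keys in an unspecified order.

    ```shell
    # Hypothetical sample rows; customer 1111 buys parts twice.
    result=$(printf '%s\n' \
        'customer_id category sales' \
        '1111 parts 100.01' \
        '1111 parts 20.00' \
        '1212 parts 5.20' \
        '2211 screws 1.33' |
      awk 'NR > 1 && !seen[$2,$1]++ { custs[$2]++ }   # first sighting of a pair only
           END { for (k in custs) printf "category: %s customers: %d\n", k, custs[k] }' |
      sort)
    printf '%s\n' "$result"
    ```

    Because `seen[$2,$1]++` evaluates to 0 only the first time a category/customer pair appears, each pair contributes to `custs` exactly once, so no `split` on SUBSEP is needed in the END block.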

    In truth, I use GNU awk, which is faster and supports array[i][j], as D. Williamson mentioned. But I wanted to be sure I could do this in standard awk.

  • 2020-12-25 14:11

    AWK fakes multidimensional arrays by concatenating the indices with the character held in the SUBSEP variable (0x1c). You can iterate through a two-dimensional array using split like this (based on an example in the info gawk file):

    awk 'BEGIN { OFS=","; array[1,2]=3; array[2,3]=5; array[3,4]=8; 
      for (comb in array) {split(comb,sep,SUBSEP);
        print sep[1], sep[2], array[sep[1],sep[2]]}}'
    

    Output (the traversal order of for-in is unspecified, so the lines may appear in a different order):

    2,3,5
    3,4,8
    1,2,3
    

    You can, however, iterate over a numerically indexed array using nested for loops:

    for (i = 1; i <= width; i++)
        for (j = 1; j <= height; j++)
            print array[i, j]
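
    To make the counted loops concrete, here is a small runnable sketch. The dimensions (2 by 3) and the fill values are assumptions chosen for illustration; only the loop shape comes from the snippet above.

    ```shell
    grid=$(awk 'BEGIN {
      width = 2; height = 3              # hypothetical dimensions
      for (i = 1; i <= width; i++)
        for (j = 1; j <= height; j++)
          array[i, j] = i * 10 + j       # fill with recognizable values
      # Counted loops visit the cells in a predictable order,
      # unlike for (key in array).
      for (i = 1; i <= width; i++)
        for (j = 1; j <= height; j++)
          printf "array[%d,%d] = %d\n", i, j, array[i, j]
    }')
    printf '%s\n' "$grid"
    ```

    This only works when the index ranges are known in advance; for sparse or string-keyed data the split-based traversal is still needed.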
    

    Another noteworthy bit of information from the GAWK manual:

    To test whether a particular index sequence exists in a multidimensional array, use the same operator (in) that is used for single dimensional arrays. Write the whole sequence of indices in parentheses, separated by commas, as the left operand:

         (subscript1, subscript2, ...) in array
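
    A short sketch of that membership test, with made-up indices and values. It also shows why the test is preferable to reading the element: the `in` operator does not create the element it asks about.

    ```shell
    checks=$(awk 'BEGIN {
      array[1, 2] = 3
      # (i, j) in array tests for the key without creating it.
      if ((1, 2) in array) print "1,2 present"
      if (!((2, 1) in array)) print "2,1 absent"
      # Count the elements to confirm the test added nothing.
      n = 0
      for (k in array) n++
      print "elements: " n
    }')
    printf '%s\n' "$checks"
    ```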
    

    Gawk 4 adds arrays of arrays. From the gawk documentation on that feature:

    for (i in array) {
        if (isarray(array[i])) {
            for (j in array[i]) {
                print array[i][j]
            }
        }
        else
            print array[i]
    }
    

    Also see Traversing Arrays of Arrays for information about the following function which walks an arbitrarily dimensioned array of arrays, including jagged ones:

    function walk_array(arr, name,      i)
    {
        for (i in arr) {
            if (isarray(arr[i]))
                walk_array(arr[i], (name "[" i "]"))
            else
                printf("%s[%s] = %s\n", name, i, arr[i])
        }
    } 
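
    A possible way to try the function is sketched below. It assumes a `gawk` binary (version 4.0 or later) is on the PATH and does nothing otherwise; the jagged test array `a` is invented for the example, and the output is sorted because the traversal order is unspecified.

    ```shell
    # Arrays of arrays are a gawk extension, so only run when gawk exists.
    if command -v gawk >/dev/null 2>&1; then
      walked=$(gawk '
        function walk_array(arr, name,      i)
        {
            for (i in arr) {
                if (isarray(arr[i]))
                    walk_array(arr[i], (name "[" i "]"))
                else
                    printf("%s[%s] = %s\n", name, i, arr[i])
            }
        }
        BEGIN {
            # Hypothetical jagged array: depths 1, 2 and 3.
            a[1] = "x"
            a[2][1] = "y"
            a[2][2][1] = "z"
            walk_array(a, "a")
        }' | LC_ALL=C sort)
      printf '%s\n' "$walked"
    fi
    ```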
    
  • 2020-12-25 14:11

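    awk(1) was originally designed, in part, to be a teaching tool for the C language, and multi-dimensional arrays have been in both C and awk(1) pretty much forever. As such, POSIX (IEEE 1003.2) standardized them.

    To explore the syntax and semantics, if you create the following file called "test.awk" (note that its MULTI[k][kk] subscripts are gawk's arrays-of-arrays extension, available since gawk 4.0; POSIX awk specifies only the comma form MULTI[k,kk]):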

    BEGIN {
      KEY["a"]="a";
      KEY["b"]="b";
      KEY["c"]="c";
      MULTI["a"]["test_a"]="date a";
      MULTI["b"]["test_b"]="dbte b";
      MULTI["c"]["test_c"]="dcte c";
    }
    END {
      for(k in KEY) {
        kk="test_" k ;
        print MULTI[k][kk]
      }
      for(q in MULTI) {
        print q
      }
      for(p in MULTI) {
        for( pp in MULTI[p] ) {
          print MULTI[p][pp]
        }
      }
    }
    

    and run it with this command:

    awk -f test.awk /dev/null
    

    you will get the following output:

    date a
    dbte b
    dcte c
    a
    b
    c
    date a
    dbte b
    dcte c
    

    at least on Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP

  • 2020-12-25 14:17

    No, the syntax

    for(index1 in arr2) for(index2 in arr2) {
        print arr2[index1][index2];
    }
    

    won't work. Awk doesn't truly support multi-dimensional arrays. What it does, if you do something like

    x[1,2] = 5;
    

    is to concatenate the two indexes (1 & 2) to make a string, separated by the value of the SUBSEP variable. If this is equal to "*", then you'd have the same effect as

    x["1*2"] = 5;
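
    This equivalence can be checked directly. The sketch below sets SUBSEP to "*" purely for demonstration; in real data a printable separator is risky, because an index that itself contains "*" would make the joined keys ambiguous.

    ```shell
    keys=$(awk 'BEGIN {
      SUBSEP = "*"          # must be set before the subscripts are stored
      x[1, 2] = 5
      for (k in x) print k  # the stored key is the joined string
      print x["1*2"]        # same element, addressed by the joined key
    }')
    printf '%s\n' "$keys"
    ```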
    

    The default value of SUBSEP is a non-printing character, corresponding to Ctrl+\. You can see this with the following script:

    BEGIN {
        x[1,2]=5;
        x[2,4]=7;
        for (ix in x) {
            print ix;
        }
    }
    

    Running this gives:

    % awk -f scriptfile | cat -v
    1^\2
    2^\4
    

    So, in answer to your question - how to iterate a multi-dimensional array - just use a single for(a in b) loop, but you may need some extra work to split up a into its x and y parts.
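
    If nested loops are still wanted in standard awk, one option is to collect the distinct indices first and then test each pair. The following is a rough sketch under that assumption; the helper arrays `first` and `second` and the extra element x[1,3] are invented for the example, and the result is sorted because for-in order is unspecified.

    ```shell
    rows=$(awk 'BEGIN {
      x[1, 2] = 5; x[2, 4] = 7; x[1, 3] = 6
      # Pass 1: split every joined key and remember the distinct indices.
      for (key in x) {
        split(key, part, SUBSEP)
        first[part[1]] = 1
        second[part[2]] = 1
      }
      # Pass 2: nested loops, skipping pairs that were never stored.
      for (i in first)
        for (j in second)
          if ((i, j) in x)
            printf "x[%s,%s] = %s\n", i, j, x[i, j]
    }' | LC_ALL=C sort)
    printf '%s\n' "$rows"
    ```

    The `(i, j) in x` guard matters: indexing x[i, j] for a pair that was never stored would silently create an empty element.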

  • 2020-12-25 14:31

    Current versions of gawk (GNU awk, the default awk on many Linux distributions and installable almost anywhere) have real multidimensional arrays.

    for(b in a)
       for(c in a[b])
          print a[b][c], c , b
    

    See also the isarray() function.
