Awk array iteration for multi-dimensional arrays

后端未结

关注

 5  1300

Awk offers associative indexing for array processing. Elements of 1 dimensional array can be iterated:

e.g.

for(index in arr1)
  print \"arr1[\" inde


                      
              相关标签:


      
      
        
          5条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  走了就别回头了        
                
              
                            
                2020-12-25 14:10
              
            
            
                                                                       
I'll provide an example of how I use this in my work processing query data. Suppose you have an extract file full of transactions by product category and customer id:

customer_id  category  sales
1111         parts     100.01
1212         parts       5.20
2211         screws      1.33
...etc...


Its easy to use awk to count total distinct customers with a purchase:

awk 'NR>1 {a[$1]++} END {for (i in a) total++; print "customers: " total}' \ 
datafile.txt


However, computing the number of distinct customers with a purchase in each category suggests a two dimensional array:

awk 'NR>1 {a[$2,$1]++} 
      END {for (i in a) {split(i,arr,SUBSEP); custs[arr[1]]++}
           for (k in custs) printf "category: %s customers:%d\n", k, custs[k]}' \
datafile.txt


The increment of custs[arr[1]]++ works because each category/customer_id pair is unique as an index to the associative array used by awk.

In truth, I use gnu awk which is faster and can do array[i][j] as D. Williamson mentioned. But I wanted to be sure I could do this in standard awk.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光取名叫无心        
                
              
                            
                2020-12-25 14:11
              
            
            
                                                                       
AWK fakes multidimensional arrays by concatenating the indices with the character held in the SUBSEP variable (0x1c). You can iterate through a two-dimensional array using split like this (based on an example in the info gawk file):

awk 'BEGIN { OFS=","; array[1,2]=3; array[2,3]=5; array[3,4]=8; 
  for (comb in array) {split(comb,sep,SUBSEP);
    print sep[1], sep[2], array[sep[1],sep[2]]}}'


Output:

2,3,5
3,4,8
1,2,3


You can, however, iterate over a numerically indexed array using nested for loops:

for (i = 1; i <= width; i++)
    for (j = 1; j < = height; j++)
        print array[i, j]


Another noteworthy bit of information from the GAWK manual:


  To test whether a particular index sequence exists in a multidimensional array, use the same operator (in) that is used for single dimensional arrays. Write the whole sequence of indices in parentheses, separated by commas, as the left operand:

     (subscript1, subscript2, ...) in array



Gawk 4 adds arrays of arrays. From that link:

for (i in array) {
    if (isarray(array[i])) {
        for (j in array[i]) {
            print array[i][j]
        }
    }
    else
        print array[i]
}


Also see Traversing Arrays of Arrays for information about the following function which walks an arbitrarily dimensioned array of arrays, including jagged ones:

function walk_array(arr, name,      i)
{
    for (i in arr) {
        if (isarray(arr[i]))
            walk_array(arr[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n", name, i, arr[i])
    }
} 

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  陌清茗        
                
              
                            
                2020-12-25 14:11
              
            
            
                                                                       
awk(1) was originally designed -- in part -- to be teaching tool for the C language, and multi-dimensional arrays have been in both C and awk(1) pretty much forever. as such POSIX IEEE 1003.2 standardized them. 

To explore the syntax and semantics, if you create the following file called "test.awk":

BEGIN {
  KEY["a"]="a";
  KEY["b"]="b";
  KEY["c"]="c";
  MULTI["a"]["test_a"]="date a";
  MULTI["b"]["test_b"]="dbte b";
  MULTI["c"]["test_c"]="dcte c";
}
END {
  for(k in KEY) {
    kk="test_" k ;
    print MULTI[k][kk]
  }
  for(q in MULTI) {
    print q
  }
  for(p in MULTI) {
    for( pp in MULTI[p] ) {
      print MULTI[p][pp]
    }
  }
}


and run it with this command: 

awk -f test.awk /dev/null


you will get the following output: 

date a
dbte b
dcte c
a
b
c
date a
dbte b
dcte c


at least on Linux Mint 18 Cinnamon 64-bit 4.4.0-21-generic #37-Ubuntu SMP
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  萌比男神i        
                
              
                            
                2020-12-25 14:17
              
            
            
                                                                       
No, the syntax

for(index1 in arr2) for(index2 in arr2) {
    print arr2[index1][index2];
}


won't work. Awk doesn't truly support multi-dimensional arrays. What it does, if you do something like

x[1,2] = 5;


is to concatenate the two indexes (1 & 2) to make a string, separated by the value of the SUBSEP variable. If this is equal to "*", then you'd have the same effect as

x["1*2"] = 5;


The default value of SUBSEP is a non-printing character, corresponding to Ctrl+\. You can see this with the following script:

BEGIN {
    x[1,2]=5;
    x[2,4]=7;
    for (ix in x) {
        print ix;
    }
}


Running this gives:

% awk -f scriptfile | cat -v
1^\2
2^\4


So, in answer to your question - how to iterate a multi-dimensional array - just use a single for(a in b) loop, but you may need some extra work to split up a into its x and y parts.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  半阙折子戏        
                
              
                            
                2020-12-25 14:31
              
            
            
                                                                       
The current versions of gawk (the gnu awk, default in
linux, and possible to install everywhere you want), has real multidimensional arrays.

for(b in a)
   for(c in a[b])
      print a[b][c], c , b


See also function isarray()
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复