How to initialize an array of arrays in awk?

心在旅途 2020-12-14 06:42

Is it possible to initialize an array like this in AWK?

Colors[1] = ("Red", "Green", "Blue")
Colors[2] = ("Yellow", "Cyan", "Purple")

5 answers
  • 2020-12-14 07:26

    If you have GNU awk, you can use a true multidimensional array. Although this answer uses the split() function, it most certainly doesn't abuse it. Run like:

    gawk -f script.awk
    

    Contents of script.awk:

    BEGIN {
    
        x=SUBSEP
    
        a="Red" x "Green" x "Blue"
        b="Yellow" x "Cyan" x "Purple"
    
        Colors[1][0] = ""
        Colors[2][0] = ""
    
        split(a, Colors[1], x)
        split(b, Colors[2], x)
    
        print Colors[2][3]
    }
    

    Results:

    Purple
    
  • 2020-12-14 07:32

    The existing answers are helpful and together cover all aspects, but I thought I'd give a more focused summary.

    The question conflates two aspects:

    • initializing arrays in Awk in general
    • doing so to fill a two-dimensional array in particular

    Array initialization:

    Awk has no array literal (initializer) syntax.

    The simplest workaround is to:

    • represent the array elements as a single string and
    • use the split() function to split that string into the elements of an array.
    $ awk 'BEGIN { n=split("Red Green Blue", arr); for (i=1;i<=n;++i) print arr[i] }'
    Red
    Green
    Blue
    

    This is what the OP did in their own helpful answer.

    If the elements themselves contain whitespace, use a custom separator that's not part of the data, | in this example:

    $ awk 'BEGIN { n=split("Red (1)|Green (2)", arr, "|"); for (i=1;i<=n;++i) print arr[i] }'
    Red (1)
    Green (2)
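    Since split() fills whatever array it is given, it can be wrapped in a small user-defined function that behaves somewhat like an initializer. This is a minimal sketch assuming a POSIX awk; the shell function demo_init_array and the awk function init_array are hypothetical names chosen for illustration. Arrays are passed to awk functions by reference, which is what lets init_array fill the caller's array:

```shell
# Hypothetical wrapper around a POSIX awk program.
demo_init_array() {
  awk '
  # init_array (hypothetical helper): fill arr from a sep-delimited list and
  # return the element count. Arrays are passed by reference, so split()
  # fills the array that the caller passed in.
  function init_array(arr, list, sep) {
    return split(list, arr, sep)
  }
  BEGIN {
    n = init_array(colors, "Red (1)|Green (2)|Blue (3)", "|")
    for (i = 1; i <= n; i++) print i ": " colors[i]
  }'
}
demo_init_array
```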
    

    Initialization of a 2-dimensional array:

    • Per POSIX, Awk has no true multi-dimensional arrays, only an emulation of it using a one-dimensional array whose indices are implicitly concatenated with the value of built-in variable SUBSEP to form a single key (index; note that all Awk arrays are associative).

      • arr[1, 2] is effectively the same as arr[1 SUBSEP 2], where 1 SUBSEP 2 is a string concatenation that builds the key value.
      • Because there aren't truly multiple dimensions - only a flat array of compound keys - you cannot enumerate the (pseudo-)dimensions individually with for (i in ...), such as to get all sub-indices for primary (pseudo-)dimension 1 only.
      • The default value of SUBSEP is "\034", the "INFORMATION SEPARATOR FOUR" (FILE SEPARATOR) control character, which is rarely used and unlikely to appear in data; in ASCII and UTF-8 it is the single byte 0x1c. If needed, you can change the value.
    • By contrast, GNU Awk, as a nonstandard extension, does have support for true multi-dimensional arrays.

      • Important: You must then always specify the indices separately; e.g., instead of arr[1,2] you must use arr[1][2].
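    You can check the equivalence of arr[1, 2] and arr[1 SUBSEP 2] directly. The following sketch works with any POSIX awk (demo_subsep is a hypothetical name). It also shows the POSIX (i, j) in arr membership test and how to split a compound key back into its constituent indices with split(key, parts, SUBSEP):

```shell
# POSIX awk sketch; demo_subsep is a hypothetical name.
demo_subsep() {
  awk 'BEGIN {
    arr[1, 2] = "Green"
    # The comma form and explicit concatenation with SUBSEP build the same key.
    print arr[1 SUBSEP 2]
    # POSIX membership test for a compound key: (i, j) in array
    if ((1, 2) in arr) print "found (1,2)"
    # Recover the constituent indices by splitting the flat key on SUBSEP.
    for (k in arr) {
      split(k, parts, SUBSEP)
      print "row=" parts[1] " col=" parts[2]
    }
  }'
}
demo_subsep
```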

    POSIX-compliant example (similar to TrueY's helpful answer):

    awk 'BEGIN {
      n=split("Red Green Blue", arrAux); for (i in arrAux) Colors[1,i] = arrAux[i]
      n=split("Yellow Cyan Purple", arrAux); for (i in arrAux) Colors[2,i] = arrAux[i]
      print Colors[1,2]
      print "---"
      # Enumerate all [2,*] values - see comments below.
      for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] }
    }'
    Green
    ---
    Yellow
    Cyan
    Purple
    

    Note that the emulation of multi-dimensional arrays with a one-dimensional array using compound keys has the following inconvenient implications:

    • Auxiliary array arrAux is needed, because you cannot directly populate a given (pseudo-)dimension of an array.

    • You cannot enumerate just one (pseudo-)dimension with for (i in ...); you can only enumerate all indices, across (pseudo-)dimensions, and in an unspecified order.

      • for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] } above shows how to work around that by enumerating all keys and then matching only the ones whose first constituent index is 2, which means that the key value must start with 2, followed by SUBSEP.
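    The workaround can be taken one step further. Split each key on SUBSEP to find the columns of one row, then walk those columns with a counted loop so the output order is predictable, since the order of for (i in ...) is unspecified. This is a sketch that assumes the column indices are integers starting at 1; demo_row and the awk variables aux, parts and maxcol are hypothetical names:

```shell
# POSIX awk sketch; demo_row, aux, parts and maxcol are hypothetical names.
# Assumes column indices are integers starting at 1.
demo_row() {
  awk -v row="$1" 'BEGIN {
    n = split("Red Green Blue", aux);     for (i = 1; i <= n; i++) Colors[1, i] = aux[i]
    n = split("Yellow Cyan Purple", aux); for (i = 1; i <= n; i++) Colors[2, i] = aux[i]
    # Find the highest column index used in the requested row.
    maxcol = 0
    for (k in Colors) {
      split(k, parts, SUBSEP)
      if (parts[1] == row && parts[2] + 0 > maxcol) maxcol = parts[2] + 0
    }
    # Walk the columns with a counted loop so the output order is predictable.
    out = ""; sep = ""
    for (j = 1; j <= maxcol; j++)
      if ((row, j) in Colors) { out = out sep Colors[row, j]; sep = " " }
    print out
  }'
}
demo_row 2
```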

    GNU Awk example (similar to Steve's helpful answer, improved with Ed Morton's comment):

    GNU Awk's (nonstandard) support for true multi-dimensional arrays makes the inconveniences of the POSIX-compliant solution (mostly) go away
    (GNU Awk also doesn't have array initializers, however):

    gawk 'BEGIN {
      Colors[1][""]; split("Red Green Blue", Colors[1])
      Colors[2][""]; split("Yellow Cyan Purple", Colors[2])
      # NOTE: Always use *separate* indices: [1][2] instead of [1,2]
      print Colors[1][2]
      print "---"
      # Enumerate all [2][*] values
      for (i in Colors[2]) print Colors[2][i]
    }'
    

    Note:

    • Important: As stated, to address a specific element in a multi-dimensional array, always use separate indices; e.g., [1][2] rather than [1,2].

      • If you use [1,2] you'll get the standard POSIX-mandated behavior, and you'll mistakenly create a new, single index (key) with (string-concatenated) value 1 SUBSEP 2.
    • split() can conveniently be used to directly populate a sub-array.

    • As a prerequisite, however, the 2-dimensional target arrays must be initialized:

      • Colors[1][""] and Colors[2][""] do just that.
      • Dummy index [""] is just there to create a 2-dimensional array; it is discarded when split() fills that dimension later.
    • Enumerating a specific dimension with for (i in ...) is supported:

      • for (i in Colors[2]) ... conveniently enumerates only the sub-indices of Colors[2].
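    The [1,2] versus [1][2] pitfall is easy to reproduce. This sketch requires gawk 4.0 or later for arrays of arrays and isarray(); demo_gawk_pitfall is a hypothetical name. The assignment to Colors[1,2] leaves the sub-array element Colors[1][2] untouched. Instead, it adds a second, scalar top-level element:

```shell
# Requires gawk (arrays of arrays and isarray() are gawk extensions).
# demo_gawk_pitfall is a hypothetical name.
demo_gawk_pitfall() {
  gawk 'BEGIN {
    Colors[1][""]; split("Red Green Blue", Colors[1])
    Colors[1,2] = "oops"     # POSIX-style key: adds a new scalar element "1" SUBSEP "2"
    print Colors[1][2]       # the sub-array element is unchanged
    n_sub = n_scalar = 0
    for (k in Colors) {
      if (isarray(Colors[k])) { n_sub++ } else { n_scalar++ }
    }
    print n_sub " sub-array(s), " n_scalar " scalar(s)"
  }'
}
demo_gawk_pitfall
```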
  • 2020-12-14 07:40

    You can create a 2-dimensional array easily enough. What you can't do, AFAIK, is initialize it in a single operation. As dmckee hints in a comment, one of the reasons for not being able to initialize an array is that there is no restriction on the types of the subscripts, and hence no requirement that they be purely numeric. You can do multiple assignments as in the script below. The subscripts are formally separated by an obscure character designated by the variable SUBSEP, whose default value is "\034" (octal 034, i.e. U+001C, FILE SEPARATOR). Clearly, if one of the indexes contains this character, confusion will follow (but when was the last time you used that character in a string?).

    BEGIN {
        Colours[1,1] = "Red"
        Colours[1,2] = "Green"
        Colours[1,3] = "Blue"
        Colours[2,1] = "Yellow"
        Colours[2,2] = "Cyan"
        Colours[2,3] = "Purple"
    }
    END {
        for (i = 1; i <= 2; i++)
            for (j = 1; j <= 3; j++)
                printf "Colours[%d,%d] = %s\n", i, j, Colours[i,j];
    }
    

    Example run:

    $ awk -f so14063783.awk /dev/null
    Colours[1,1] = Red
    Colours[1,2] = Green
    Colours[1,3] = Blue
    Colours[2,1] = Yellow
    Colours[2,2] = Cyan
    Colours[2,3] = Purple
    $
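    Here is a small illustration of that confusion, assuming any POSIX awk (demo_subsep_collision is a hypothetical name). Two different (row, column) pairs collapse into one key once an index itself contains SUBSEP. A visible SUBSEP such as ":" makes this far more likely, since ordinary data may contain that character.

```shell
# POSIX awk sketch; demo_subsep_collision is a hypothetical name.
demo_subsep_collision() {
  awk 'BEGIN {
    # Two different (row, col) pairs, each with SUBSEP embedded in one index ...
    a[("x" SUBSEP "y"), "z"] = "first"
    a["x", ("y" SUBSEP "z")] = "second"
    # ... flatten to the same key, x SUBSEP y SUBSEP z, so the second overwrites the first.
    n = 0
    for (k in a) n++
    print n " key(s); value: " a["x", "y", "z"]
  }'
}
demo_subsep_collision
```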
    
  • 2020-12-14 07:42

    A similar solution. Setting SUBSEP=":" is not really needed; it just makes the compound keys readable in the demo output:

    awk 'BEGIN{SUBSEP=":"
    split("Red Green Blue",a); for(i in a) Colors[1,i]=a[i];
    split("Yellow Cyan Purple",a); for(i in a) Colors[2,i]=a[i];
    for(i in Colors) print i" => "Colors[i];}'
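    Reusing the auxiliary array a for both rows is safe, because POSIX specifies that split() deletes all existing elements of the target array before filling it. Here is a quick check (demo_split_reuse is a hypothetical name):

```shell
# POSIX awk sketch; demo_split_reuse is a hypothetical name.
demo_split_reuse() {
  awk 'BEGIN {
    split("Red Green Blue", a)
    split("Yellow Cyan", a)   # a is emptied first, so no stale a[3] survives
    n = 0
    for (i in a) n++
    print n " element(s); a[3] is " ((3 in a) ? "present" : "gone")
  }'
}
demo_split_reuse
```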
    

    Or a little bit more cryptic version:

    awk 'BEGIN{SUBSEP=":"
    split("Red Green Blue Yellow Cyan Purple",a); 
    for(i in a) Colors[int((i-1)/3)+1,(i-1)%3+1]=a[i];
    for(i in Colors) print i" => "Colors[i];}'
    

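    Output (for (i in ...) visits keys in an unspecified order, so the lines may come out in a different order):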

    1:1 => Red
    1:2 => Green
    1:3 => Blue
    2:1 => Yellow
    2:2 => Cyan
    2:3 => Purple
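    The index arithmetic in the cryptic version hard-codes 3 columns per row. The following sketch takes the column count as a parameter instead. It prints with counted loops, so the output order no longer depends on for (i in ...). The names demo_reshape and ncols are hypothetical:

```shell
# POSIX awk sketch; demo_reshape and ncols are hypothetical names.
demo_reshape() {
  awk -v ncols="$1" 'BEGIN {
    n = split("Red Green Blue Yellow Cyan Purple", a)
    # Map flat position i (1-based) to (row, col), with ncols columns per row.
    for (i = 1; i <= n; i++) {
      row = int((i - 1) / ncols) + 1
      col = (i - 1) % ncols + 1
      Colors[row, col] = a[i]
    }
    nrows = int((n + ncols - 1) / ncols)
    # Counted loops give a predictable output order, unlike for (k in Colors).
    for (r = 1; r <= nrows; r++)
      for (c = 1; c <= ncols; c++)
        if ((r, c) in Colors) print r ":" c " => " Colors[r, c]
  }'
}
demo_reshape 3
```

    With demo_reshape 2, the same six colors come out as three rows of two.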
    
  • 2020-12-14 07:48

    Thanks for the answers. Anyway, for those who want to initialize one-dimensional arrays, here is an example:

    BEGIN {
        SColors = "Red_Green_Blue"
        split(SColors, Colors, "_")
        print Colors[1] " - " Colors[2] " - " Colors[3]
    }
    