Is it possible to initialize an array like this in AWK?
Colors[1] = ("Red", "Green", "Blue")
Colors[2] = ("Yellow", "Cyan", "Purple")
If you have GNU awk, you can use a true multidimensional array. Although this answer uses the split() function, it most certainly doesn't abuse it. Run like:
awk -f script.awk
Contents of script.awk:
BEGIN {
    x=SUBSEP
    a="Red" x "Green" x "Blue"
    b="Yellow" x "Cyan" x "Purple"
    Colors[1][0] = ""
    Colors[2][0] = ""
    split(a, Colors[1], x)
    split(b, Colors[2], x)
    print Colors[2][3]
}
Results:
Purple
The existing answers are helpful and together cover all aspects, but I thought I'd give a more focused summary.
The question conflates two aspects:
Awk has no array literal (initializer) syntax.
The simplest workaround is to put the elements into a single string and then use the split() function to split that string into the elements of an array.
$ awk 'BEGIN { n=split("Red Green Blue", arr); for (i=1;i<=n;++i) print arr[i] }'
Red
Green
Blue
This is what the OP did in their own helpful answer.
If the elements themselves contain whitespace, use a custom separator that's not part of the data, | in this example:
$ awk 'BEGIN { n=split("Red (1)|Green (2)", arr, "|"); for (i=1;i<=n;++i) print arr[i] }'
Red (1)
Green (2)
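The separator passed to split() may also be an extended regular expression, which helps when the delimiter comes with inconsistent padding. A minimal sketch, assuming comma-separated input with stray spaces; the variable name prog and the sample string are made up for illustration:

```shell
# Awk program kept in a shell variable ('prog' is only an illustrative name).
prog='BEGIN {
    # / *, */ matches a comma together with any spaces around it,
    # so the padding is not left attached to the elements.
    n = split("Red (1) , Green (2),Blue (3)", arr, / *, */)
    for (i = 1; i <= n; ++i) print i ": [" arr[i] "]"
}'
awk "$prog"
```

This should print 1: [Red (1)], 2: [Green (2)] and 3: [Blue (3)] on separate lines; the brackets are there only to make any leftover whitespace visible.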
Per POSIX, Awk has no true multi-dimensional arrays, only an emulation of them using a one-dimensional array whose indices are implicitly concatenated with the value of the built-in variable SUBSEP to form a single key (index; note that all Awk arrays are associative).
arr[1, 2] is effectively the same as arr[1 SUBSEP 2], where 1 SUBSEP 2 is a string concatenation that builds the key value.
Because all keys live in one flat array, you cannot enumerate a single (pseudo-)dimension with for (i in ...), such as to get all sub-indices for primary (pseudo-)dimension 1 only.
POSIX leaves the default value of SUBSEP implementation-defined; in GNU Awk and other common implementations it is "\034", the FILE SEPARATOR character (also named "INFORMATION SEPARATOR FOUR"), a rarely used control character that's unlikely to appear in data; in ASCII and UTF-8 it is represented as the single byte 0x1c; if needed, you can change the value.
By contrast, GNU Awk, as a nonstandard extension, does have support for true multi-dimensional arrays.
Important: with true multi-dimensional arrays you must always specify the indices separately; instead of arr[1,2] you must use arr[1][2].
POSIX-compliant example (similar to TrueY's helpful answer):
awk 'BEGIN {
  n=split("Red Green Blue", arrAux); for (i in arrAux) Colors[1,i] = arrAux[i]
  n=split("Yellow Cyan Purple", arrAux); for (i in arrAux) Colors[2,i] = arrAux[i]
  print Colors[1,2]
  print "---"
  # Enumerate all [2,*] values - see comments below.
  for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] }
}'
Green
---
Yellow
Cyan
Purple
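To make the compound-key mechanics concrete, here is a small sketch showing that [1, 2] and [1 SUBSEP 2] address the same element, how to test for a compound key, and how to split a key back into its constituent indices. The names prog, key and idx are illustrative only:

```shell
# Awk program kept in a shell variable ('prog' is only an illustrative name).
prog='BEGIN {
    Colors[1, 2] = "Green"
    # Same element, addressed with an explicitly built key
    print Colors[1 SUBSEP 2]
    # Membership test for a compound key
    if ((1, 2) in Colors) print "has [1,2]"
    # Recover the constituent indices from each stored key
    for (key in Colors) {
        split(key, idx, SUBSEP)
        print "row=" idx[1] " col=" idx[2]
    }
}'
awk "$prog"
```

With the single element stored here, the output should be Green, has [1,2] and row=1 col=2, each on its own line.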
Note that the emulation of multi-dimensional arrays with a one-dimensional array using compound keys has the following inconvenient implications:
The auxiliary array arrAux is needed, because you cannot directly populate a given (pseudo-)dimension of an array.
You cannot enumerate just one (pseudo-)dimension with for (i in ...); you can only enumerate all indices, across (pseudo-)dimensions.
for (i in Colors) { if (index(i, 2 SUBSEP)==1) print Colors[i] } above shows how to work around that by enumerating all keys and then matching only the ones whose first constituent index is 2, which means that the key value must start with 2, followed by SUBSEP. Because for (i in ...) visits keys in an unspecified order, the matching values may print in a different order than shown.
GNU Awk example (similar to Steve's helpful answer, improved with Ed Morton's comment):
GNU Awk's (nonstandard) support for true multi-dimensional arrays makes the inconveniences of the POSIX-compliant solution (mostly) go away (GNU Awk also doesn't have array initializers, however):
gawk 'BEGIN {
  Colors[1][""]; split("Red Green Blue", Colors[1])
  Colors[2][""]; split("Yellow Cyan Purple", Colors[2])
  # NOTE: Always use *separate* indices: [1][2] instead of [1,2]
  print Colors[1][2]
  print "---"
  # Enumerate all [2][*] values
  for (i in Colors[2]) print Colors[2][i]
}'
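To see why the comment in the script insists on separate indices, the following sketch mixes both notations on the same array. It assumes GNU Awk 4.0 or later (the release that added arrays of arrays) and does nothing if gawk is not installed; prog and before are illustrative names:

```shell
# Awk program kept in a shell variable ('prog' is only an illustrative name).
prog='BEGIN {
    Colors[1][2] = "Green"          # element of the true sub-array Colors[1]
    before = ((1, 2) in Colors)     # 0: no compound key 1 SUBSEP 2 exists yet
    Colors[1, 2] = "oops"           # creates a separate, flat key 1 SUBSEP 2
    # The top-level array now holds two keys: 1 (a sub-array) and 1 SUBSEP 2
    print before, length(Colors), Colors[1][2]
}'
# Run only when gawk is available.
if command -v gawk >/dev/null 2>&1; then gawk "$prog"; fi
```

The expected output is 0 2 Green: the [1,2] assignment neither touches nor replaces Colors[1][2]; it silently adds an unrelated entry.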
Note:
Important: As stated, to address a specific element in a multi-dimensional array, always use separate indices; e.g., [1][2] rather than [1,2].
If you use [1,2], you'll get the standard POSIX-mandated behavior, and you'll mistakenly create a new, single index (key) with (string-concatenated) value 1 SUBSEP 2.
split() can conveniently be used to directly populate a sub-array.
As a prerequisite, however, the 2-dimensional target arrays must be initialized: Colors[1][""] and Colors[2][""] do just that.
The dummy index [""] is just there to create a 2-dimensional array; it is discarded when split() fills that dimension later.
Enumerating a specific dimension with for (i in ...) is supported: for (i in Colors[2]) ... conveniently enumerates only the sub-indices of Colors[2].
You can create a 2-dimensional array easily enough. What you can't do, AFAIK, is initialize it in a single operation. As dmckee hints in a comment, one of the reasons for not being able to initialize an array is that there is no restriction on the types of the subscripts, and hence no requirement that they are purely numeric. You can do multiple assignments as in the script below. The subscripts are formally separated by an obscure character designated by the variable SUBSEP, with default value "\034" (octal 034, U+001C, FILE SEPARATOR). Clearly, if one of the indexes contains this character, confusion will follow (but when was the last time you used that character in a string?).
BEGIN {
    Colours[1,1] = "Red"
    Colours[1,2] = "Green"
    Colours[1,3] = "Blue"
    Colours[2,1] = "Yellow"
    Colours[2,2] = "Cyan"
    Colours[2,3] = "Purple"
}
END {
    for (i = 1; i <= 2; i++)
        for (j = 1; j <= 3; j++)
            printf "Colours[%d,%d] = %s\n", i, j, Colours[i,j];
}
Example run:
$ awk -f so14063783.awk /dev/null
Colours[1,1] = Red
Colours[1,2] = Green
Colours[1,3] = Blue
Colours[2,1] = Yellow
Colours[2,2] = Cyan
Colours[2,3] = Purple
$
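The point about subscripts not having to be numeric can be illustrated with a short sketch. The subscript values "warm", "cool" and "first" are invented for the example, as is the variable name prog:

```shell
# Awk program kept in a shell variable ('prog' is only an illustrative name).
prog='BEGIN {
    # Subscripts are strings, so arbitrary labels work as well as numbers
    Colours["warm", "first"] = "Red"
    Colours["cool", "first"] = "Cyan"
    print Colours["warm", "first"]
    print Colours["cool", "first"]
}'
awk "$prog"
```

This prints Red and then Cyan. Since there is no fixed range of subscripts, Awk has nothing to base a positional initializer on, which is consistent with having to assign each element explicitly.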
A similar solution. Setting SUBSEP=":" is not really needed; it is just set to a visible character for the demo:
awk 'BEGIN{SUBSEP=":"
split("Red Green Blue",a); for(i in a) Colors[1,i]=a[i];
split("Yellow Cyan Purple",a); for(i in a) Colors[2,i]=a[i];
for(i in Colors) print i" => "Colors[i];}'
Or a little bit more cryptic version:
awk 'BEGIN{SUBSEP=":"
split("Red Green Blue Yellow Cyan Purple",a);
for(i in a) Colors[int((i-1)/3)+1,(i-1)%3+1]=a[i];
for(i in Colors) print i" => "Colors[i];}'
Output (for (i in ...) visits keys in an unspecified order, so the lines may appear in a different order):
1:1 => Red
1:2 => Green
1:3 => Blue
2:1 => Yellow
2:2 => Cyan
2:3 => Purple
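The index arithmetic in the more cryptic version generalizes to any row width, and counted loops avoid the unspecified ordering of for (i in ...). A sketch under those assumptions; cols, rows and prog are hypothetical names, not part of the answer above:

```shell
# Awk program kept in a shell variable ('prog' is only an illustrative name).
prog='BEGIN {
    cols = 3    # assumed row width (hypothetical parameter)
    n = split("Red Green Blue Yellow Cyan Purple", a)
    # Map the flat position i to a 1-based (row, column) pair
    for (i = 1; i <= n; i++)
        Colors[int((i - 1) / cols) + 1, (i - 1) % cols + 1] = a[i]
    rows = int((n + cols - 1) / cols)    # round up for a partial last row
    # Counted loops visit the elements in a fixed order
    for (r = 1; r <= rows; r++)
        for (c = 1; c <= cols; c++)
            if ((r, c) in Colors) print r ":" c " => " Colors[r, c]
}'
awk "$prog"
```

This should print the same six lines as above, from 1:1 => Red to 2:3 => Purple, always in that order and without needing to change SUBSEP.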
Thanks for the answers. Anyway, for those who want to initialize one-dimensional arrays, here is an example:
BEGIN {
    SColors = "Red_Green_Blue"
    split(SColors, Colors, "_")
    print Colors[1] " - " Colors[2] " - " Colors[3]
}