Question
I need an awk script that searches for any string inside <>. If it finds one it hasn't seen before, it should replace the text between the brackets with the current value of an index counter (starting at 0) and then increment the counter. If it finds a string inside <> that it already knows, it should look up that string's index and replace the string with it. This must work across multiple files: the counter does not reset between files, only when the program starts. For example:
file_a.txt:
123abc<abc>xyz
efg
<b>ah
a<c>, <abc>
<c>b
(<abc>, <b>)
file_b.txt:
xyz(<c>, <b>)
xyz<b>xy<abc>z
should become:
file_a_new.txt:
123abc<0>xyz
efg
<1>ah
a<2>, <0>
<2>b
(<0>, <1>)
file_b_new.txt:
xyz(<2>, <1>)
xyz<1>xy<0>z
What I have so far:
awk 'match($0, /<[^>]+>/) {
k = substr($0, RSTART, RLENGTH)
if (!(k in freq))
freq[k] = n++
$0 = substr($0, 1, RSTART-1) freq[k] substr($0, RSTART+RLENGTH)
}
{
print $0 > (FILENAME ".tmp")
}' files
But this only detects one <> pattern per line, and a line can contain several. How should I change the code?
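A quick way to see the limitation is to feed the matching rule from the question a single line that holds two keys (in this sketch the output goes to stdout instead of a .tmp file). Only the first key is replaced, and because k includes the angle brackets, the brackets are replaced along with the key:

```shell
#!/bin/sh
# The question's match() rule applied to one line with two <...> keys.
# Output is printed to stdout here instead of being written to a .tmp file.
result=$(printf 'a<c>, <abc>\n' | awk 'match($0, /<[^>]+>/) {
    k = substr($0, RSTART, RLENGTH)   # k is "<c>", brackets included
    if (!(k in freq))
        freq[k] = n++
    $0 = substr($0, 1, RSTART-1) freq[k] substr($0, RSTART+RLENGTH)
}
{ print }')
printf '%s\n' "$result"   # prints: a0, <abc>
```

match() only reports the leftmost match, so the rule has to be repeated in a loop over the rest of the line to catch every key.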
Edit: The original files should not be edited; new files should be created instead.
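Answer 1:
With GNU awk this is easier: set RS to the <...> pattern so that each key acts as a record separator; the separator text that ended each record is then available in RT: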
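awk -v RS='<[^>]+>' '{ ORS="" }  # reset ORS to "" for every record
RT {                               # RT holds the <...> text that ended this record
  if (!(RT in freq))               # first time this key is seen?
    freq[RT] = n++                 # store the current counter, then increment it
  ORS = "<" freq[RT] ">"           # print the key back as <index>
}
{
  print $0 > ("/tmp/" FILENAME)    # write to /tmp/<name>, leaving the input untouched
}' file_{a,b}.txt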
Answer 2:
Using any awk:
$ cat tst.awk
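FNR == 1 {                        # first line of a new input file
    close(out)                    # close the previous output file, if any
    out = FILENAME ".tmp"
}
{
    head = ""                     # part of the line already rewritten
    tail = $0                     # part still to be scanned
    while ( match(tail,/<[^>]+>/) ) {
        tgt = substr(tail,RSTART+1,RLENGTH-2)        # key without < and >
        if ( !(tgt in map) ) {
            map[tgt] = cnt++                         # new key: next index, starting at 0
        }
        head = head substr(tail,1,RSTART) map[tgt]   # text up to and including "<", then the index
        tail = substr(tail,RSTART+RLENGTH-1)         # resume at the closing ">"
    }
    print head tail > out
}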
$ awk -f tst.awk file_a.txt file_b.txt
$ head file_*.tmp
==> file_a.txt.tmp <==
123abc<0>xyz
efg
<1>ah
a<2>, <0>
<2>b
(<0>, <1>)
==> file_b.txt.tmp <==
xyz(<2>, <1>)
xyz<1>xy<0>z
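Neither answer writes the file_a_new.txt / file_b_new.txt names used in the question. As a sketch, assuming the inputs end in .txt, the loop from Answer 2 can derive each output name with sub(). The script below recreates the sample files in a scratch directory (made with mktemp -d, an assumption about your environment) so the result can be compared with the expected output in the question:

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory and write
# file_X_new.txt next to each input instead of file_X.txt.tmp.
set -e
dir=$(mktemp -d)
cd "$dir"

printf '%s\n' '123abc<abc>xyz' 'efg' '<b>ah' 'a<c>, <abc>' '<c>b' '(<abc>, <b>)' > file_a.txt
printf '%s\n' 'xyz(<c>, <b>)' 'xyz<b>xy<abc>z' > file_b.txt

awk '
FNR == 1 {
    close(out)
    out = FILENAME
    # file_a.txt -> file_a_new.txt; without a .txt suffix, append _new
    # so an input file is never overwritten
    if (!sub(/\.txt$/, "_new.txt", out))
        out = FILENAME "_new"
}
{
    head = ""; tail = $0
    while (match(tail, /<[^>]+>/)) {
        tgt = substr(tail, RSTART + 1, RLENGTH - 2)
        if (!(tgt in map))
            map[tgt] = cnt++
        head = head substr(tail, 1, RSTART) map[tgt]
        tail = substr(tail, RSTART + RLENGTH - 1)
    }
    print head tail > out
}' file_a.txt file_b.txt

cat file_a_new.txt file_b_new.txt
```

The scratch directory is left in place so the files can be inspected; remove it with rm -r when done. The counter and map persist across both files because awk keeps its variables for the whole run.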
Source: https://stackoverflow.com/questions/65024964/awk-script-for-replacing-multiple-occurances-of-string-pattern-in-the-same-line