Dynamic regular expressions in awk

Posted by ぃ、小莉子 on 2021-02-17 05:46:29

Question


I have text files like these:

1.txt

AA;00000;
BB;11111;
GG;22222;

2.txt

KK;WW;55555;11111;
KK;FF;ZZ;11111;
KK;RR;YY;11111;

From them I generate this 3.txt output

AA;00000;
BB;11111;KK;WW;55555;FF;ZZ;RR;YY
GG;22222;

with this awk script (I run it on Windows from cmd):

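#!/usr/bin/awk -f

# Stop once awk moves on to the second file; 2.txt is read via getline below.
NR != FNR {
    exit
}
# Print every line of 1.txt without a trailing newline.
{
    printf "%s", $0
}
# For the BB line, append every field of 2.txt that is not already present.
/^BB/ {
    o = ""
    # Compare against > 0 so an unreadable file (getline returns -1) ends the loop.
    while ((getline tmp < ARGV[2]) > 0) {
        n = split(tmp, arr, ";")
        for (i = 1; i <= n; i++)
            # match() treats arr[i] as a regular expression, not a literal string.
            if (!match($0, arr[i]) && !match(o, arr[i]))
                o = o arr[i] ";"
    }
    printf "%s", o
}
# Finish the output line.
{
    print ""
}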

Usage: awk -f script.awk 1.txt 2.txt

This seems to work, but consider this situation:

1.txt

AA;BB;

2.txt

CC;DD;BB;AA;

Now replace the field values as follows:

AA is replaced with d(2)
BB is replaced with http://a.o/f/i.p?t=1
CC is replaced with Link
DD is replaced with A_x-y.7z

After the replacement, the script can no longer generate the 3.txt that corresponds to

AA;BB;CC;DD;

that is, with the replaced text, it can't generate this 3.txt output:

d(2);http://a.o/f/i.p?t=1;Link;A_x-y.7z;

Note that duplicate fields such as AA and BB are removed from the 3.txt output; that is how the script is meant to work.

I suspect it has to do with the (...) being taken as a regex grouping in match(): the second parameter of match() is a regex, so each arr[i] I pass there is treated as a "dynamic regular expression" in awk speak.
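
A quick way to check that suspicion is to run the same lookup once as a regex and once as a plain string. This is only a sketch: it assumes a POSIX shell (not cmd), and the variable names as_regex and as_string are made up for illustration.

```shell
# Assumption: POSIX shell with awk on PATH; variable names are hypothetical.
# match() treats its second argument as a regex, so "d(2)" means "d" followed
# by the group "2", which only matches the text "d2".
as_regex=$(awk 'BEGIN { print match("d(2);Link;", "d(2)") }')

# index() looks for a literal substring and returns its 1-based position.
as_string=$(awk 'BEGIN { print index("d(2);Link;", "d(2)") }')

echo "match: $as_regex  index: $as_string"
```

Here match() reports 0 (no match) even though the text d(2) is clearly present, while index() reports 1.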


Answer 1:


$ cat tst.awk
BEGIN { FS=OFS=";" }
{ key = $(NF-1) }
NR == FNR {
    for (i=1; i<(NF-1); i++) {
        if ( !seen[key,$i]++ ) {
            map[key] = (key in map ? map[key] OFS : "") $i
        }
    }
    next
}
{ print $0 map[key] }

$ awk -f tst.awk 2.txt 1.txt
AA;00000;
BB;11111;KK;WW;55555;FF;ZZ;RR;YY
GG;22222;

The above just uses literal strings in a hash lookup of array indices so it doesn't care what characters you have in your input. If you want your input to be treated as literal strings then don't use regexp functions or operators (e.g. match(), ~, sub()) on it, just use string functions/operators (e.g. index(), ==, substr(), in).
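
As a rough illustration of the same idea applied to the second example from the question, the sketch below deduplicates fields with an array lookup instead of match(). It is not the script above: the file name merge.awk and the temporary directory are assumptions, and it merges all input lines into a single output line, which only suits that one-line example.

```shell
# Assumption: POSIX shell; merge.awk and the temp directory are hypothetical names.
dir=$(mktemp -d)
printf '%s\n' 'd(2);http://a.o/f/i.p?t=1;' > "$dir/1.txt"
printf '%s\n' 'Link;A_x-y.7z;http://a.o/f/i.p?t=1;d(2);' > "$dir/2.txt"

cat > "$dir/merge.awk" <<'EOF'
BEGIN { FS = ";" }
{
    # seen[] is indexed by the literal field text, so characters such as
    # "(", ")", "?" and "." have no special meaning here.
    for (i = 1; i <= NF; i++)
        if ($i != "" && !seen[$i]++)
            out = out $i ";"
}
END { print out }
EOF

merged=$(awk -f "$dir/merge.awk" "$dir/1.txt" "$dir/2.txt")
echo "$merged"
rm -r "$dir"
```

With the two sample lines this prints d(2);http://a.o/f/i.p?t=1;Link;A_x-y.7z; because every comparison is an exact string comparison on whole fields.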



Source: https://stackoverflow.com/questions/64952864/dynamic-regular-expressions-in-awk
