问题
I am using awk in a bash script to compare two files to get just the not-matching lines. I need to compare all three fields of the second file (as one pattern?) with all lines of the first file:
First file:
chr1 9997 10330 HumanGM18558_peak_1 150 . 10.78887 18.86368 15.08777 100
chr1 628885 635117 HumanGM18558_peak_2 2509 . 83.77238 255.95094 250.99944 5270
chr1 15966215 15966638 HumanGM18558_peak_3 81 . 7.61567 11.78841 8.17169 200
Second file:
chr1 628885 635117
chr1 1250086 1250413
chr1 16613629 16613934
chr1 16644496 16644800
chr1 16895871 16896489
chr1 16905126 16905616
The current idea is to load one file in an array and use AWKs negative regular expression to compare.
readarray a < file2.txt
for i in "${a[@]}"; do
awk -v var="$i" '!/var/' file1.narrowPeak | cat > output.narrowPeak
done
The problem is that '!/var/' is not working with variables.
回答1:
With awk alone:
$ awk 'NR==FNR{a[$1,$2,$3]; next} !(($1,$2,$3) in a)' file2 file1
chr1 9997 10330 HumanGM18558_peak_1 150 . 10.78887 18.86368 15.08777 100
chr1 15966215 15966638 HumanGM18558_peak_3 81 . 7.61567 11.78841 8.17169 200
NR==FNRthis will be true only for the first file, which isfile2in this examplea[$1,$2,$3]create keys based on first three fields, if spacing is exactly same between the two files, you can simply use$0instead of$1,$2,$3nextto skip remaining commands and process next line of input($1,$2,$3) in ato check if first three fields offile1is present as key in arraya. Then invert the condition.
Here's another way to write it (thanks to Ed Morton)
awk '{key=$1 FS $2 FS $3} NR==FNR{a[key]; next} !(key in a)' file2 file1
回答2:
When the pattern is stored in a variable, you have to use the match operator:
awk -v var="something" '
$0 !~ var {print "this line does not match the pattern"}
'
With this problem, regular expression matching looks a bit awkward. I'd go with Sundeep's solution, but if you really want regex:
awk '
NR == FNR {
# construct and store the regex
patt["^" $1 "[[:blank:]]+" $2 "[[:blank:]]+" $3 + "[[:blank:]]"] = 1
next
}
{
for (p in patt)
if ($0 ~ p)
next
print
}
' second first
来源:https://stackoverflow.com/questions/63074772/awk-negative-regular-expression-with-variable