AWK negative regular expression with variable

Posted by 假如想象 on 2021-01-27 12:02:46

Question


I am using awk in a bash script to compare two files and keep only the lines that do not match. I need to compare all three fields of each line in the second file (as one pattern?) against every line of the first file:

First file:

chr1    9997    10330   HumanGM18558_peak_1     150     .       10.78887        18.86368        15.08777        100
chr1    628885  635117  HumanGM18558_peak_2     2509    .       83.77238        255.95094       250.99944       5270
chr1    15966215        15966638        HumanGM18558_peak_3    81      .       7.61567 11.78841        8.17169 200

Second file:

chr1 628885 635117
chr1 1250086 1250413
chr1 16613629 16613934
chr1 16644496 16644800
chr1 16895871 16896489
chr1 16905126 16905616

The current idea is to load one file into an array and use awk's negated regular expression to compare:

readarray a < file2.txt
for i in "${a[@]}"; do
awk -v var="$i" '!/var/' file1.narrowPeak | cat > output.narrowPeak
done

The problem is that '!/var/' is not working with variables.


Answer 1:


With awk alone:

$ awk 'NR==FNR{a[$1,$2,$3]; next} !(($1,$2,$3) in a)' file2 file1
chr1    9997    10330   HumanGM18558_peak_1     150     .       10.78887        18.86368        15.08777        100
chr1    15966215        15966638        HumanGM18558_peak_3    81      .       7.61567 11.78841        8.17169 200
  • NR==FNR is true only while reading the first file named on the command line, which is file2 in this example
  • a[$1,$2,$3] creates a key from the first three fields; if the lines to compare were identical in both files, spacing included, you could simply use $0 instead of $1,$2,$3
  • next skips the remaining commands and moves on to the next line of input
  • ($1,$2,$3) in a checks whether the first three fields of the current file1 line are present as a key in array a; the leading ! then inverts the condition

Here's another way to write it (thanks to Ed Morton)

awk '{key=$1 FS $2 FS $3} NR==FNR{a[key]; next} !(key in a)' file2 file1
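The two-pass idiom above can be tried on a small sample. This is a minimal sketch: the scratch directory, the file names inside it, and the shortened rows (peak_1, peak_2) are made up for illustration and are not the asker's real data.

```shell
#!/usr/bin/env bash
# Hypothetical scratch files; names and contents are illustrative only.
tmp=$(mktemp -d)
printf 'chr1\t9997\t10330\tpeak_1\nchr1\t628885\t635117\tpeak_2\n' > "$tmp/file1"
printf 'chr1 628885 635117\nchr1 1250086 1250413\n' > "$tmp/file2"

# Pass 1 (file2): remember each three-field key.
# Pass 2 (file1): print only lines whose key was never seen.
result=$(awk 'NR==FNR{a[$1,$2,$3]; next} !(($1,$2,$3) in a)' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Only the peak_1 row should be printed, since its first three fields do not appear in file2. One caveat: if file2 is empty, NR==FNR also holds while reading file1, so nothing is printed.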



Answer 2:


When the pattern is stored in a variable, you have to use the match operator (~, or !~ to negate it):

awk -v var="something" '
  $0 !~ var {print "this line does not match the pattern"}
'
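The difference between a regex literal and a dynamic regex can be seen side by side. A small sketch on made-up input (the three sample lines and the shell variable names are assumptions for illustration):

```shell
#!/usr/bin/env bash
# Made-up input: three lines, one of which is the literal text "var".
input=$(printf 'alpha\nvar\nbeta\n')

# /var/ is a regex literal: it matches the characters v-a-r,
# so the value passed with -v var=... is never consulted.
literal=$(printf '%s\n' "$input" | awk -v var="alpha" '!/var/')

# $0 !~ var uses the variable's contents ("alpha") as the regex.
dynamic=$(printf '%s\n' "$input" | awk -v var="alpha" '$0 !~ var')

printf 'literal:\n%s\n' "$literal"
printf 'dynamic:\n%s\n' "$dynamic"
```

The literal form drops the line "var" and keeps "alpha", while the dynamic form drops "alpha" and keeps "var", which is the behaviour the question was after.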

With this problem, regular expression matching looks a bit awkward. I'd go with Sundeep's solution, but if you really want regex:

awk '
  NR == FNR {
    # construct and store the regex
    patt["^" $1 "[[:blank:]]+" $2 "[[:blank:]]+" $3 "[[:blank:]]"] = 1
    next
  }
  {
    for (p in patt)
      if ($0 ~ p)
        next
    print
  }
' second first
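The regex approach can also be tried on a small sample. The sketch below uses made-up rows (peak_2 and peak_x are hypothetical names) and builds each pattern by plain string concatenation, ending in [[:blank:]] so that 635117 does not also match a longer number such as 6351170. It assumes an awk that supports POSIX character classes.

```shell
#!/usr/bin/env bash
# Hypothetical scratch files; the second row differs only by a trailing digit.
tmp=$(mktemp -d)
printf 'chr1\t628885\t635117\tpeak_2\nchr1\t628885\t6351170\tpeak_x\n' > "$tmp/first"
printf 'chr1 628885 635117\n' > "$tmp/second"

kept=$(awk '
  NR == FNR {
    # one anchored regex per line of the second file
    patt["^" $1 "[[:blank:]]+" $2 "[[:blank:]]+" $3 "[[:blank:]]"] = 1
    next
  }
  {
    # skip the line as soon as any stored regex matches it
    for (p in patt)
      if ($0 ~ p)
        next
    print
  }
' "$tmp/second" "$tmp/first")
printf '%s\n' "$kept"

rm -rf "$tmp"
```

Only the peak_x row should survive. Note that the fields are used as regex text, so a field containing a metacharacter such as a dot would match more than its literal value; the array-key approach from the first answer avoids that.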


Source: https://stackoverflow.com/questions/63074772/awk-negative-regular-expression-with-variable
