While read line, awk $line

会有一股神秘感, submitted on 2019-12-19 11:58:08

Question


I have a file that contains a list of numbers. I have a second file with various entries and several fields each.

What I want to do is get all the lines whose 12th field is equal to the first number and place them in a new file, then do the same for the second number, and so on.

I wrote a one-liner that makes sense, but I can't figure out why it won't work.

This is the list of numbers:

cat truncations_list.txt

3
318
407
412
7

The file with the entries to be sorted is:

M00970:45:000000000-A42FD:1:1101:14736:1399 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAAGGAGGACCAATATGAACCANACNNNNNNNNNACTNTATCTAGGGTTCCCTGCACAGTATGTGNCC    79  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAA 65  GGAGGACCAATATGAACCANACNNNNNNNNNACTNTATCTAGGGTTCCCTGCACAGTATGTGNCC   79S65M  1   81  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCATCGGCNGGAGTAACTATGACTCTNTTAAGG   -2  318
M00970:45:000000000-A42FD:1:1101:15371:1399 TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGGAGTCGGTGTTCACATGCNATNNNNNNNNNCAGNCGAACTTGATGAAGAACGTCGACGTGTNGG    83  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGGAG 61  TCGGTGTTCACATGCNATNNNNNNNNNCAGNCGAACTTGATGAAGAACGTCGACGTGTNGG   83S61M  1   81  TGCCCAGTGCTCTGAATGTNNNNNTGAAGAAATTCAAGTAAGCGCGGGTCAACGGCNGGAGTAACTATGACTCTNTTAAGG   2   407

This is my command:

file="truncations_list.txt"
while read line; do awk '$12==$line' R2_Output.txt >reads_$line.txt ; done <"$file"

This command creates all the files ("reads_412.txt", etc.), but every one of them is empty.

I appreciate your help!


Answer 1:


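Your main problem is that the awk program is in single quotes, so the shell never expands "$line". awk receives the literal text $12==$line, where line is an unset awk variable. An unset variable is 0 in a field reference, so the test becomes $12==$0, which is never true for these multi-field records. The quick fix is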

awk -v num="$line" '$12==num' R2_Output.txt
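
Put back into the loop, the -v approach might look like the sketch below. The sample files are made up for illustration (12 whitespace-separated fields, with the key in field 12), and `IFS= read -r` plus the quoting around "$line" are additions of this sketch, not part of the original command.

```shell
# Hypothetical sample data: 12 whitespace-separated fields, key in field 12
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
printf '%s\n' 3 318 407 > truncations_list.txt
printf 'r1 a b c d e f g h i j 318\nr2 a b c d e f g h i j 407\nr3 a b c d e f g h i j 999\n' > R2_Output.txt

file="truncations_list.txt"
# IFS= and -r keep each input line intact;
# -v passes the shell value into awk as the awk variable num
while IFS= read -r line; do
    awk -v num="$line" '$12 == num' R2_Output.txt > "reads_$line.txt"
done < "$file"
```

Note that the redirection creates a file for every number in the list, so a number with no matching lines (3 in this sample) still leaves an empty reads_3.txt behind.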

But don't do that: the loop rereads R2_Output.txt once for every line in the numbers file. You can do the whole job by reading each file only once:

awk '
    # read the list of numbers in truncations_list
    FNR == NR {
        num[$1]
        next
    }

    # process each line of the output file
    # any lines with an "unknown" $12 will be ignored
    $12 in num {
        f = "reads_" $12 ".txt"
        print >> f
    }
' truncations_list.txt R2_Output.txt
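
One caveat, offered as a sketch rather than as part of the answer above: every distinct value of $12 keeps its own output file open, and some awk implementations limit how many files can be open at once. With five numbers this does not matter. With many, closing each file after writing is one way around the limit, at the cost of reopening it for every matching line. The sample data below is made up for illustration.

```shell
# Hypothetical sample data so the sketch runs on its own
tmpdir=$(mktemp -d) && cd "$tmpdir" || exit 1
printf '%s\n' 3 318 407 > truncations_list.txt
printf 'r1 a b c d e f g h i j 318\nr2 a b c d e f g h i j 407\nr3 a b c d e f g h i j 318\n' > R2_Output.txt

rm -f reads_*.txt   # >> appends, so clear results left by earlier runs

awk '
    # first file: remember every number as an array index
    FNR == NR { num[$1]; next }

    # second file: append matching lines to the per-number file
    $12 in num {
        f = "reads_" $12 ".txt"
        print >> f
        close(f)    # free the descriptor; >> reopens f in append mode
    }
' truncations_list.txt R2_Output.txt
```

Unlike the shell loop, this approach creates no file at all for a number that never appears in field 12 (there is no reads_3.txt here).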



Answer 2:


Minimizing references to $x field variables can improve Awk performance. It mostly matters for more complex scripts, but it's worth trying this slight optimization if you are processing large files with millions of records:

awk 'FNR==NR {a[$1]; next} (f=$12) in a {print > ("reads_" f ".txt")}' trunc.txt R2_Out.txt


Source: https://stackoverflow.com/questions/16327874/while-read-line-awk-line
