Extract sequences from a multi-FASTA file by IDs listed in a file, using awk

长发绾君心 2020-12-06 18:34

I would like to extract the sequences from a multi-FASTA file whose IDs appear in a separate list of IDs.

FASTA file seq.fasta:

>7P58X:01332:11         


        
2 Answers
  • 2020-12-06 19:09
    $ awk -F'>' 'NR==FNR{ids[$0]; next} NF>1{f=($2 in ids)} f' id.txt seq.fasta
    >7P58X:01332:11636
    TTCAGCAAGCCGAGTCCTGCGTCGTTACTTCGCTT
    CAAGTCCCTGTTCGGGCGCC
    >7P58X:01334:11613
    ACGAGTGCGTCAGACCCTTTTAGTCAGTGTGGAAAC
    
  • 2020-12-06 19:09

    The following awk command should do the same.

    awk 'FNR==NR{a[$0];next} /^>/{val=$0;sub(/^>/,"",val);flag=val in a?1:0} flag' ids.txt  fasta_file
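
Both one-liners above follow the same two-pass idea: load the IDs into an array while reading the first file, then toggle a flag on each FASTA header. Below is a minimal, self-contained sketch of that logic with comments. It is an illustration under a few assumptions, not a replacement for either answer: the first and third sample records are taken from the output shown in the first answer, the middle record `7P58X:01333:00001` and its sequence are hypothetical (added only so that something gets filtered out), and the names `keep`, `ids`, `result`, and the scratch directory are made up for the example. It uses `substr($0, 2)` to drop the leading `>` instead of `-F'>'` or `sub()`.

```shell
#!/bin/sh
# Work in a scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)

# Sample FASTA. Records 1 and 3 come from the output shown in the first
# answer; record 2 (ID and sequence) is hypothetical.
cat > "$tmpdir/seq.fasta" <<'EOF'
>7P58X:01332:11636
TTCAGCAAGCCGAGTCCTGCGTCGTTACTTCGCTT
CAAGTCCCTGTTCGGGCGCC
>7P58X:01333:00001
ACGTACGTACGT
>7P58X:01334:11613
ACGAGTGCGTCAGACCCTTTTAGTCAGTGTGGAAAC
EOF

# ID list: one ID per line, without the leading ">".
cat > "$tmpdir/id.txt" <<'EOF'
7P58X:01332:11636
7P58X:01334:11613
EOF

result=$(awk '
    # First file (id.txt): NR == FNR is true only while reading it.
    # Referencing ids[$0] is enough to create the key.
    NR == FNR { ids[$0]; next }

    # FASTA header: strip the leading ">" and check whether the ID is wanted.
    /^>/ { keep = (substr($0, 2) in ids) }

    # A bare pattern prints the line: the header itself and every
    # sequence line after it, for as long as keep stays true.
    keep
' "$tmpdir/id.txt" "$tmpdir/seq.fasta")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Two caveats apply to this sketch and to the one-liners alike. The match is on the whole header line after `>`, so a header such as `>ID some description` will not match a bare `ID` in the list. And the `NR == FNR` test assumes the ID file is not empty; with an empty ID file, awk treats the FASTA file as the first file and prints nothing.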
    