Printing a sequence from a fasta file

太阳男子 2021-01-13 20:45

I often need to find a particular sequence in a fasta file and print it. For those who don't know, fasta is a text file format for biological sequences (DNA, proteins, etc.).

4 answers
  •  难免孤独
    2021-01-13 21:33

    Like this maybe:

    awk '/^>sequence1/{p++;print;next} /^>/{p=0} p' file
    

    So, if the line starts with >sequence1, set a flag (p) to start printing, print that line, and skip to the next one. On any later line that starts with >, clear the p flag to stop printing. The trailing p prints every other line for as long as the flag is set.
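
    If the name has to match exactly (the pattern above would also pick up a header such as >sequence10), the same flag idea could be sketched as below. The extract_seq function, the name variable and the sample data are made up for illustration, and the sketch assumes the sequence name is the first whitespace-delimited word of the header line.

    ```shell
    # Hypothetical sample data: two short records in a temporary file.
    fa=$(mktemp)
    printf '>sequence1\nACGT\nTTGA\n>sequence2\nGGCC\n' > "$fa"

    # extract_seq NAME FILE: print the record whose header's first word is ">NAME".
    # On every header line the flag p is recomputed; the bare p pattern prints while it is set.
    extract_seq() {
        awk -v name=">$1" '/^>/ { p = ($1 == name) } p' "$2"
    }

    extract_seq sequence1 "$fa"
    ```

    Here the flag is set or cleared in a single rule, so no explicit print/next is needed for the header line.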

    Or, improving a little on your grep solution, use this to cut off the -A (after) context:

    grep -A 999999 "sequence1" file | awk 'NR>1 && /^>/{exit} 1'
    

    So, that prints the line matching sequence1 plus up to 999999 lines after it, and pipes them into awk. Awk then looks for a > at the start of any line after line 1, and exits if it finds one. Until then, the 1 causes awk to do its standard thing, which is print the current line.
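
    The early exit can also be done in awk alone, which avoids the arbitrary 999999 limit. This is only a sketch of that variant, not the pipeline above: first_record, name, found and the sample file are hypothetical, and it assumes the sequence name is the first word of the header line.

    ```shell
    # Hypothetical sample data: three short records in a temporary file.
    fa=$(mktemp)
    printf '>sequence1\nACGT\nTTGA\n>sequence2\nGGCC\n>sequence3\nAAAA\n' > "$fa"

    # first_record NAME FILE: print the first record named NAME, then stop reading.
    # At each header: if a record was already being printed, exit; otherwise test the name.
    first_record() {
        awk -v name=">$1" '/^>/ { if (found) exit; found = ($1 == name) } found' "$2"
    }

    first_record sequence2 "$fa"
    ```

    Because awk exits at the header that follows the wanted record, the rest of a large file is never read.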
