Remove line breaks in a FASTA file

予麋鹿 2020-12-05 01:26

I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:

>accession1
ATGGCC         


        
9 Answers
  •  感动是毒
    2020-12-05 01:41

    This awk program:

    % awk '!/^>/ { printf "%s", $0; n = "\n" } 
    /^>/ { print n $0; n = "" }
    END { printf "%s", n }
    ' input.fasta
    

    Will yield:

    >accession1
    ATGGCCCATGGGATCCTAGC
    >accession2
    GATATCCATGAAACGGCTTA
    

    Explanation:

    On lines that don't start with a >, print the line without a line break and store a newline character (in variable n) for later.

    On lines that do start with a >, print the stored newline character (if any) and the line. Reset n, in case this is the last line.

    End with a newline, if required.
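The sample in the question appears to contain trailing spaces and blank lines, which the program above would copy into the joined sequence. Below is a minimal sketch of a whitespace-tolerant variant, assuming that trailing whitespace and empty lines carry no meaning in the file. The function name `unwrap_fasta` is hypothetical and only serves to make the example reusable.

```shell
# Hypothetical helper: join wrapped FASTA sequence lines, ignoring
# trailing whitespace and empty lines. Reads files given as arguments,
# or standard input when none are given.
unwrap_fasta() {
  awk '
    { gsub(/[ \t\r]+$/, "") }           # strip trailing spaces, tabs, CR
    /^$/ { next }                       # skip lines that are now empty
    /^>/ { print n $0; n = ""; next }   # header: close previous sequence
    { printf "%s", $0; n = "\n" }       # sequence: append without newline
    END { printf "%s", n }              # close the last sequence, if any
  ' "$@"
}

# Example: a record wrapped over two lines with stray whitespace.
printf '>accession1\nATGGCC   \n\nCATGGG\n' | unwrap_fasta
```

The logic is the same as in the original program; only the first two rules are new, and they run before the header and sequence rules so that those see cleaned lines.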

    Note:

    By default, variables are initialized to the empty string. There is no need to explicitly "initialize" a variable in awk, which is what you would do in C and in most other traditional languages.

    --6.1.3.1 Using Variables in a Program, The GNU Awk User's Guide
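A quick way to check this behavior is to print a variable that was never assigned. The sketch below shows the variable `n` acting as an empty string in a string context and as zero in a numeric context, which is why `print n $0` adds nothing in front of the first header. The function name `show_default` is hypothetical.

```shell
# Hypothetical helper: print an unassigned awk variable as a string
# (%s) and as a number (%d, after adding 0 to force numeric context).
show_default() {
  awk 'BEGIN { printf "[%s] [%d]\n", n, n + 0 }'
}

show_default   # prints: [] [0]
```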
