Remove line breaks in a FASTA file

予麋鹿 2020-12-05 01:26

I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:

>accession1
ATGGCC         


        
9 answers
  •  陌清茗 (OP)
    2020-12-05 01:40

    Do not reinvent the wheel. If the goal is simply to remove the newlines from a multi-line FASTA file (that is, to unwrap it), use one of the specialized bioinformatics tools, for example seqtk:

    seqtk seq -l 0 input_file
    

    Example:

    # Create the input for testing:
    
    cat > test_unwrap_in.fa <<EOF
    >seq1 with blanks
    ACGT ACGT ACGT
    >seq2 with newlines
    ACGT
    
    ACGT
    
    ACGT
    
    >seq3 without blanks or newlines
    ACGTACGTACGT
    
    EOF
    
    # Unwrap lines:
    
    seqtk seq -l 0 test_unwrap_in.fa > test_unwrap_out.fa
    
    cat test_unwrap_out.fa
    

    Output:

    >seq1 with blanks
    ACGT ACGT ACGT
    >seq2 with newlines
    ACGTACGTACGT
    >seq3 without blanks or newlines
    ACGTACGTACGT
    

    To install seqtk, you can use conda, for example: conda install seqtk.
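
If installing seqtk is not an option, the same unwrapping can be approximated with plain awk. This is only a minimal sketch and not part of seqtk: the function name unwrap_fasta is hypothetical, and it assumes that header lines start with > and that every other line is sequence. It joins the sequence lines of each record, drops empty lines and trailing whitespace, and leaves blanks inside a line untouched.

```shell
# Hypothetical helper (not part of seqtk): unwrap a multi-line FASTA file.
# Reads the files given as arguments, or standard input if there are none.
unwrap_fasta() {
    awk '
        /^>/ {                        # header line starts a new record
            if (seq != "") print seq  # flush the sequence collected so far
            seq = ""
            print                     # print the header unchanged
            next
        }
        {
            sub(/[ \t\r]+$/, "")      # strip trailing whitespace (and CR)
            seq = seq $0              # append this line to the current sequence
        }
        END { if (seq != "") print seq }  # flush the last record
    ' "$@"
}
```

With the function defined in the current shell, unwrap_fasta test_unwrap_in.fa > test_unwrap_out.fa is expected to reproduce the output shown above for the test input. For real data, seqtk remains the safer choice, since it also handles FASTQ and compressed input.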

    SEE ALSO:

    seqtk usage:

    seqtk seq
    
    Usage:   seqtk seq [options] <in.fq>|<in.fa>
    
    Options: ...
             -l INT    number of residues per line; 0 for 2^32-1 [0]
    
