I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:
>accession1
ATGGCC
Do not reinvent the wheel. If the goal is simply to remove newlines from a multi-line FASTA file (that is, to unwrap it), use one of the specialized bioinformatics tools, for example seqtk, like so:
seqtk seq -l 0 input_file
Example:
# Create the input for testing:
cat > test_unwrap_in.fa <<EOF
>seq1 with blanks
ACGT ACGT ACGT
>seq2 with newlines
ACGT
ACGT
ACGT
>seq3 without blanks or newlines
ACGTACGTACGT
EOF
# Unwrap lines:
seqtk seq -l 0 test_unwrap_in.fa > test_unwrap_out.fa
cat test_unwrap_out.fa
Output:
>seq1 with blanks
ACGT ACGT ACGT
>seq2 with newlines
ACGTACGTACGT
>seq3 without blanks or newlines
ACGTACGTACGT
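To confirm that a file really is unwrapped, one option is a quick check that every header line is followed by exactly one sequence line. The following is only a minimal sketch: the function name is_unwrapped is hypothetical (it is not part of seqtk), and it assumes that headers start with ">" and that blank lines can be ignored.

```shell
# Hypothetical helper, not part of seqtk: exit status 0 only if every
# FASTA record in the file "$1" has exactly one sequence line.
is_unwrapped() {
    awk '
        /^>/ {                       # header line: close the previous record
            if (seen && n != 1) bad = 1
            seen = 1; n = 0
            next
        }
        NF { n++ }                   # count non-blank sequence lines
        END {
            if (seen && n != 1) bad = 1
            exit bad + 0             # 0 = unwrapped, 1 = wrapped or empty record
        }
    ' "$1"
}
```

For example, is_unwrapped test_unwrap_out.fa && echo "unwrapped" should print the message for the output file above, while the same check on test_unwrap_in.fa should fail because seq2 spans three lines.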
To install seqtk, you can use, for example, conda install seqtk.
SEE ALSO:
seqtk usage:
seqtk seq
Usage: seqtk seq [options] <in.fq>|<in.fa>
Options: ...
-l INT number of residues per line; 0 for 2^32-1 [0]