I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:
>accession1
ATGGCC
The accepted solution is fine, but it's not particularly AWKish. Consider using this instead:
awk '/^>/ { print (NR==1 ? "" : RS) $0; next } { printf "%s", $0 } END { printf RS }' file
Explanation:
For lines beginning with >, print the line, then use next to skip the remaining rule. A ternary operator prints a leading newline character first if the line is not the first in the file; this terminates the joined sequence of the previous record. For lines not beginning with >, print the line without a trailing newline character, so the wrapped sequence lines are joined together. Since the last line in the file won't begin with >, use the END block to print a final newline character.
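The same program can be spread over several lines, with awk comments mirroring the explanation above. This is a sketch that assumes a POSIX shell and a POSIX awk on the PATH; the file names input.fa and output.fa and the sample records are hypothetical, made up purely for illustration.

```shell
# Hypothetical sample input: two records, each sequence wrapped over several lines.
printf '>accession1\nATGGCC\nATTGAA\n>accession2\nTTGGCA\nGGC\n' > input.fa

awk '
  # Header lines: print them on their own line. Unless the header is the
  # very first line of the file, emit a newline first to terminate the
  # joined sequence of the previous record. next skips the rule below.
  /^>/ { print (NR==1 ? "" : RS) $0; next }

  # Sequence lines: print without a trailing newline so wrapped lines join up.
  { printf "%s", $0 }

  # The file ends on a sequence line, so finish with a newline.
  END { printf RS }
' input.fa > output.fa

cat output.fa
# >accession1
# ATGGCCATTGAA
# >accession2
# TTGGCAGGC
```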
Note that the above can also be written more briefly by setting a null output record separator, enabling default printing, and re-assigning lines beginning with > so that they carry their own newline characters. Try:
awk -v ORS= '/^>/ { $0 = (NR==1 ? "" : RS) $0 RS } END { printf RS }1' file
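One way to check that the two versions are equivalent is to run both on the same input and compare the results byte for byte. This sketch assumes a POSIX shell with awk and cmp available; the sample records and the file names input.fa, long.fa and short.fa are hypothetical.

```shell
# Hypothetical sample input with wrapped sequence lines.
printf '>accession1\nATGGCC\nATTGAA\n>accession2\nTTGGCA\nGGC\n' > input.fa

# Long form: explicit print for headers, printf for sequence lines.
awk '/^>/ { print (NR==1 ? "" : RS) $0; next } { printf "%s", $0 } END { printf RS }' input.fa > long.fa

# Short form: ORS is empty, so the trailing pattern 1 (always true) prints
# every record with no newline; header records get their newlines by
# rewriting $0 before that default print runs.
awk -v ORS= '/^>/ { $0 = (NR==1 ? "" : RS) $0 RS } END { printf RS }1' input.fa > short.fa

if cmp -s long.fa short.fa; then
  echo "identical output"
else
  echo "outputs differ"
fi
```

The short form relies on awk running the per-record rules in source order, so the header rewrite always happens before the default print triggered by 1.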