AttributeError: 'str' object has no attribute 'id' using BioPython, parsing fasta

时光怂恿深爱的人放手 submitted on 2020-06-28 03:21:47

Question


I am trying to use Biopython's SeqIO to open a FASTA file that contains multiple sequences, remove the '.seq' suffix from the end of every sequence name (>SeqID20.seq should become >SeqID20), and then write all the sequences to a new FASTA file, but I get the following error:

AttributeError: 'str' object has no attribute 'id'

This is what I started with:

from Bio import SeqIO

with open('lots_of_fasta_in_file.fasta') as f:
    for seq_record in SeqIO.parse(f, 'fasta'):
        name, sequence = seq_record.id, str(seq_record.seq)
        pair = [name.replace('.seq',''), sequence]
        SeqIO.write(pair, "new.fasta", "fasta")

But I have also tried this and get the same error:

file_in ='lots_of_fasta_in_file.fasta'
file_out='new.fasta'

with open(file_out, 'w') as f_out:
    with open(file_in, 'r') as f_in:
        for seq_record in SeqIO.parse(f_in, 'fasta'):
            name, sequence = seq_record.id, str(seq_record.seq)
            # remove .seq from ID and add features
            pair = [name.replace('.seq',''), sequence]
            SeqIO.write(pair, file_out, 'fasta')

I assume I'm making some error in going from my list 'pair' to writing to a new file, but I'm not sure what to change. Any help would be appreciated!


Answer 1:


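Your error occurs because SeqIO.write accepts a SeqRecord or a list/iterator of SeqRecords, but you are passing it a plain list like [name, sequence]. SeqIO.write treats each item in that list as a record, so it ends up looking for .id on a str. Instead, I suggest you just modify the SeqRecord's .id and .description (note: if there is whitespace in the header line, you'll need to handle this too). It is also most efficient (across Biopython versions) to write all the records at once rather than calling .write on each iteration; passing a filename to SeqIO.write inside the loop also reopens and overwrites that file every time, so only the last record would survive: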

from Bio import SeqIO

def yield_records():
    with open('lots_of_fasta_in_file.fasta') as f:
        for seq_record in SeqIO.parse(f, 'fasta'):
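            # Strip '.seq' from the ID and reuse the new ID as the description,
            # so the writer doesn't append the old header text after it.
            seq_record.id = seq_record.description = seq_record.id.replace('.seq','')
            yield seq_record

# SeqIO.write consumes the generator and writes every record in one call.
SeqIO.write(yield_records(), 'new.fasta', 'fasta')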
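If you would rather keep the question's name/sequence pair instead of editing records in place, one option is to wrap each edited pair in a new SeqRecord, collect the records, and write them with a single SeqIO.write call. The sketch below assumes Biopython is installed; the helper name strip_seq_suffix is hypothetical. It also keeps any header text after the first word (one way to handle the whitespace caveat), and it strips '.seq' only when it ends the ID, whereas str.replace removes every occurrence.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


def strip_seq_suffix(in_handle, out_handle):
    """Copy FASTA records, dropping a trailing '.seq' from each ID.

    Hypothetical helper (not part of Biopython): returns the number of
    records written.
    """
    records = []
    for seq_record in SeqIO.parse(in_handle, "fasta"):
        name, sequence = seq_record.id, str(seq_record.seq)
        # Only strip '.seq' when it ends the ID (str.replace would hit any match).
        new_name = name[: -len(".seq")] if name.endswith(".seq") else name
        # .description holds the whole header line; keep whatever follows the ID.
        extra = seq_record.description[len(name):].strip()
        # SeqIO.write needs SeqRecord objects, not a [name, sequence] list of strings.
        records.append(SeqRecord(Seq(sequence), id=new_name, description=extra))
    # A single write call for all records; SeqIO.write returns how many it wrote.
    return SeqIO.write(records, out_handle, "fasta")


if __name__ == "__main__":
    # In-memory demo; with real files, pass open file handles instead.
    demo_in = StringIO(">SeqID20.seq sample A\nACGT\n>SeqID21.seq\nGGCC\n")
    demo_out = StringIO()
    strip_seq_suffix(demo_in, demo_out)
    print(demo_out.getvalue(), end="")
```

For the question's files, calling strip_seq_suffix with handles opened on 'lots_of_fasta_in_file.fasta' (read) and 'new.fasta' (write) in one with statement should be enough. Setting description to "" when there is no extra text keeps the written header to just the new ID.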



Answer 2:


Not really a fix for your code, but it does what you need:

sed 's/\.seq$//' lots_of_fasta_in_file.fasta > new.fasta

This command assumes a well-formed FASTA file. It removes ".seq" wherever it appears at the end of a line, and in a well-formed FASTA file only the header (ID) lines should contain these characters.
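The same edit can be sketched in plain Python with only the standard library, for example where sed isn't available. Unlike the sed one-liner, this version only touches lines that start with '>', so sequence lines are never modified. Like sed's `\.seq$`, it only strips '.seq' at the very end of the header. The function name strip_header_suffix is hypothetical.

```python
def strip_header_suffix(lines):
    """Yield FASTA lines, removing a trailing '.seq' from header lines only.

    Hypothetical helper: sequence lines pass through unchanged, and every
    line keeps its original line ending (LF, CRLF, or none).
    """
    suffix = ".seq"
    for line in lines:
        # Split off the line ending so the suffix check sees only the text.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        # Only header lines (starting with '>') are edited.
        if body.startswith(">") and body.endswith(suffix):
            body = body[: -len(suffix)]
        yield body + ending
```

With real files, opening both and calling `f_out.writelines(strip_header_suffix(f_in))` streams the input line by line, so memory use stays flat even for large FASTA files.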



Source: https://stackoverflow.com/questions/51488228/attributeerror-str-object-has-no-attribute-id-using-biopython-parsing-fast
