Parsing a FASTA file using a generator (Python)

臣服心动 2020-11-28 14:14

I am trying to parse a large FASTA file and am encountering out-of-memory errors. Suggestions for improving the data handling would be appreciated. Currently the program

4 answers
  •  眼角桃花
    2020-11-28 14:22

    Without fully understanding what you are doing, I would write the code like this:

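    def readFastaEntry(fp):
        """Yield (header, sequence) tuples from an open FASTA file object."""
        name = ""  # header line read ahead by the inner loop, if any
        while True:
            # Reuse the header found by the previous pass, or read the first line.
            line = name or fp.readline()
            if not line:
                break  # end of file
            seq = []
            while True:
                name = fp.readline()
                if not name or name.startswith(">"):
                    break  # end of file, or the next record's header
                seq.append(name.rstrip())  # drop the trailing newline
            yield (line.rstrip(), "".join(seq))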
    

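    This collects the lines that follow a header line, up to the next header line. Making seq a list means you defer string joining until the last possible moment, so each sequence is built with a single join rather than by repeated concatenation. Yielding a tuple makes more sense than a list, since each record is a fixed (header, sequence) pair. Because the function is a generator, only one record is held in memory at a time.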
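    For comparison, a similar record-at-a-time parser can be sketched with the standard library's `itertools.groupby`, which splits the line stream into runs of header lines and runs of sequence lines. This is a swapped-in technique, not the answer's code; the name `read_fasta` is hypothetical, and the sketch assumes headers start with ">" and that the ">" and trailing whitespace should be stripped from names (unlike `readFastaEntry`, which keeps the ">").

```python
import io
from itertools import groupby
from typing import Iterator, TextIO, Tuple


def read_fasta(fp: TextIO) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (name, sequence) pairs one record at a time.

    Consecutive lines are grouped by whether they start with ">", so only
    the current record's lines are held in memory.
    """
    header = None
    for is_header, group in groupby(fp, key=lambda line: line.startswith(">")):
        if is_header:
            for line in group:
                if header is not None:
                    yield header, ""  # previous header had no sequence lines
                header = line[1:].rstrip()
        elif header is not None:
            # Strip newlines and join the record's lines once.
            yield header, "".join(line.strip() for line in group)
            header = None
        # Lines before the first header are skipped.
    if header is not None:
        yield header, ""  # trailing header with no sequence


# Usage with an in-memory sample; a real file object works the same way.
sample = io.StringIO(">seq1 first\nACGT\nTTGA\n>seq2\nGGCC\n")
for name, seq in read_fasta(sample):
    print(name, len(seq))  # prints "seq1 first 8", then "seq2 4"
```

    Passing an open file (for example `open(path)`, where `path` is your FASTA file) keeps only one record in memory at a time; a single very long sequence still has to fit in memory as one string.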
