SeqIO.parse on a fasta.gz

难免孤独 2021-01-04 19:24

New to coding. New to Python/Biopython; this is my first question online, ever. How do I open a compressed fasta.gz file to extract info and perform calculations in my functi

2 Answers
  • 2021-01-04 20:05

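    Are you using Python 3?

    In Python 3, gzip.open defaults to binary mode, while SeqIO.parse expects a text-mode handle for FASTA. Changing the mode from "r" to "rt" should solve your problem: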

    import gzip
    from Bio import SeqIO
    
    with gzip.open("practicezip.fasta.gz", "rt") as handle:
        for record in SeqIO.parse(handle, "fasta"):
            print(record.id)
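
Since the question also asks about extracting info and performing calculations, here is a minimal sketch of one way to do that on top of the loop above. The names `gc_fraction_of` and `summarize_fasta_gz` are hypothetical helpers made up for this illustration, not part of Biopython, and sequence length plus GC fraction are just example calculations.

```python
import gzip


def gc_fraction_of(sequence):
    """Return the fraction of G/C characters in a sequence (0.0 if it is empty)."""
    seq = str(sequence).upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def summarize_fasta_gz(path):
    """Yield (record id, length, GC fraction) for each record in a gzipped FASTA file."""
    # Imported here so gc_fraction_of stays usable without Biopython installed.
    from Bio import SeqIO

    # "rt" opens the decompressed stream in text mode, as in the answer above.
    with gzip.open(path, "rt") as handle:
        for record in SeqIO.parse(handle, "fasta"):
            yield record.id, len(record.seq), gc_fraction_of(record.seq)
```

Looping over `summarize_fasta_gz("practicezip.fasta.gz")` should then give one tuple per record, which can be printed or collected into a list for further processing.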
    
  • 2021-01-04 20:20

    Here is a solution if you want to handle both regular text and gzipped files:

    import gzip
    from mimetypes import guess_type
    from functools import partial
    from Bio import SeqIO
    
    input_file = 'input_file.fa.gz'
    
    encoding = guess_type(input_file)[1]  # uses file extension
    _open = partial(gzip.open, mode='rt') if encoding == 'gzip' else open
    
    with _open(input_file) as f:
        for record in SeqIO.parse(f, 'fasta'):
            print(record)
    

    Note: this relies on the file having the correct extension, which I think is reasonable nearly all of the time (and the errors are obvious and explicit if this assumption is not met). To avoid the assumption entirely, check the file content itself rather than the extension.
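
One way to check the content rather than the extension is to look at the first two bytes of the file: gzip streams start with the magic bytes 0x1f 0x8b. The sketch below assumes that check is good enough for your files. `open_maybe_gzip` and `GZIP_MAGIC` are hypothetical names chosen for this example.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of a gzip stream


def open_maybe_gzip(path):
    """Open path in text mode, decompressing it if it starts with the gzip magic bytes."""
    # Peek at the raw bytes first; the extension is never consulted.
    with open(path, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    return gzip.open(path, "rt") if is_gzip else open(path, "r")
```

The returned handle can be passed to SeqIO.parse(handle, 'fasta') inside a with block, just like `_open(input_file)` above.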
