biopython

Protein sequence from uniprot protein id python

假装没事ソ posted on 2020-12-03 17:59:38
Question: I was wondering if there is a way to get protein sequences from UniProt protein IDs. I checked a few online tools, but they only return one sequence at a time, and I have 5536 values. Is there a package in Biopython to do this?
Answer 1: Every UniProt sequence can be accessed at "http://www.uniprot.org/uniprot/" + UniprotID + ".fasta". You can obtain any sequence with import requests as r from Bio import SeqIO from io import StringIO cID='P04637' baseUrl="http://www.uniprot.org
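For a long list of IDs, the per-ID URL pattern quoted in the answer can be wrapped in a loop. The following is a minimal sketch, assuming that URL pattern still serves FASTA (UniProt has changed its endpoints over time, so `BASE_URL` should be verified before use). The helper names `parse_fasta_text`, `fetch_record`, and `fetch_many`, and the pause length, are hypothetical and not part of Biopython.

```python
import time
from io import StringIO
from urllib.request import urlopen

from Bio import SeqIO

# Assumption: URL pattern taken from the answer above; check that it is current.
BASE_URL = "http://www.uniprot.org/uniprot/"


def parse_fasta_text(text):
    """Parse FASTA-formatted text into a list of SeqRecord objects."""
    return list(SeqIO.parse(StringIO(text), "fasta"))


def fetch_record(uniprot_id, base_url=BASE_URL):
    """Download one entry as FASTA and return its first record, or None."""
    with urlopen(base_url + uniprot_id + ".fasta") as response:
        text = response.read().decode("utf-8")
    records = parse_fasta_text(text)
    return records[0] if records else None


def fetch_many(uniprot_ids, pause=0.5):
    """Fetch several entries one at a time, pausing between requests."""
    results = {}
    for uniprot_id in uniprot_ids:
        results[uniprot_id] = fetch_record(uniprot_id)
        time.sleep(pause)  # hypothetical delay so the server is not flooded
    return results
```

Calling `fetch_many(["P04637"])` would return a dictionary mapping each ID to its record. For thousands of IDs, a batch download service is likely a better fit than one request per ID.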

AttributeError: 'str' object has no attribute 'id' using BioPython, parsing fasta

时光怂恿深爱的人放手 posted on 2020-06-28 03:21:47
Question: I am trying to use Bio and SeqIO to open a FASTA file that contains multiple sequences, edit the sequence names to remove the '.seq' at the end of each name (>SeqID20.seq should become >SeqID20), and then write all the sequences to a new FASTA file. But I get the following error: AttributeError: 'str' object has no attribute 'id'. This is what I started with: with open ('lots_of_fasta_in_file.fasta') as f: for seq_record in SeqIO.parse(f, 'fasta'): name, sequence = seq_record.id, str
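This error typically appears when plain strings, rather than `SeqRecord` objects, are passed to `SeqIO.write`. The asker's full code is cut off above, so the following is only a sketch of one way to do the renaming while keeping the records intact. The function names are hypothetical, and clearing `description` is a simplification that drops any text after the ID in the header.

```python
from Bio import SeqIO


def strip_seq_suffix(records, suffix=".seq"):
    """Yield each record with `suffix` removed from the end of its ID."""
    for record in records:
        if record.id.endswith(suffix):
            record.id = record.id[: -len(suffix)]
        # The FASTA writer emits ">id description"; clear the description so
        # the old name is not written again after the new ID.
        record.description = ""
        yield record


def rename_fasta(in_path, out_path, suffix=".seq"):
    """Read a FASTA file, strip `suffix` from every ID, write a new file."""
    records = SeqIO.parse(in_path, "fasta")
    return SeqIO.write(strip_seq_suffix(records, suffix), out_path, "fasta")
```

`SeqIO.write` returns the number of records written, so `rename_fasta` can be used to confirm that every sequence made it into the output file.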

biopython no module named Bio

a 夏天 posted on 2020-06-27 06:49:11
Question: FYI: this is NOT a duplicate! Before running my Python code I installed Biopython from the command prompt: pip install biopython. I then get an error saying 'No module named Bio' when I try to import it in Python with import Bio. The same thing happens with import biopython. Note that I have updated pip and am running Python 3.5.2. I appreciate anyone's help.
Answer 1: Use pip3 install biopython and then import Bio. This worked for me.
Answer 2: When I came across this problem I noticed that after I installed
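A common cause of this error is that pip installed the package into a different Python installation than the one running the script. Note also that the package is installed under the name `biopython` but imported as `Bio`, so `import biopython` fails even on a working installation. The sketch below uses only the standard library to show which interpreter is running and whether it can see `Bio`; the function name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate module `name`."""
    return importlib.util.find_spec(name) is not None


# Shows which Python is running and whether Biopython is visible to it.
print("Interpreter:", sys.executable)
print("Bio importable:", module_available("Bio"))
```

If the second line prints `False`, running `python -m pip install biopython` with the interpreter shown on the first line should install the package where that interpreter looks for it.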

Python: How to encode DNA sequence using binary values?

早过忘川 posted on 2020-05-18 18:38:46
Question: I would like to convert a file that contains a few DNA sequences into binary values, as follows: A=1000 C=0100 G=0010 T=0001. FileA.txt: CCGAT GCTTA. Desired output: 01000100001010000001 00100100000100011000. I have tried to solve this with the code below, but the binary output file does not contain the desired result. Can anyone help me? Code: import sys if len(sys.argv) != 2 : sys.stderr.write('Usage: {} <nucleotide file>\n'.format(sys.argv[0])) sys.exit() # assumes the file only contains
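The mapping in the question is a one-hot encoding, and it can be applied with a dictionary lookup per base. This is a minimal sketch rather than a repair of the asker's truncated script. The names `ENCODING`, `encode_sequence`, and `encode_lines` are hypothetical, and the code assumes each line contains only A, C, G, and T.

```python
# One-hot code for each nucleotide, as specified in the question.
ENCODING = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def encode_sequence(sequence):
    """Encode one DNA string; raises KeyError on any base other than ACGT."""
    return "".join(ENCODING[base] for base in sequence.strip().upper())


def encode_lines(lines):
    """Encode every non-blank line of an iterable of strings."""
    return [encode_sequence(line) for line in lines if line.strip()]


# The two sequences from FileA.txt in the question.
for encoded in encode_lines(["CCGAT", "GCTTA"]):
    print(encoded)
```

Because `encode_lines` accepts any iterable of strings, an open file handle can be passed in place of the list to process a file line by line.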

Remove heteroatoms from PDB

让人想犯罪 __ posted on 2020-05-01 06:09:26
Question: The heteroatoms have to be removed from a PDB file. Here is the code, but it did not work with my test PDB 1C4R. for model in structure: for chain in model: for reisdue in chain: id = residue.id if id[0] != ' ': chain.detach_child(id) if len(chain) == 0: model.detach_child(chain.id) Any suggestions?
Answer 1: The heteroatoms shouldn't be part of the chain. But you can tell whether a residue is a heteroatom with: pdb = PDBParser().get_structure("1C4R", "1C4R.pdb") for residue in pdb.get_residues(): tags =
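Detaching residues from a chain while iterating over that same chain can skip entries, which may explain part of the behaviour described in the question. An alternative that avoids modifying the structure is to filter on output with `Bio.PDB.Select`. This is a sketch under that assumption and not the answer's own method; `NonHetSelect` and `strip_hetero` are hypothetical names.

```python
from Bio.PDB import PDBIO, PDBParser, Select


class NonHetSelect(Select):
    """Keep only standard residues, whose hetero flag (id[0]) is a blank."""

    def accept_residue(self, residue):
        # id[0] is " " for standard residues, "W" for water,
        # and "H_xxx" for other hetero groups.
        return residue.id[0] == " "


def strip_hetero(in_path, out_path, structure_id="structure"):
    """Write a copy of the PDB file at `in_path` without hetero residues."""
    structure = PDBParser(QUIET=True).get_structure(structure_id, in_path)
    io = PDBIO()
    io.set_structure(structure)
    io.save(out_path, NonHetSelect())
```

For the file in the question, this would be called as `strip_hetero("1C4R.pdb", "1C4R_clean.pdb", "1C4R")`, where the output file name is an arbitrary choice.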

Searching on pubmed using biopython

北慕城南 posted on 2020-01-17 14:02:27
Question: I am trying to run over 200 queries against PubMed to record the number of articles published by an author, and to refine the search by including his/her mentor and institution. I have tried to do this using Biopython and xlrd (the code is below), but I consistently get 0 results for all three query formats (1. by name, 2. by name and institution name, and 3. by name and mentor's name). Are there troubleshooting steps I can take, or should I use a different format
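The asker's code is not shown in this excerpt, so the cause of the empty results cannot be determined here. One troubleshooting step is to build each query explicitly with PubMed field tags and print it before sending it, so that a malformed term is easy to spot. The sketch below assumes the mentor appears as a co-author; `build_query` and `count_articles` are hypothetical names, and NCBI expects a real contact address in `Entrez.email`.

```python
from Bio import Entrez


def build_query(author, affiliation=None, coauthor=None):
    """Combine the search terms into one PubMed query with field tags."""
    terms = ["{}[Author]".format(author)]
    if affiliation:
        terms.append("{}[Affiliation]".format(affiliation))
    if coauthor:
        # Assumption: the mentor is searched for as a second author.
        terms.append("{}[Author]".format(coauthor))
    return " AND ".join(terms)


def count_articles(query, email):
    """Return the number of PubMed hits for `query` (needs network access)."""
    Entrez.email = email
    handle = Entrez.esearch(db="pubmed", term=query, retmax=0)
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return int(result["Count"])
```

Pasting the printed query into the PubMed website is a quick way to check whether the term itself, rather than the script, is responsible for a count of zero.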

Parsing GenBank file

生来就可爱ヽ(ⅴ<●) posted on 2020-01-15 10:58:05
Question: Basically, a GenBank file consists of gene entries (each announced by 'gene' and followed by its corresponding 'CDS' entry, only one per gene), like the two I show below. I would like to get locus_tag vs product in a tab-delimited two-column file. 'gene' and 'CDS' are always preceded and followed by spaces. If this task can be easily performed using an already available tool, please let me know. Input file: gene complement(8972..9094) /locus_tag="HAPS_0004" /db_xref="GeneID:7278619" CDS complement
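Biopython's GenBank parser exposes each feature's qualifiers as a dictionary of lists, so the two columns can be read directly from every CDS feature. This is a sketch that assumes the input is a complete GenBank record (with its LOCUS header and so on) and not just the feature fragment shown above. `cds_row` and `write_locus_product_table` are hypothetical names.

```python
import csv

from Bio import SeqIO


def cds_row(feature):
    """Return (locus_tag, product) for a CDS feature, or None otherwise."""
    if feature.type != "CDS":
        return None
    # Qualifier values are lists; missing qualifiers become empty strings.
    locus_tag = feature.qualifiers.get("locus_tag", [""])[0]
    product = feature.qualifiers.get("product", [""])[0]
    return locus_tag, product


def write_locus_product_table(genbank_path, out_path):
    """Write one tab-delimited locus_tag/product row per CDS feature."""
    with open(out_path, "w", newline="") as out_handle:
        writer = csv.writer(out_handle, delimiter="\t")
        for record in SeqIO.parse(genbank_path, "genbank"):
            for feature in record.features:
                row = cds_row(feature)
                if row is not None:
                    writer.writerow(row)
```

Reading from the CDS feature, not the gene feature, matters here because the product qualifier normally lives on the CDS.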