ncbi | 易学教程

Download all NCBI PubMed IDs based on a tag

Question: I am able to read in the PubMed ID of a paper and return a set of records about that paper using this code:

    from Bio import Entrez
    from Bio import Medline

    Entrez.email = "Your.Name.Here@example.org"
    pubmed_rec = Entrez.efetch(db='pubmed', id=19053980, retmode='text', rettype='medline')
    records = Medline.parse(pubmed_rec)
    for rec in records:
        print(rec)

The output is:

    {'PMID': '19053980', 'OWN': 'NLM', 'STAT': 'MEDLINE', 'DCOM': '20090706', 'LR': '20091015', 'IS': '1365-2036 (Electronic) 0269-2813
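
One way to collect every PubMed ID for a tag is the E-utilities esearch endpoint, which returns the matching IDs as XML and supports paging through `retstart`/`retmax` (Biopython's `Entrez.esearch` wraps the same endpoint). The sketch below uses only the Python standard library; the search term `asthma[MeSH Terms]` and the helper names are illustrative, and `fetch` is assumed to be any function that turns a URL into response text, for example `lambda url: urllib.request.urlopen(url).read()`. NCBI limits how far esearch can page through PubMed results, so check the current E-utilities documentation before relying on this for very large result sets.

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Public E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=100, retstart=0, email=None):
    """Build an esearch URL that asks PubMed for IDs matching `term`."""
    params = {
        "db": "pubmed",
        "term": term,          # e.g. a MeSH tag such as "asthma[MeSH Terms]"
        "retmax": retmax,      # number of IDs to return in this page
        "retstart": retstart,  # offset of the first ID, used for paging
    }
    if email:
        params["email"] = email
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def parse_esearch(xml_text):
    """Return (total_count, [pmid, ...]) from an esearch XML response."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))  # direct child <Count>
    ids = [el.text for el in root.findall("./IdList/Id")]
    return count, ids


def all_pmids(term, fetch, page_size=500):
    """Page through esearch results; `fetch` maps a URL to response text."""
    pmids, start = [], 0
    while True:
        count, ids = parse_esearch(fetch(build_esearch_url(term, page_size, start)))
        pmids.extend(ids)
        start += page_size
        if not ids or start >= count:
            return pmids
```

The collected IDs can then be joined with commas and passed to `Entrez.efetch` exactly as the question does for a single ID.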

Extract XML from XML embedded in HTML

Question: I'm trying to get the XML presented here, http://www.ncbi.nlm.nih.gov/sra/ERX086768?report=FullXml, but it's a bit tricky because they don't give any support for it. The purpose is to get the XML into PHP in order to go through it. Can someone give a hint?

Answer 1: It's not really true that XML presented via HTML there wouldn't be XML as well. What you're looking for is something called textContent in DOMDocument. That will give you only the text from that HTML, as it is displayed "as text" in the
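
The answer's idea (take the page's text content and treat it as XML) can be sketched in Python with the standard-library `html.parser`, which turns escaped entities such as `&lt;` back into `<` when `convert_charrefs=True`. This is a Python analogue of PHP's `DOMDocument` `textContent`, not the answerer's code; it assumes the XML sits inside a `<pre>` element so that navigation text elsewhere on the page is ignored, and the class and function names are hypothetical. Requesting the record through E-utilities efetch with `db=sra` may return XML directly and avoid the HTML step altogether.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TagText(HTMLParser):
    """Collect the (entity-decoded) text that appears inside one kind of tag."""

    def __init__(self, tag="pre"):
        super().__init__(convert_charrefs=True)  # "&lt;" arrives as "<"
        self.tag = tag
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:  # keep only text inside the chosen tag
            self.chunks.append(data)


def xml_from_html(html_text, tag="pre"):
    """Return the root element of the XML shown as text inside `tag`."""
    parser = TagText(tag)
    parser.feed(html_text)
    parser.close()
    return ET.fromstring("".join(parser.chunks).strip())
```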

How do I get gene features in FASTA nucleotide format from NCBI using Perl?

Question: I am able to download a FASTA file manually that looks like:

    >lcl|CR543861.1_gene_1...
    ATGCTTTGGACA...
    >lcl|CR543861.1_gene_2...
    GTGCGACTAAAA...

by clicking "Send to" and selecting "Gene Features" on this page (FASTA Nucleotide is the only format option, which is fine because that's all I want). With a script like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Bio::DB::EUtilities;

    my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                           -db => 'nucleotide',
                                           -id => 'CR543861',
                                           -rettype =>
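
Producing gene-feature FASTA yourself comes down to slicing each annotated gene out of the full sequence and reverse-complementing genes on the minus strand. The sketch below is in Python rather than Perl to match the other examples on this page and assumes you already have the sequence and the gene coordinates (for example parsed from the GenBank flat file that efetch returns with `rettype => 'gb'`). The function names are hypothetical, coordinates are taken as 1-based and inclusive as in GenBank, multi-part `join(...)` locations are not handled, and the header only imitates the start of the `lcl|CR543861.1_gene_N` headers shown above; NCBI's real headers carry more text.

```python
# Complement table for reverse-complementing minus-strand genes.
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def revcomp(seq):
    """Reverse complement of a nucleotide string."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_fasta(accession, sequence, genes, width=70):
    """Build FASTA text for `genes`, a list of (start, end, strand) tuples.

    start/end are 1-based inclusive; strand is 1 or -1.
    """
    records = []
    for n, (start, end, strand) in enumerate(genes, start=1):
        sub = sequence[start - 1:end]  # convert to Python's 0-based slice
        if strand == -1:
            sub = revcomp(sub)
        lines = [sub[i:i + width] for i in range(0, len(sub), width)]
        records.append(f">lcl|{accession}_gene_{n}\n" + "\n".join(lines))
    return "\n".join(records) + "\n"
```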

Paste some elements of a mixed vector

Question: I have a vector of terms, each of which may be followed by zero or more qualifiers starting with "/". The first element should always be a term.

    mesh <- c("Animals", "/physiology", "/metabolism*", "Insects", "Arabidopsis", "/immunology")

I'd like to join each qualifier with the last term before it and get a new vector:

    Animals/physiology Animals/metabolism* Insects Arabidopsis/immunology

Answer 1: Make a group identifier by grepl-ing for values not starting with a /, split on this group identifier, then paste0:
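
The same three steps translate directly to Python: a running count of elements that do not start with "/" (roughly what `cumsum(!grepl("^/", mesh))` gives in R) serves as the group identifier, the vector is split on it, and each qualifier is pasted onto its group's term. This is a translation of the described approach, not the answerer's R code, and `join_qualifiers` is a hypothetical name.

```python
from itertools import accumulate, groupby


def join_qualifiers(terms):
    """Attach each "/qualifier" to the most recent term before it."""
    # Group id increases by one at every element that is a term (no leading "/").
    group_ids = accumulate(0 if t.startswith("/") else 1 for t in terms)
    out = []
    for _, members in groupby(zip(group_ids, terms), key=lambda pair: pair[0]):
        head, *quals = [t for _, t in members]
        # A bare term is kept as-is; otherwise paste it onto every qualifier.
        out.extend([head + q for q in quals] or [head])
    return out


mesh = ["Animals", "/physiology", "/metabolism*", "Insects", "Arabidopsis", "/immunology"]
print(join_qualifiers(mesh))
# ['Animals/physiology', 'Animals/metabolism*', 'Insects', 'Arabidopsis/immunology']
```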

PHP Simplexml_Load_File fails

Question: I have successfully been able to get a PubMed results page in XML format and write the contents to a local file, "Publications.xml". The problem is that when I use simplexml_load_file("Publications.xml"), it fails, and I'm not able to figure out why.

    <?php
    $feed = 'http://www.ncbi.nlm.nih.gov/pubmed?term=carl&sort=pubdate&report=xml';
    $local = 'Publications.xml';
    $curtime = time();
    $filemodtime;
    if( (!file_exists($local)) || (time() - filemtime($local)) > 86400 )
    {
        $contents = file_get_contents($feed);
        $fp
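
When an XML loader fails on a downloaded page, the quickest diagnosis is to look at what the file actually starts with: one plausible cause (not confirmed by the truncated question) is that the web URL returns an HTML page with the XML shown inside it rather than a raw XML document, whereas E-utilities efetch with `db=pubmed` and `retmode=xml` returns XML. In PHP, `libxml_use_internal_errors(true)` followed by `libxml_get_errors()` exposes the parser's reason. The sketch below mirrors the question's one-day cache logic in Python and reports the parse error together with the first bytes of the file; `fetch` is an assumed function returning the page as bytes, and the helper names are hypothetical.

```python
import os
import time
import xml.etree.ElementTree as ET

MAX_AGE = 86400  # seconds: the same one-day refresh window as the PHP code


def refresh_cache(path, fetch, max_age=MAX_AGE):
    """Re-download `path` via fetch() if it is missing or older than max_age."""
    if not os.path.exists(path) or time.time() - os.path.getmtime(path) > max_age:
        data = fetch()
        with open(path, "wb") as fh:
            fh.write(data)


def load_xml(path):
    """Parse `path`; on failure, raise an error showing how the file begins."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError as err:
        with open(path, "rb") as fh:
            head = fh.read(200)
        raise ValueError(f"{path} is not well-formed XML ({err}); it starts with {head!r}") from err
```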

urllib2.HTTPError Python

Question: I have a file with GI numbers and would like to get FASTA sequences from NCBI.

    from Bio import Entrez
    import time

    Entrez.email ="eigtw59tyjrt403@gmail.com"
    f = open("C:\\bioinformatics\\gilist.txt")
    for line in iter(f):
        handle = Entrez.efetch(db="nucleotide", id=line, retmode="xml")
        records = Entrez.read(handle)
        print ">GI "+line.rstrip()+" "+records[0]["GBSeq_primary-accession"]+" "+records[0]["GBSeq_definition"]+"\n"+records[0]["GBSeq_sequence"]
        time.sleep(1) # to make sure not many
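
A few defensive habits usually help with this kind of loop: strip the trailing newline from each GI before passing it as `id` (the question's code passes the raw line), skip blank lines, send several IDs per request as a comma-separated string, and retry with a growing pause when the server answers with an HTTP error. NCBI asks clients without an API key to stay at about three requests per second. The question's code uses Python 2 `print` syntax; the sketch below is Python 3, uses hypothetical helper names, and shows the Biopython call only as a comment with `rettype="fasta"` as an assumed choice of output.

```python
import time
import urllib.error


def read_ids(lines):
    """Strip whitespace/newlines and drop blank lines from a GI list."""
    return [line.strip() for line in lines if line.strip()]


def batches(ids, size=200):
    """Yield comma-separated ID strings, `size` IDs at a time."""
    for i in range(0, len(ids), size):
        yield ",".join(ids[i:i + size])


def with_retries(call, attempts=3, delay=2.0, sleep=time.sleep):
    """Run call(); on urllib's HTTPError, wait (doubling each time) and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except urllib.error.HTTPError:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(delay)
            delay *= 2

# Possible use with Biopython (assumed, not run here):
# with open("gilist.txt") as f:
#     for id_string in batches(read_ids(f)):
#         handle = with_retries(lambda: Entrez.efetch(db="nucleotide", id=id_string,
#                                                     rettype="fasta", retmode="text"))
#         print(handle.read())
#         time.sleep(1)
```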

Is the new RefSeq release from NCBI compatible with Bio.Entrez.Parser?

Question: I'm new to Python and especially to Biopython. I'm trying to retrieve some information as XML with Entrez.efetch and then read it. Last week this script worked well:

    handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
    records = Entrez.read(handle)

But now I'm getting an error:

    Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with
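
The error means the downloaded XML contains a tag (`GBSeq_xrefs`) that the DTD bundled with the installed Biopython does not describe, so upgrading Biopython, which ships newer DTDs, is usually the first thing to try. As a stop-gap that does not depend on any DTD, the standard-library ElementTree parser can read the same GBSet/GBSeq XML; the sketch below is such an alternative, not Biopython's API, and it extracts only three fields (the function name is hypothetical). Note also that an efetch handle is a one-shot stream, so when retrying after a failed read, call `Entrez.efetch` again instead of reading the same handle twice.

```python
import xml.etree.ElementTree as ET


def gbseq_summary(xml_text):
    """Pull a few fields from GBSet/GBSeq XML; unknown tags are simply ignored."""
    root = ET.fromstring(xml_text)
    out = []
    for seq in root.iter("GBSeq"):
        out.append({
            "accession": seq.findtext("GBSeq_primary-accession"),
            "definition": seq.findtext("GBSeq_definition"),
            "sequence": seq.findtext("GBSeq_sequence"),
        })
    return out

# Possible use (assumed): gbseq_summary(Entrez.efetch(db="protein",
#     id="YP_008872780.1", retmode="xml").read())
```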

PHP Simplexml_Load_File fails

Question: I have successfully been able to get a PubMed results page in XML format and write the contents to a local file, "Publications.xml". The problem is that when I use simplexml_load_file("Publications.xml"), it fails, and I'm not able to figure out why.

    <?php
    $feed = 'http://www.ncbi.nlm.nih.gov/pubmed?term=carl&sort=pubdate&report=xml';
    $local = 'Publications.xml';
    $curtime = time();
    $filemodtime;
    if( (!file_exists($local)) || (time() - filemtime($local)) > 86400 )
    {
        $contents = file_get_contents($feed);
        $fp = fopen($local,"w");
        fwrite($fp, $contents);
        fclose($fp);
    }
    $xml = simplexml_load_file($local) or (

Is the new RefSeq release from NCBI compatible with Bio.Entrez.Parser?

Question: I'm new to Python and especially to Biopython. I'm trying to retrieve some information as XML with Entrez.efetch and then read it. Last week this script worked well:

    handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
    records = Entrez.read(handle)

But now I'm getting an error:

    Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.

So I run this:

    records = Entrez.read(handle, validate=False)

But I'm still getting an

How can I get taxonomic rank names from taxid?

Question: This question is related to: How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid? The solution given there works, but I would like to have the name for each taxonomic ID at the defined ranks. I have found this in ete3, which can do the job:

    names = ncbi.get_taxid_translator(lineage)
    print [names[taxid] for taxid in lineage]

but not being a Python programmer, I am failing to incorporate this into the code given in the link above. Here is what I
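
With ete3's `NCBITaxa`, `get_lineage(taxid)` returns the lineage as a list of taxids, `get_rank(...)` maps taxids to rank names, and `get_taxid_translator(...)` maps taxids to scientific names; combining the last two gives a name per rank. The sketch below keeps that combination in a plain function so it can run without the local taxonomy database; the function name and the list of wanted ranks are assumptions, the ete3 calls are shown only as comments, and it uses Python 3 `print` rather than the Python 2 statement in the question.

```python
DESIRED = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]


def ranked_names(lineage, ranks, names, wanted=DESIRED):
    """Map each wanted rank to the taxon name at that rank (None if absent)."""
    by_rank = {ranks[t]: names[t] for t in lineage if t in ranks and t in names}
    return {rank: by_rank.get(rank) for rank in wanted}

# Possible use with ete3 (assumed; needs ete3 and its local NCBI taxonomy DB):
# from ete3 import NCBITaxa
# ncbi = NCBITaxa()
# lineage = ncbi.get_lineage(9606)
# ranks = ncbi.get_rank(lineage)
# names = ncbi.get_taxid_translator(lineage)
# print(ranked_names(lineage, ranks, names))
```

Ranks missing from a lineage come back as `None`, so every taxid yields the same set of keys, which keeps tabular output aligned.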