ncbi

Download all NCBI PubMed IDs based on a tag

半世苍凉 submitted on 2021-01-29 15:21:03
Question: I am able to read in the PubMed ID of a paper and return a set of records about that paper using this code:
    from Bio import Entrez
    from Bio import Medline
    Entrez.email = "Your.Name.Here@example.org"
    pubmed_rec = Entrez.efetch(db='pubmed',id=19053980,retmode='text',rettype='medline')
    records = Medline.parse(pubmed_rec)
    for rec in records:
        print(rec)
The output is: {'PMID': '19053980', 'OWN': 'NLM', 'STAT': 'MEDLINE', 'DCOM': '20090706', 'LR': '20091015', 'IS': '1365-2036 (Electronic) 0269-2813
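For the "all IDs for a tag" part of the title, one possible direction is the E-utilities ESearch endpoint, paged with `retstart` and `retmax`. The sketch below only builds the request URLs and sends nothing. It assumes the tag can be written as a PubMed search term; the function name `esearch_page_urls`, the example term, and the `total` of 2500 are made up for illustration. In practice the total would come from the `count` value reported by the first ESearch response.

```python
from urllib.parse import urlencode, urlparse, parse_qs

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def esearch_page_urls(term, total, page_size=1000):
    """Return one ESearch URL per page of PubMed IDs for `term` (hypothetical helper)."""
    urls = []
    for start in range(0, total, page_size):
        params = {
            "db": "pubmed",
            "term": term,
            "retstart": start,      # offset of the first ID on this page
            "retmax": page_size,    # number of IDs requested per page
            "retmode": "json",
        }
        urls.append(ESEARCH + "?" + urlencode(params))
    return urls

# Example term and total are assumptions, not taken from the question.
urls = esearch_page_urls("asthma[MeSH Terms]", total=2500)
print(len(urls))
print(parse_qs(urlparse(urls[1]).query)["retstart"])
```

Each URL can then be fetched in turn (with a pause between requests) and the returned ID lists concatenated.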

Extract XML from XML embedded in HTML

杀马特。学长 韩版系。学妹 submitted on 2020-01-04 04:42:06
Question: I'm trying to get the XML presented here http://www.ncbi.nlm.nih.gov/sra/ERX086768?report=FullXml but it's a bit tricky because they don't give any support for it. The purpose is to get the XML into PHP in order to go through it. Can someone give a hint? Answer 1: It's not really true that XML presented via HTML therein wouldn't be XML as well. What you're looking for is something called textContent in DOMDocument. That will give you only the text from that HTML. Like it is displayed "as text" in the
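The textContent idea from the answer can be sketched in Python with the standard library: collect the text inside one element, let the parser decode the character references, and hand the result to an XML parser. This is a rough illustration, not the answer's PHP code. It assumes the escaped XML sits inside a `<pre>` element; the sample page, the `ElementText` class, and the `embedded_xml` helper are all made up, and the real NCBI markup may differ.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class ElementText(HTMLParser):
    """Collect the text inside one kind of element, similar to DOM textContent."""

    def __init__(self, tag):
        super().__init__(convert_charrefs=True)  # decodes &lt; &gt; &amp; in text
        self.tag = tag
        self.depth = 0      # > 0 while inside the wanted element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

def embedded_xml(html_text, tag="pre"):
    """Return the decoded text found inside `tag` (assumed to hold the XML)."""
    parser = ElementText(tag)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()

# Made-up page for illustration only.
page = (
    "<html><head><title>SRA</title></head><body><pre>"
    "&lt;EXPERIMENT accession=\"ERX086768\"&gt;"
    "&lt;TITLE&gt;demo&lt;/TITLE&gt;"
    "&lt;/EXPERIMENT&gt;"
    "</pre></body></html>"
)
xml_text = embedded_xml(page)
root = ET.fromstring(xml_text)
print(root.tag, root.get("accession"), root.find("TITLE").text)
```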

How do I get gene features in FASTA nucleotide format from NCBI using Perl?

与世无争的帅哥 submitted on 2019-12-13 13:11:34
Question: I am able to download a FASTA file manually that looks like:
    >lcl|CR543861.1_gene_1...
    ATGCTTTGGACA...
    >lcl|CR543861.1_gene_2...
    GTGCGACTAAAA...
by clicking "Send to" and selecting "Gene Features"; FASTA Nucleotide is the only option (which is fine because that's all I want) on this page. With a script like this:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(-eutil => 'efetch',
                                           -db => 'nucleotide',
                                           -id => 'CR543861',
                                           -rettype =>

Paste some elements of mixed vector

走远了吗. submitted on 2019-12-11 04:38:13
Question: I have a vector with terms that may be followed by zero or more qualifiers starting with "/". The first element should always be a term.
    mesh <- c("Animals", "/physiology", "/metabolism*", "Insects", "Arabidopsis", "/immunology")
I'd like to join each qualifier with the most recent preceding term and get a new vector:
    Animals/physiology
    Animals/metabolism*
    Insects
    Arabidopsis/immunology
Answer 1: Make a group identifier by grepl-ing for values not starting with a /, split on this group identifier, then paste0:
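The same grouping idea can be sketched in Python. This is not the R code from the answer (which is cut off here), only a plain loop that reaches the result the question asks for; the function name `join_qualifiers` is made up.

```python
def join_qualifiers(items):
    """Attach each "/qualifier" to the most recent term that precedes it."""
    joined = []
    term = None
    bare = False  # True while the current term has been emitted without a qualifier
    for item in items:
        if item.startswith("/"):
            if term is None:
                raise ValueError("the first element must be a term")
            if bare:
                joined[-1] = term + item   # replace the bare term with term/qualifier
                bare = False
            else:
                joined.append(term + item)
        else:
            term = item
            joined.append(term)            # kept as-is if no qualifier follows
            bare = True
    return joined

mesh = ["Animals", "/physiology", "/metabolism*", "Insects", "Arabidopsis", "/immunology"]
print(join_qualifiers(mesh))
```

A term with no qualifiers ("Insects") is kept unchanged, while a term with qualifiers appears once per qualifier.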

PHP Simplexml_Load_File fails

做~自己de王妃 submitted on 2019-12-08 07:20:59
Question: I have successfully been able to get a PubMed results page in XML format and write the contents to a local file "Publications.xml". The problem is that when I use simplexml_load_file("Publications.xml"), it fails, and I am not able to figure out why.
    <?php
    $feed = 'http://www.ncbi.nlm.nih.gov/pubmed?term=carl&sort=pubdate&report=xml';
    $local = 'Publications.xml';
    $curtime = time();
    $filemodtime;
    if( (!file_exists($local)) || (time() - filemtime($local)) > 86400 ) {
        $contents = file_get_contents($feed);
        $fp

urllib2.HTTPError Python

坚强是说给别人听的谎言 submitted on 2019-12-08 05:30:05
Question: I have a file with GI numbers and would like to get FASTA sequences from NCBI.
    from Bio import Entrez
    import time
    Entrez.email ="eigtw59tyjrt403@gmail.com"
    f = open("C:\\bioinformatics\\gilist.txt")
    for line in iter(f):
        handle = Entrez.efetch(db="nucleotide", id=line, retmode="xml")
        records = Entrez.read(handle)
        print ">GI "+line.rstrip()+" "+records[0]["GBSeq_primary-accession"]+" "+records[0]["GBSeq_definition"]+"\n"+records[0]["GBSeq_sequence"]
        time.sleep(1) # to make sure not many
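The excerpt does not show what triggers the HTTPError, so the following is only something worth ruling out: each `line` read from the file still carries its newline (and the file may contain blank lines) when it is passed as `id`. A minimal sketch of cleaning the IDs first and grouping them into comma-separated batches, which EFetch accepts as an ID list, is shown below. The helper names and the sample values are made up.

```python
def clean_ids(lines):
    """Strip whitespace and newlines from each line and drop empty lines."""
    ids = []
    for line in lines:
        value = line.strip()
        if value:
            ids.append(value)
    return ids

def id_batches(ids, size):
    """Group IDs into comma-separated strings of at most `size` IDs each."""
    return [",".join(ids[i:i + size]) for i in range(0, len(ids), size)]

# Made-up lines standing in for the contents of the GI list file.
sample_lines = ["123\n", "\n", " 456 \r\n", "789"]
ids = clean_ids(sample_lines)
print(ids)
print(id_batches(ids, size=2))
```

Each batch string could then be passed as the `id` argument of a single request, which also reduces the number of requests sent.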

Is the new RefSeq release from NCBI compatible with Bio.Entrez.Parser?

随声附和 submitted on 2019-12-08 03:16:41
Question: I'm new to Python and especially to Biopython. I'm trying to take some information from an XML file with Entrez.efetch and then read it. Last week this script worked well:
    handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
    records = Entrez.read(handle)
But now I'm getting an error:
    > Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with

PHP Simplexml_Load_File fails

*爱你&永不变心* submitted on 2019-12-06 16:44:27
I have successfully been able to get a PubMed results page in XML format and write the contents to a local file "Publications.xml". The problem is that when I use simplexml_load_file("Publications.xml"), it fails, and I am not able to figure out why.
    <?php
    $feed = 'http://www.ncbi.nlm.nih.gov/pubmed?term=carl&sort=pubdate&report=xml';
    $local = 'Publications.xml';
    $curtime = time();
    $filemodtime;
    if( (!file_exists($local)) || (time() - filemtime($local)) > 86400 ) {
        $contents = file_get_contents($feed);
        $fp = fopen($local,"w");
        fwrite($fp, $contents);
        fclose($fp);
    }
    $xml = simplexml_load_file($local) or (

Is the new RefSeq release from NCBI compatible with Bio.Entrez.Parser?

…衆ロ難τιáo~ submitted on 2019-12-06 14:28:37
I'm new to Python and especially to Biopython. I'm trying to take some information from an XML file with Entrez.efetch and then read it. Last week this script worked well:
    handle = Entrez.efetch(db="Protein", id="YP_008872780.1", retmode="xml")
    records = Entrez.read(handle)
But now I'm getting an error:
    > Bio.Entrez.Parser.ValidationError: Failed to find tag 'GBSeq_xrefs' in the DTD. To skip all tags that are not represented in the DTD, please call Bio.Entrez.read or Bio.Entrez.parse with validate=False.
So I run this:
    records = Entrez.read(handle, validate=False)
But I'm still getting an

How can I get taxonomic rank names from taxid?

痞子三分冷 submitted on 2019-12-06 12:08:01
Question: This question is related to: How to get taxonomic specific ids for kingdom, phylum, class, order, family, genus and species from taxid? The solution given there works, but I would like to have the names for the taxonomic IDs at the defined ranks. I have found this in ete3, which can do the job:
    names = ncbi.get_taxid_translator(lineage)
    print [names[taxid] for taxid in lineage]
but, not being a Python programmer, I am failing to incorporate this into the code given in the link above. Here is what I
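One way the translator call might be combined with rank lookup is sketched below. It relies on three ete3 `NCBITaxa` methods (`get_lineage`, `get_rank`, `get_taxid_translator`) but is not the code from the linked question, which is not shown here. The function name `rank_names`, the "NA" placeholder, and the `FakeTaxa` stand-in with its invented taxids and names are all assumptions; the stand-in exists only so the sketch runs without ete3 or its taxonomy database.

```python
RANKS = ("kingdom", "phylum", "class", "order", "family", "genus", "species")

def rank_names(ncbi, taxid, wanted=RANKS):
    """Map each wanted rank to its name in the lineage of `taxid` ("NA" if absent)."""
    lineage = ncbi.get_lineage(taxid)            # list of taxids, root first
    ranks = ncbi.get_rank(lineage)               # {taxid: rank}
    names = ncbi.get_taxid_translator(lineage)   # {taxid: scientific name}
    by_rank = {ranks[t]: names[t] for t in lineage}
    return {rank: by_rank.get(rank, "NA") for rank in wanted}

class FakeTaxa:
    """Stand-in with invented data; with ete3 installed, use NCBITaxa() instead."""
    _lineage = [1, 10, 20, 30]
    _ranks = {1: "no rank", 10: "phylum", 20: "genus", 30: "species"}
    _names = {1: "root", 10: "Examplephylum", 20: "Examplegenus", 30: "Examplegenus demo"}

    def get_lineage(self, taxid):
        return list(self._lineage)

    def get_rank(self, taxids):
        return {t: self._ranks[t] for t in taxids}

    def get_taxid_translator(self, taxids):
        return {t: self._names[t] for t in taxids}

result = rank_names(FakeTaxa(), 30)
for rank in RANKS:
    print(rank, result[rank])
```

With ete3 available, replacing `FakeTaxa()` by `NCBITaxa()` (imported from `ete3`) and passing a real taxid should give the names for the ranks that exist in that lineage.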