How can I get taxonomic rank names from taxid?

Posted by 喜夏-厌秋 on 2019-12-04 15:20:58

Let's leave your taxids as they are.

taxids = [1204725, 2162,  1300163, 420247]

Then call get_desired_ranks for each individual taxid.

for taxid in taxids:
    ranks = get_desired_ranks(taxid, desired_ranks)

Now call ncbi.get_taxid_translator for each value in ranks (the taxid found for that rank) and print the output:

for taxid in taxids:
    print(ncbi.get_taxid_translator([taxid]))
    ranks = get_desired_ranks(taxid, desired_ranks)
    for key, rank in ranks.items():
        if rank != '<not present>':
            print(ncbi.get_taxid_translator([rank]))

Output

{1204725: 'Methanobacterium formicicum DSM 3637'}
{183925: 'Methanobacteria'}
{2159: 'Methanobacteriaceae'}
{2160: 'Methanobacterium'}
{28890: 'Euryarchaeota'}
{2162: 'Methanobacterium formicicum'}
{2158: 'Methanobacteriales'}
{2162: 'Methanobacterium formicicum'}
[...]      
{420247: 'Methanobrevibacter smithii ATCC 35061'}
{183925: 'Methanobacteria'}
{2159: 'Methanobacteriaceae'}
{2172: 'Methanobrevibacter'}
{28890: 'Euryarchaeota'}
{2173: 'Methanobrevibacter smithii'}
{2158: 'Methanobacteriales'}

Complete code with improved output

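from ete3 import NCBITaxa

# On first use, ete3 downloads the NCBI taxonomy and builds a local database.
ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    # taxids of every ancestor of taxid, plus taxid itself
    lineage = ncbi.get_lineage(taxid)
    names = ncbi.get_taxid_translator(lineage)
    # taxid -> rank for every entry in the lineage
    lineage2ranks = ncbi.get_rank(names)
    # invert the mapping: rank -> taxid
    ranks2lineage = dict((rank, taxid) for (taxid, rank) in lineage2ranks.items())
    # one '<rank>_id' entry per requested rank, with a placeholder for missing ranks
    return {'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}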

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    for taxid in taxids:
        print(list(ncbi.get_taxid_translator([taxid]).values())[0])
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                print(key + ': ' + list(ncbi.get_taxid_translator([rank]).values())[0])
        print('=' * 60)
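
To see what get_desired_ranks does without the taxonomy database, here is a minimal sketch of its inversion step. The lineage2ranks dictionary is a hand-written stand-in for what ncbi.get_rank would return (taxids and ranks copied from the outputs in this thread), and pick_ranks is a hypothetical helper name, not part of ete3.

```python
# Hand-written stand-in for the taxid -> rank mapping of one lineage.
# Values are copied from the outputs shown in this thread; no database is used.
lineage2ranks = {
    28890: 'phylum',     # Euryarchaeota
    183925: 'class',     # Methanobacteria
    2158: 'order',       # Methanobacteriales
    2159: 'family',      # Methanobacteriaceae
    2160: 'genus',       # Methanobacterium
    2162: 'species',     # Methanobacterium formicicum
}

def pick_ranks(lineage2ranks, desired_ranks):
    """Hypothetical helper: return rank -> taxid for the requested ranks."""
    # Invert taxid -> rank into rank -> taxid
    ranks2lineage = {rank: taxid for taxid, rank in lineage2ranks.items()}
    # Ranks missing from the lineage get a placeholder instead of a taxid
    return {rank: ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
picked = pick_ranks(lineage2ranks, desired_ranks)
for rank in desired_ranks:
    print(rank, picked[rank])
```

Looping over desired_ranks, rather than over the dictionary, keeps the printed order fixed whatever the dictionary's internal order is.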

If you want tab-separated output, you can concatenate the strings with \t, or collect all results in a list and join them with \t.

In the snippet below, results is a list of rows. Each row is itself a list holding the fields for one query (original ID, kingdom, etc.). On each pass through the loop a new row is appended, and the fields are then added to that last entry (results[-1]).

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        for key, rank in ranks.items():
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))

Output

Original_query_taxid    kingdom phylum  class   order   family  genus   species
1204725 Methanobacterium formicicum     Methanobacteriaceae     Euryarchaeota   Methanobacteria Methanobacteriales      Methanobacterium        <not present>
2162    Methanobacterium formicicum     Methanobacteriaceae     Euryarchaeota   Methanobacteria Methanobacteriales      Methanobacterium        <not present>
1300163 Methanobacterium formicicum     Methanobacteriaceae     Euryarchaeota   Methanobacteria Methanobacteriales      Methanobacterium        <not present>
420247  Methanobrevibacter smithii      Methanobacteriaceae     Euryarchaeota   Methanobacteria Methanobacteriales      Methanobrevibacter      <not present>
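
As an alternative to joining with \t by hand, the standard-library csv module can write the rows. The sketch below assumes the rows are already lists of strings like the ones built in results. The sample row is hard-coded from values that appear in this thread, and write_tsv is a hypothetical helper name.

```python
import csv
import io

def write_tsv(handle, header, rows):
    """Hypothetical helper: write a header and rows as tab-separated text."""
    # csv.writer quotes any field that itself contains a tab or a newline
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    writer.writerow(header)
    writer.writerows(rows)

desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
header = ['Original_query_taxid'] + desired_ranks

# Sample row with values taken from this thread, in desired_ranks order
rows = [
    ['1204725', '<not present>', 'Euryarchaeota', 'Methanobacteria',
     'Methanobacteriales', 'Methanobacteriaceae', 'Methanobacterium',
     'Methanobacterium formicicum'],
]

# An in-memory buffer is used here; an open file handle works the same way
buffer = io.StringIO()
write_tsv(buffer, header, rows)
print(buffer.getvalue(), end='')
```

When writing to a real file, opening it with newline='' is what the csv documentation recommends.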

I do not have enough reputation to comment on Maximilian Peters' answer. I tried his code and it worked, but the columns were not printed in the order of desired_ranks = ['kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']. This is probably because dictionaries did not reliably preserve insertion order before Python 3.7.

To get each value in the right column, and to include the superkingdom, use:

from ete3 import NCBITaxa

ncbi = NCBITaxa()

def get_desired_ranks(taxid, desired_ranks):
    lineage = ncbi.get_lineage(taxid)   
    names = ncbi.get_taxid_translator(lineage)
    lineage2ranks = ncbi.get_rank(names)
    ranks2lineage = dict((rank,taxid) for (taxid, rank) in lineage2ranks.items())
    return{'{}_id'.format(rank): ranks2lineage.get(rank, '<not present>') for rank in desired_ranks}

if __name__ == '__main__':
    taxids = [1204725, 2162,  1300163, 420247]
    desired_ranks = ['superkingdom', 'kingdom', 'phylum', 'class', 'order', 'family', 'genus', 'species']
    results = list()
    for taxid in taxids:
        results.append(list())
        results[-1].append(str(taxid))
        ranks = get_desired_ranks(taxid, desired_ranks)
        # Look the ranks up in desired_ranks order so that the columns line up
        # with the header, however the dictionary happens to be ordered.
        for rank_name in desired_ranks:
            rank = ranks['{}_id'.format(rank_name)]
            if rank != '<not present>':
                results[-1].append(list(ncbi.get_taxid_translator([rank]).values())[0])
            else:
                results[-1].append(rank)

    #generate the header
    header = ['Original_query_taxid']
    header.extend(desired_ranks)
    print('\t'.join(header))

    #print the results
    for result in results:
        print('\t'.join(result))

Output:

Original_query_taxid    superkingdom    kingdom phylum  class   order   family  genus   species
1204725 Archaea <not present>   Euryarchaeota   Methanobacteria Methanobacteriales  Methanobacteriaceae Methanobacterium    Methanobacterium formicicum
2162    Archaea <not present>   Euryarchaeota   Methanobacteria Methanobacteriales  Methanobacteriaceae Methanobacterium    Methanobacterium formicicum
1300163 Archaea <not present>   Euryarchaeota   Methanobacteria Methanobacteriales  Methanobacteriaceae Methanobacterium    Methanobacterium formicicum
420247  Archaea <not present>   Euryarchaeota   Methanobacteria Methanobacteriales  Methanobacteriaceae Methanobrevibacter  Methanobrevibacter smithii

This output matches the NCBI information, e.g. https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=1204725.
