Converting ndarray generated by hcluster into a Newick string for use with ete2 package

大憨熊 提交于 2019-12-03 10:09:06

问题


I have a list of vectors created by running:

import hcluster
import numpy as np
from ete2 import Tree

vecs = [np.array(i) for i in document_list] 

where document_list is a collection of web documents I am analysing. I then perform hierarchical clustering:

Z = hcluster.linkage(vecs, metric='cosine') 

This generates an ndarray such as:

[[ 12.          19.           0.           1.        ]
[ 15.          21.           0.           3.        ]
[ 18.          22.           0.           4.        ]
[  3.          16.           0.           7.        ]
[  8.          23.           0.           6.        ]
[  5.          27.           0.           6.        ]
[  1.          28.           0.           7.        ]
[  0.          21.           0.           2.        ]
[  5.          29.           0.18350472   2.        ]
[  2.          10.           0.18350472   3.        ]
[ 47.          30.           0.29289577   9.        ]
[ 13.          28.           0.29289577  13.        ]
[ 73.          32.           0.29289577  18.        ]
[ 26.          12.           0.42264521   5.        ]
[  5.          33.           0.42264521  12.        ]
[ 14.          35.           0.42264521  12.        ]
[ 19.          35.           0.42264521  18.        ]
[  4.          20.           0.31174826   3.        ]
[ 34.          21.           0.5         19.        ]
[ 38.          29.           0.31174826  21.        ]]

Is it possible to convert this ndarray into a newick string that can be passed to the ete2 Tree() constructor so that I can draw and manipulate a newick tree using the tools provided by ete2?

Does it even make sense to try and do this and if not is there another way that I can generate a tree/dendrogram using the same data and ete2 (I realise that there are other packages that can draw dendrograms such as dendropy and hcluster itself but would prefer to use ete2 all the same)?

Thanks!


回答1:


I use the following approach for pretty much the same thing:

from hcluster import linkage, to_tree
from ete2 import Tree

#hcluster part
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")
T = to_tree(Z)

#ete2 section
root = Tree()
root.dist = 0
root.name = "root"
item2node = {T: root}

to_visit = [T]
while to_visit:
    node = to_visit.pop()
    cl_dist = node.dist /2.0
    for ch_node in [node.left, node.right]:
        if ch_node:
            ch = Tree()
            ch.dist = cl_dist
            ch.name = str(ch_node.id)
            item2node[node].add_child(ch)
            item2node[ch_node] = ch
            to_visit.append(ch_node)

# This is your ETE tree structure
tree = root



回答2:


Update:

from hcluster import linkage, to_tree
from ete2 import Tree

#hcluster part
Y = dist_matrix(items, dist_fn)
Z = linkage(Y, "single")

R,T       = to_tree( mat, rd=True )
#print "ROOT", R, "TREE", T
root      = Tree()
root.dist = 0
root.name = 'root'
item2node = {R.get_id(): root}
to_visit  = T

while to_visit:
    node = to_visit.pop()
    #print "NODE", node
    cl_dist = node.dist / 2.0

    for ch_node in [node.get_left(), node.get_right()]:
        if ch_node:
            ch_node_id         = ch_node.get_id()
            ch_node_name       = str(ch_node_id)
            ch                 = Tree()
            ch.dist            = cl_dist
            ch.name            = ch_node_name

            if nodeNames:
                if ch_node_id < len(nodeNames):
                    ch.name    = nodeNames[ ch_node_id ]

            item2node[ch_node_id] = ch
            item2node[ch_node_id].add_child(ch)
            to_visit.append(ch_node)


来源:https://stackoverflow.com/questions/9364609/converting-ndarray-generated-by-hcluster-into-a-newick-string-for-use-with-ete2

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!