retrieve leave colors from scipy dendrogram

南楼画角 提交于 2021-02-08 05:17:35

问题


I can not get the color leaves from the scipy dendrogram dictionary. As stated in the documentation and in this github issue, the color_list key in the dendrogram dictionary refers to the links, not the leaves. It would be nice to have another key referring to the leaves, sometimes you need this for coloring other types of graphics, such as this scatter plot in the example below.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# DATA EXAMPLE
x = np.array([[ 5, 3],
              [10,15],
              [15,12],
              [24,10],
              [30,30],
              [85,70],
              [71,80]])

# DENDROGRAM
plt.figure()
plt.subplot(121)
z = linkage(x, 'single')
d = dendrogram(z)

# COLORED PLOT
# This is what I would like to achieve. Colors are assigned manually by looking
# at the dendrogram, because I failed to get it from d['color_list'] (it refers 
# to links, not observations)
plt.subplot(122)
points = d['leaves']
colors = ['r','r','g','g','g','g','g']
for point, color in zip(points, colors):
    plt.plot(x[point, 0], x[point, 1], 'o', color=color)

Manual color assignment seems easy in this example, but I'm dealing with huge datasets, so until we get this new feature in the dictionary (color leaves), I'm trying to infer it somehow with the current information contained in the dictionary but I'm out of ideas so far. Can anyone help me?

Thanks.


回答1:


The following approach seems to work. The dictionary returned by the dendogram contains 'color_list' with the colors of the linkages. And 'icoord' and 'dcoord' with the x, resp. y, plot coordinates of these linkages. These x-positions are 5, 15, 25, ... when the linkage starts at a point. So, testing these x-positions can bring us back from the linkage to the corresponding point. And allows to assign the color of the linkage to the point.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# DATA EXAMPLE
x = np.random.uniform(0, 10, (20, 2))

# DENDROGRAM
plt.figure()
plt.subplot(121)
z = linkage(x, 'single')
d = dendrogram(z)
plt.yticks([])

# COLORED PLOT
plt.subplot(122)
points = d['leaves']
colors = ['none'] * len(points)
for xs, c in zip(d['icoord'], d['color_list']):
    for xi in xs:
        if xi % 10 == 5:
            colors[(int(xi)-5) // 10] = c
for point, color in zip(points, colors):
    plt.plot(x[point, 0], x[point, 1], 'o', color=color)
    plt.text(x[point, 0], x[point, 1], f' {point}')
plt.show()

PS: This post about matching points with their clusters might also be relevant.



来源:https://stackoverflow.com/questions/61959602/retrieve-leave-colors-from-scipy-dendrogram

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!