I'm trying to expand my color_palette in either matplotlib or seaborn for use in scipy's dendrogram so it colors each cluster differently.
Currently, the color_palette only has a few colors, so multiple clusters are getting mapped to the same color. I know there are about 16 million RGB colors, so...
How can I use more colors from that huge palette in this type of figure?
#!/usr/bin/python
from __future__ import print_function
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import colorsys
from scipy.cluster.hierarchy import dendrogram,linkage,fcluster
from scipy.spatial import distance
np.random.seed(0) #43984
#Dims
n,m = 10,1000
#DataFrame: rows = Samples, cols = Attributes
attributes = ["a" + str(j) for j in range(m)]
DF_data = pd.DataFrame(np.random.randn(n, m),#
columns = attributes)
# .values replaces as_matrix(), which was removed in pandas 1.0
A_dist = distance.cdist(DF_data.values.T, DF_data.values.T)
DF_dist = pd.DataFrame(A_dist, index = attributes, columns = attributes)
#Linkage Matrix
# squareform lives in scipy.spatial.distance; it condenses the square distance matrix
Z = linkage(distance.squareform(DF_dist.values),method="average") #metric="euclidean" necessary since the input is a dissimilarity measure?
#Create dendrogram
D_dendro = dendrogram(
    Z,
    labels=DF_dist.index,
    no_plot=True,
    color_threshold=3.5,
    count_sort = "ascending",
    #link_color_func=lambda k: colors[k]
)
#Display dendrogram
def plotTree(D_dendro):
    fig,ax = plt.subplots(figsize=(25, 10))
    icoord = np.array( D_dendro['icoord'] )
    dcoord = np.array( D_dendro['dcoord'] )
    color_list = np.array( D_dendro['color_list'] )
    x_min, x_max = icoord.min(), icoord.max()
    y_min, y_max = dcoord.min(), dcoord.max()
    # Draw each U-shaped link in the color scipy assigned to it
    for xs, ys, color in zip(icoord, dcoord, color_list):
        plt.plot(xs, ys, color=color)
    plt.xlim( x_min-10, x_max + 0.1*abs(x_max) )
    plt.ylim( y_min, y_max + 0.1*abs(y_max) )
    plt.title("Dendrogram", fontsize=30)
    plt.xlabel("Clusters", fontsize=25)
    plt.ylabel("Distance", fontsize=25)
    plt.yticks(fontsize = 20)
    plt.show()
    return fig, ax
fig,ax = plotTree(D_dendro) #wrapper I made
#Dims
print(
len(set(D_dendro["color_list"])), "^ # of colors from dendrogram",
len(D_dendro["ivl"]), "^ # of labels",sep="\n")
# 7
# ^ # of colors from dendrogram
# 1000
# ^ # of labels
Most matplotlib colormaps return an RGBA color when called with a value between 0 and 1. For example,
import matplotlib.pyplot as plt
import numpy as np
print([plt.cm.Greens(i) for i in np.linspace(0, 1, 5)])
will print
[(0.9686274528503418, 0.98823529481887817, 0.96078431606292725, 1.0),
(0.77922338878407194, 0.91323337695177864, 0.75180316742728737, 1.0),
(0.45176470875740049, 0.76708959481295413, 0.46120723030146432, 1.0),
(0.13402538141783546, 0.54232989970375511, 0.26828144368003398, 1.0),
(0.0, 0.26666668057441711, 0.10588235408067703, 1.0)]
So you are not restricted to the handful of colors provided by default. Just choose a colormap and get a color from it based on some fraction. For example, in your code, you could consider,
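for xs, ys in zip(icoord, dcoord):
    # ys holds the four y-values of one link; use its height as the fraction
    # (6.0 is a rough upper bound on the heights; y_max would also work)
    color = plt.cm.Spectral( max(ys)/6.0 )
    plt.plot(xs, ys, color=color)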
or something to that effect. I am unsure how exactly you want to display your colors, but I am sure you can easily modify your code to achieve any color combination you want ...
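A fuller sketch of the idea above, on toy data rather than the question's DataFrame: each link is colored by its merge height, scaled into [0, 1] with matplotlib's Normalize. The names X, heights and link_colors are illustrative assumptions, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import Normalize
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.RandomState(0)
X = rng.randn(30, 5)                     # toy data: 30 observations, 5 features
Z = linkage(X, method="average")
D = dendrogram(Z, no_plot=True)          # only compute the coordinates

# The height of a link is the largest y-value among its four points
heights = [max(ys) for ys in D["dcoord"]]
norm = Normalize(vmin=0.0, vmax=max(heights))

# One RGBA tuple per link, sampled from the Spectral colormap
link_colors = [plt.cm.Spectral(float(norm(h))) for h in heights]

fig, ax = plt.subplots()
for xs, ys, color in zip(D["icoord"], D["dcoord"], link_colors):
    ax.plot(xs, ys, color=color)
# call plt.show() to display the figure
```

This colors by height, not by cluster membership, so links at similar heights in different clusters get similar colors.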
Another thing you can try is
N = len(D_dendro["color_list"])  # one color per link; assumes N > 1
colorList = [ plt.cm.Spectral( float(i)/(N-1) ) for i in range(N)]
and pass on that colorList in place of color_list.
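One way to hand a larger palette to scipy itself, as a minimal sketch: set_link_color_palette accepts a list of matplotlib color strings, so colormap samples can be converted to hex with rgb2hex and installed before calling dendrogram. The toy data, n_colors = 20 and color_threshold = 2.0 are assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import rgb2hex
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

rng = np.random.RandomState(0)
X = rng.randn(60, 4)                     # toy data, not the question's DataFrame
Z = linkage(X, method="average")

# Sample n_colors evenly spaced colors and convert them to hex strings
n_colors = 20
palette = [rgb2hex(plt.cm.Spectral(x)) for x in np.linspace(0, 1, n_colors)]

set_link_color_palette(palette)          # used for clusters below the threshold
D = dendrogram(Z, no_plot=True, color_threshold=2.0,
               above_threshold_color="grey")
set_link_color_palette(None)             # restore scipy's default palette

n_used = len(set(D["color_list"]))       # distinct colors actually assigned
```

As far as I can tell, scipy cycles through the palette when there are more clusters than colors, so n_colors should be at least the number of clusters below the threshold.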
Play around a bit ...
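If the goal is one distinct color per cluster, the link_color_func argument that is commented out in the question is another route. The sketch below is one possible approach, not something from the original posts: it cuts the tree with fcluster, maps every link to the flat cluster it lies in, and colors links that join different clusters grey. The toy data, threshold and the helper name link_color are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import rgb2hex
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

rng = np.random.RandomState(0)
X = rng.randn(80, 4)                     # toy data
Z = linkage(X, method="average")
threshold = 2.0                          # hypothetical cut height

n_leaves = Z.shape[0] + 1
flat = fcluster(Z, t=threshold, criterion="distance")  # cluster ids start at 1

# Link ids are n_leaves + row index in Z. Store the flat cluster a link
# belongs to, or 0 when it joins two different clusters.
link_cluster = {}
for row, (a, b) in enumerate(Z[:, :2].astype(int)):
    ca = flat[a] if a < n_leaves else link_cluster[a]
    cb = flat[b] if b < n_leaves else link_cluster[b]
    link_cluster[n_leaves + row] = ca if ca == cb else 0

# One colormap sample per flat cluster
n_clusters = int(flat.max())
palette = [rgb2hex(plt.cm.Spectral(x)) for x in np.linspace(0, 1, n_clusters)]

def link_color(k):
    """Return the color for link id k (hypothetical helper)."""
    cluster_id = link_cluster[k]
    return palette[cluster_id - 1] if cluster_id else "grey"

D = dendrogram(Z, no_plot=True, link_color_func=link_color)
```

With this approach the number of colors follows the number of clusters automatically, at the cost of adjacent clusters getting similar shades when there are many of them.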
Source: https://stackoverflow.com/questions/36538090/bigger-color-palette-in-matplotlib-for-scipys-dendrogram-python