I am using the seaborn clustermap to create clusters, and visually it works great (this example produces very similar results).
However I am having troub
You probably want a new column in your dataframe with the cluster membership. I've managed to do this with code assembled from snippets found around the web:
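import seaborn
import scipy.cluster.hierarchy
from collections import defaultdict

g = seaborn.clustermap(df, method='average')

# Note: labels=df.index only matches the column linkage when df is square
# (e.g. a correlation matrix); for row clusters of a general dataframe,
# pass g.dendrogram_row.linkage instead.
den = scipy.cluster.hierarchy.dendrogram(g.dendrogram_col.linkage,
                                         labels=df.index,
                                         color_threshold=0.60)

def get_cluster_classes(den, label='ivl'):
    """Group leaf labels by the color of the link that joins them."""
    cluster_idxs = defaultdict(list)
    for c, pi in zip(den['color_list'], den['icoord']):
        for leg in pi[1:3]:
            # Leaves sit at x = 5, 15, 25, ...; map back to a leaf index.
            i = (leg - 5.0) / 10.0
            if abs(i - int(i)) < 1e-5:
                cluster_idxs[c].append(int(i))

    cluster_classes = {}
    for c, l in cluster_idxs.items():
        i_l = [den[label][i] for i in l]
        cluster_classes[c] = i_l
    return cluster_classes

clusters = get_cluster_classes(den)

# One color label per row, or None if the row was not assigned a color.
cluster = []
for i in df.index:
    included = False
    for j in clusters.keys():
        if i in clusters[j]:
            cluster.append(j)
            included = True
            break  # guarantee exactly one entry per row
    if not included:
        cluster.append(None)

df["cluster"] = cluster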
So this gives you a column with 'g' or 'r' for the green- or red-labeled clusters (newer SciPy versions use Matplotlib color-cycle names such as 'C1' and 'C2' instead). I determine my color_threshold by plotting the dendrogram and eyeballing the y-axis values.
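If you only need the membership and not the colors, a more direct route may be to cut the linkage matrix at the same height with scipy.cluster.hierarchy.fcluster. The following is a minimal sketch, not the method used above: the toy dataframe is hypothetical, and the linkage is computed with scipy directly so it runs without plotting. With a clustermap you would pass g.dendrogram_row.linkage (or g.dendrogram_col.linkage) as Z instead, assuming it is ordered like the rows (or columns) of df.

```python
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: two well-separated groups of rows.
df = pd.DataFrame(
    [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]],
    index=["a", "b", "c", "d"],
    columns=["x", "y"],
)

# Stand-in for g.dendrogram_row.linkage from a seaborn clustermap.
Z = linkage(df.values, method="average")

# Z[:, 2] holds the merge heights, i.e. the dendrogram's y-axis values,
# which can help when choosing the cut height t.
merge_heights = Z[:, 2]

# Cut the tree at the height used as color_threshold above.
# fcluster returns one integer label per observation, in input order.
df["cluster"] = fcluster(Z, t=0.60, criterion="distance")
```

This gives integer cluster ids rather than color names, and every row gets exactly one label, so there is no need to parse the dendrogram's plotting coordinates. The groups should roughly match the colored branches, though the two functions may treat merges exactly at the threshold differently.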