Is there a way to preserve the clustering in a heatmap but reduce the number of observations?

Submitted by 十年热恋 on 2019-12-07 03:09:29

I would like to answer my own question and get feedback. I set kmeans_k = 30 in pheatmap and obtained 29 clusters that still preserve the clustering of the 90 observations I had made previously. From there I extracted the genes in their respective clusters. I then selected the top 5 clusters on either side of that heatmap, the ones with high SD, since they are enough to reproduce the heatmap I need. Throughout, I have used scale="row" and kept both the row and column dendrograms on, and I did not want to change that now. When I plot these 31 genes (observations), they in fact improve my row clustering further and partition them into 2 groups much more cleanly, which is what I wanted. Code for the k-means step and the new heatmap:

Heatmap with kmeans_k = 30:

obj <- pheatmap(df.90,
                scale = "row",
                clustering_distance_cols = "correlation",
                clustering_method = "ward.D2",
                cluster_cols = TRUE,
                show_rownames = TRUE, show_colnames = TRUE,
                color = col,
                annotation = batch.annotation,
                fontsize_row = 6, fontsize_col = 7,
                border_color = NA,
                cellwidth = NA, cellheight = NA,
                kmeans_k = 30)

Retrieve the cluster assignments and extract the observations/genes:

obj$kmeans$cluster
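
As a rough sketch of how the genes can be pulled out per cluster: the cluster vector is a named integer vector (gene name mapped to cluster id), so split() groups the names by cluster. The example below uses base R kmeans() on a made-up toy matrix so it runs without my data; `toy`, `km` and `genes_by_cluster` are hypothetical names, and with the real object `km$cluster` would be `obj$kmeans$cluster`.

```r
set.seed(1)

# Toy stand-in for df.90: 20 hypothetical genes x 6 samples
toy <- matrix(rnorm(120), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))

# k-means on the rows; the names of km$cluster are the row (gene) names
km <- kmeans(toy, centers = 4, iter.max = 100)
cluster_of_gene <- km$cluster

# List with one character vector of gene names per cluster
genes_by_cluster <- split(names(cluster_of_gene), cluster_of_gene)

# Number of genes in each cluster
cluster_sizes <- lengths(genes_by_cluster)
print(cluster_sizes)
```

genes_by_cluster[["3"]] would then give the genes assigned to cluster 3.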

Take the top clusters and plot their genes as a new heatmap:

pheatmap(mydata[rownames(df.31), ],
         scale = "row",
         clustering_distance_cols = "correlation",
         clustering_method = "ward.D2",
         cluster_cols = TRUE,
         show_rownames = TRUE, show_colnames = TRUE,
         color = col,
         annotation = batch.annotation,
         fontsize_row = 8, fontsize_col = 8,
         border_color = NA)
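
For anyone who wants to reproduce the selection step, here is one possible way to pick "high SD" clusters. It is only a sketch under assumptions: it ranks clusters by the mean per-gene SD, which may not match exactly how I chose mine. The SD is computed on the unscaled matrix, because after scale="row" every row has SD 1. The toy data and the names `toy`, `row_sd`, `top_clusters` and `toy_subset` are made up for illustration.

```r
set.seed(1)

# Toy stand-in for the expression matrix: 20 hypothetical genes x 6 samples
toy <- matrix(rnorm(120), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("s", 1:6)))
toy[1:5, ] <- toy[1:5, ] * 5   # make a few genes clearly more variable

km <- kmeans(toy, centers = 4, iter.max = 100)

# Per-gene SD on the unscaled data, then the mean SD within each cluster
row_sd     <- apply(toy, 1, sd)
cluster_sd <- tapply(row_sd, km$cluster, mean)

# Keep the 2 clusters with the highest mean SD (assumed selection rule)
top_clusters   <- as.integer(names(sort(cluster_sd, decreasing = TRUE))[1:2])
selected_genes <- names(km$cluster)[km$cluster %in% top_clusters]

# Reduced matrix that would go into the second pheatmap call
toy_subset <- toy[selected_genes, , drop = FALSE]
print(dim(toy_subset))
```

The reduced matrix plays the role of mydata[rownames(df.31), ] in the call above.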

What do you think of this approach? It is not the one I originally intended, but I do not think it is wrong either. I would appreciate feedback if someone can suggest a better method, or if you think this approach is incorrect. Thanks.
