k-means clustering on term-term co-occurrence matrix

Submitted by 可紊 on 2019-12-12 06:31:57

Question


I derived a term-term co-occurrence matrix, K, from a document-term matrix (DTM) in R. I want to run a k-means clustering analysis on this keyword-by-keyword matrix. K has dimensions 8962 terms x 8962 terms.

I pass K to the kmeans function as follows:

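#Initialise the data frame that collects the cost for each number of clusters
cost_df <- data.frame()

for (i in 1:25) {
    #Run kmeans for each level of i, allowing up to 100 iterations for convergence
    #(the result is stored as fit so it does not mask the kmeans function)
    fit <- kmeans(x = K, centers = i, iter.max = 100)

    #Combine cluster number and cost together, write to df
    cost_df <- rbind(cost_df, data.frame(k = i, cost = fit$tot.withinss))
}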

My original document-term matrix is 590 documents x 8962 terms, and running the above code on the DTM does not hang. With the keyword-by-keyword matrix, however, R hangs, presumably because of the matrix's size. Any suggestions on how to overcome this would be helpful.


Answer 1:


k-means requires coordinates, because it needs to be able to compute means (that is why it is called k-means).

What you have is a kind of similarity matrix. Choose a clustering algorithm that works on pairwise similarities instead.
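As one possible illustration of this advice (not something the answer itself prescribes), the sketch below applies average-linkage hierarchical clustering from base R to a toy co-occurrence matrix. The five terms and their counts are made up, and the cosine-style normalisation assumes the diagonal of K holds each term's co-occurrence with itself, as it would if K were computed as t(DTM) %*% DTM.

```r
# Toy symmetric co-occurrence matrix for 5 hypothetical terms (made-up counts)
terms <- c("apple", "banana", "cherry", "engine", "wheel")
K <- matrix(c(
  10, 8, 7, 0, 1,
   8, 9, 6, 1, 0,
   7, 6, 8, 0, 0,
   0, 1, 0, 9, 7,
   1, 0, 0, 7, 8
), nrow = 5, byrow = TRUE, dimnames = list(terms, terms))

# Normalise counts to a similarity in [0, 1]:
# S[i, j] = K[i, j] / sqrt(K[i, i] * K[j, j])
d <- sqrt(diag(K))
S <- K / outer(d, d)

# Convert similarity to dissimilarity, which is what hclust expects
D <- as.dist(1 - S)

# Average-linkage hierarchical clustering, cut into 2 groups
hc <- hclust(D, method = "average")
clusters <- cutree(hc, k = 2)
print(clusters)
```

Unlike k-means, hclust never computes a mean of rows; it only needs the pairwise dissimilarities, so it matches the kind of data a co-occurrence matrix provides. The choice of normalisation and linkage here is an assumption and would need to be checked against the real data.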




Answer 2:


Your matrices are large but very sparse. Try storing them as sparse matrices.
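A minimal sketch of what sparse storage could look like, assuming the Matrix package (a recommended package bundled with standard R installations) and a tiny made-up DTM of 4 documents x 5 terms. The question does not say how K was built, so the crossprod step is an assumption.

```r
library(Matrix)

# Hypothetical DTM given as (document, term, count) triplets;
# only the non-zero counts are stored
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 4, 4),   # document index
  j = c(1, 2, 1, 3, 3, 4, 5),   # term index
  x = c(2, 1, 1, 3, 1, 2, 1),   # term count
  dims = c(4, 5),
  dimnames = list(paste0("doc", 1:4), paste0("term", 1:5))
)

# Term-term co-occurrence matrix, t(dtm) %*% dtm, which stays sparse
K <- crossprod(dtm)

print(K)
print(class(K))
```

Note that, as far as I know, stats::kmeans converts its input with as.matrix, so passing a sparse K to it would still expand the matrix to dense form. Sparse storage mainly helps when building and holding K, or when using a clustering routine that accepts sparse input.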



Source: https://stackoverflow.com/questions/36989005/k-means-clustering-on-term-term-co-ocurrence-matrix
