String clustering in Python

Submitted by 别等时光非礼了梦想 on 2019-12-24 01:54:54

Question


I have a list of strings, and I want to group them into clusters in Python.

strings = ['String1', 'String2', 'String3', ...]

I want to use Levenshtein distance, so I am using the jellyfish library. I know that the distance between two strings can be computed like this:

import jellyfish
jellyfish.levenshtein_distance('string1', 'string2')
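To feed precomputed distances into scipy, linkage expects a condensed distance matrix: a flat vector of the n(n-1)/2 pairwise distances in the order (0, 1), (0, 2), ..., (1, 2), ..., which is exactly the order itertools.combinations produces. A minimal sketch, assuming a small pure-Python Levenshtein function as a stand-in for jellyfish.levenshtein_distance (so it runs without jellyfish installed); the helper names levenshtein and condensed_distances and the sample strings are hypothetical:

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute.

    Stand-in for jellyfish.levenshtein_distance (an assumption for this sketch).
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def condensed_distances(strings):
    """Pairwise distances in scipy's condensed order: (0,1), (0,2), ..., (1,2), ..."""
    return [levenshtein(s, t) for s, t in combinations(strings, 2)]


strings = ['apple', 'apply', 'ample']  # hypothetical sample data
print(condensed_distances(strings))  # → [1, 1, 2] for pairs (0,1), (0,2), (1,2)
```

With jellyfish installed, replacing levenshtein(s, t) with jellyfish.levenshtein_distance(s, t) should give the same vector.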

My problem is that I don't know how to use scipy.cluster.hierarchy to get a Python list of the strings in each cluster. I have also tried using the linkage function:

linkage(y[, method, metric])

But I am not able to get from its output to the final list of clusters.
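A common pitfall with linkage(y[, method, metric]) is passing the square n-by-n distance matrix directly: linkage treats any 2-D array as a set of observation vectors, not as distances. The sketch below, assuming hypothetical sample strings and a compact stand-in for jellyfish.levenshtein_distance, builds the square matrix, converts it to condensed form with scipy.spatial.distance.squareform, and runs hierarchical clustering. The choice of method='average' is an assumption; 'single' or 'complete' are used the same way.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform


def levenshtein(a, b):
    # Compact edit-distance DP; stand-in for jellyfish.levenshtein_distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


strings = ['apple', 'apply', 'ample', 'banana', 'bandana']  # hypothetical sample data
n = len(strings)

# Square, symmetric matrix of pairwise distances with zeros on the diagonal.
D = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        D[i, j] = D[j, i] = levenshtein(strings[i], strings[j])

# linkage() would treat the 2-D matrix D as n observation vectors,
# so convert it to the condensed 1-D form first.
Z = linkage(squareform(D), method='average')
print(Z)  # one row per merge: [cluster_a, cluster_b, merge_distance, new_cluster_size]
```

Z has n - 1 rows, one per merge, and is the input that cut_tree or fcluster cut into flat clusters.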

Any help?


Answer 1:


After using linkage to perform hierarchical clustering on your distances, use scipy.cluster.hierarchy.cut_tree to cut the tree into the number of clusters you want. For two clusters:

from scipy.cluster import hierarchy
labels = hierarchy.cut_tree(linkage_output, n_clusters=2).ravel()  # .ravel() flattens the (n, 1) result into a 1-D array of cluster labels
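The labels from cut_tree can then be turned into the Python list of clusters the question asks for by grouping the original strings on their label. A minimal end-to-end sketch, assuming hypothetical sample strings, a pure-Python stand-in for jellyfish.levenshtein_distance, and average linkage:

```python
from collections import defaultdict
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import cut_tree, linkage


def levenshtein(a, b):
    # Compact edit-distance DP; stand-in for jellyfish.levenshtein_distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


strings = ['apple', 'apply', 'ample', 'banana', 'bandana']  # hypothetical sample data

# Condensed distance vector, in the pair order linkage() expects.
condensed = np.array([levenshtein(s, t) for s, t in combinations(strings, 2)], dtype=float)
Z = linkage(condensed, method='average')

# cut_tree returns shape (n, 1) for a single n_clusters value; ravel() flattens it.
labels = cut_tree(Z, n_clusters=2).ravel()

# Group the original strings by cluster label, in order of first appearance.
clusters = defaultdict(list)
for s, label in zip(strings, labels):
    clusters[label].append(s)
result = list(clusters.values())
print(result)  # → [['apple', 'apply', 'ample'], ['banana', 'bandana']]
```

scipy.cluster.hierarchy.fcluster(Z, t=2, criterion='maxclust') is another way to get flat labels from the same Z; its labels start at 1 rather than 0, which does not matter for the grouping step.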


Source: https://stackoverflow.com/questions/36949795/string-clustering-in-python
