How does clustering (especially String clustering) work?

轻奢々 2020-12-07 17:56

I heard about clustering as a way to group similar data. I want to know how it works in the specific case of strings.

I have a table with more than 100,000 different words.

3 Answers
  •  难免孤独
    2020-12-07 18:29

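    You can use a string metric like the Levenshtein distance for the distance calculation and k-means for clustering. Note that k-means relies on averaging points, which is not defined for raw strings, so with an edit distance a medoid-based variant such as k-medoids, or hierarchical clustering, is usually the better fit.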

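    The Levenshtein distance is a string metric for measuring the amount of difference between two sequences: it is the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into the other.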
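
    To make the metric concrete, here is a minimal sketch of the Levenshtein distance in plain Python, using the standard dynamic-programming approach with two rows. The function name `levenshtein` is just an illustrative choice; for 100,000 words you would probably want an optimized library rather than this pure-Python version.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]
```

    For example, `levenshtein("kitten", "sitting")` is 3: two substitutions and one insertion.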

    Do some testing and find a similarity threshold per word that will decide your groups.
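
    As a rough illustration of the threshold idea, the sketch below groups words greedily: each word joins the first group whose representative (its first member) is similar enough, otherwise it starts a new group. This is a simple "leader"-style grouping, not k-means, and the names `similarity`, `group_words`, and the default threshold of 0.7 are assumptions chosen for the example. The similarity is the edit distance normalized by the longer word's length, which is one possible choice among several.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def group_words(words, threshold=0.7):
    """Greedy grouping: a word joins the first group whose representative
    (the group's first word) has similarity >= threshold."""
    groups = []
    for word in words:
        for group in groups:
            if similarity(word, group[0]) >= threshold:
                group.append(word)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([word])
    return groups


words = ["apple", "apply", "banana", "bananas", "cherry"]
groups = group_words(words, threshold=0.7)
```

    The result depends on the input order and on the threshold, so it is worth trying a few values on a sample of your data. Each word is compared against every group representative, which can get slow for 100,000 words if many groups form.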
