Fuzzy Group By, Grouping Similar Words

耶瑟儿~ 2020-12-10 07:44

This question has been asked here before:

What is a good strategy to group similar words?

but no clear answer was given on how to "group" the items. The solution based

5 answers
  •  攒了一身酷
    2020-12-10 07:55

    You have to decide which of the close matches you want to use. You could take the first element of the list that get_close_matches returns, or pick one element from the close matches at random.

    There has to be some sort of rule for it.

    In [19]: import difflib
    
    In [20]: a = ['ape', 'appel', 'apple', 'peach', 'puppy']
    
    In [21]: a = ['appel', 'apple', 'peach', 'puppy']
    
    In [22]: b = difflib.get_close_matches('ape',a)
    
    In [23]: b
    Out[23]: ['apple', 'appel']
    
    In [24]: import random
    
    In [25]: c = random.choice(b)
    
    In [26]: c
    Out[26]: 'apple'
    

    Now remove c from the initial list, and that's it. For C++, you can use the Levenshtein distance.
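
    To get actual groups instead of a single match, the same idea can be repeated until the list is empty. The following is only a minimal sketch under my own assumptions: the function name group_similar is hypothetical, the "rule" used here is simply "the first remaining word becomes the seed of a group", and the 0.6 cutoff is just the default of difflib.get_close_matches.

```python
import difflib


def group_similar(words, cutoff=0.6):
    """Greedily group similar words (hypothetical helper, not part of difflib).

    Assumed rule: the first remaining word seeds a group, and every
    remaining word that get_close_matches accepts joins that group.
    """
    remaining = list(words)
    groups = []
    while remaining:
        seed = remaining.pop(0)
        if remaining:
            # n=len(remaining) asks for every match above the cutoff,
            # not only the default top 3.
            matches = difflib.get_close_matches(
                seed, remaining, n=len(remaining), cutoff=cutoff
            )
        else:
            # get_close_matches requires n > 0, so skip the call here.
            matches = []
        groups.append([seed] + matches)
        # Remove the grouped words so they are not assigned twice.
        remaining = [w for w in remaining if w not in matches]
    return groups


if __name__ == "__main__":
    print(group_similar(['ape', 'appel', 'apple', 'peach', 'puppy']))
```

    With the sample list from the session above, 'ape', 'apple' and 'appel' should end up in one group, while 'peach' and 'puppy' each stay alone. Note that the result depends on the order of the input and on the cutoff, so a different rule for choosing the seed may give different groups.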
