What is a good strategy to group similar words?

孤城傲影 2020-12-29 11:35

Say I have a list of movie names with misspellings and small variations like this -

 "Pirates of the Caribbean: The Curse of the Black Pearl"
 "Pirates o


        
5 Answers
  •  半阙折子戏
    2020-12-29 11:48

    Have a look at "fuzzy matching". The thread linked below lists some great tools that calculate similarity between strings.

    I'm especially fond of the difflib module from the standard library:

    >>> from difflib import get_close_matches
    >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy'])
    ['apple', 'ape']
    >>> import keyword
    >>> get_close_matches('wheel', keyword.kwlist)
    ['while']
    >>> get_close_matches('apple', keyword.kwlist)
    []
    >>> get_close_matches('accept', keyword.kwlist)
    ['except']
    

    https://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison
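
To go from finding close matches to actually grouping titles, one possible approach is a greedy pass built on difflib's SequenceMatcher. This is only a rough sketch: the helper name `group_similar`, the 0.8 threshold, and every sample title other than the first one from the question are assumptions made for illustration, and the threshold would need tuning on real data.

```python
from difflib import SequenceMatcher


def group_similar(names, threshold=0.8):
    """Greedily put each name into the first group whose representative
    (the group's first member) is at least `threshold` similar to it.

    `group_similar` and the default threshold are hypothetical choices,
    not something prescribed by difflib.
    """
    groups = []
    for name in names:
        for group in groups:
            # Compare case-insensitively against the group's representative.
            score = SequenceMatcher(None, name.lower(), group[0].lower()).ratio()
            if score >= threshold:
                group.append(name)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([name])
    return groups


# Sample data: only the first title comes from the question,
# the rest are made-up variations for demonstration.
titles = [
    "Pirates of the Caribbean: The Curse of the Black Pearl",
    "Pirates of the Carribean: The Curse of the Black Pearl",
    "The Dark Knight",
    "The Dark Night",
]

groups = group_similar(titles)
for group in groups:
    print(group)
```

Because each name is compared only to the first member of each group, the result depends on input order, and the loop is quadratic in the worst case. For a large list, a dedicated fuzzy-matching library or a proper clustering step may be a better fit.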
