What are some algorithms for comparing how similar two strings are?

故里飘歌 2020-11-30 17:45

I need to compare strings to decide whether they represent the same thing. This relates to case titles entered by humans where abbreviations and other small details may di

5 answers
  •  时光说笑
    2020-11-30 18:41

    Another algorithm you can consider is Simon White's similarity, which is the Dice coefficient computed over character bigrams:

    def get_bigrams(string):
        """
        Take a string and return a list of bigrams.
        """
        if string is None:
            return []
    
        s = string.lower()
        return [s[i : i + 2] for i in range(len(s) - 1)]
    
    def simon_similarity(str1, str2):
        """
        Perform bigram comparison between two strings
        and return a percentage match in decimal form.
        """
        pairs1 = get_bigrams(str1)
        pairs2 = get_bigrams(str2)
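        # Total number of bigrams in both strings (the Dice denominator).
        total = len(pairs1) + len(pairs2)
    
        if total == 0:
            return 0.0
    
        hit_count = 0
        for x in pairs1:
            if x in pairs2:
                hit_count += 1
                # Consume the match so a repeated bigram is not counted twice;
                # otherwise the score can exceed 1.0 (e.g. "aaaa" vs "aa").
                pairs2.remove(x)
        return (2.0 * hit_count) / total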
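
For reference, the same bigram Dice score can be sketched more compactly with `collections.Counter`, whose `&` operator takes the minimum count per bigram. The names `bigram_counts` and `dice_similarity` are made up for this sketch. Collapsing whitespace is an assumption about what counts as noise in case titles, and the sample titles are invented.

```python
from collections import Counter


def bigram_counts(text):
    """Return a multiset (Counter) of lowercase character bigrams."""
    # Assumption: letter case and repeated whitespace are noise in a title.
    s = " ".join(text.lower().split())
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_similarity(a, b):
    """Dice coefficient over bigram multisets, between 0.0 and 1.0."""
    c1, c2 = bigram_counts(a), bigram_counts(b)
    total = sum(c1.values()) + sum(c2.values())
    if total == 0:
        return 0.0
    # Multiset intersection: each repeated bigram is matched at most
    # as many times as it occurs in both strings.
    overlap = sum((c1 & c2).values())
    return 2.0 * overlap / total


if __name__ == "__main__":
    # Hypothetical case titles, invented for illustration.
    print(dice_similarity("Smith v. Jones", "Smith vs Jones"))
    print(dice_similarity("Smith v. Jones", "Brown v. Board"))
```

A score near 1.0 suggests the two titles are likely the same. Where to put the cutoff depends on the data, so it is worth checking a threshold against a sample of known matches before relying on it.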
    
