Find the similarity metric between two strings

长情又很酷 2020-11-22 13:24

How do I get the probability of a string being similar to another string in Python?

I want to get a decimal value like 0.9 (meaning 90%) etc. Preferably with standar

11 Answers
  •  猫巷女王i
    2020-11-22 14:23

    The built-in SequenceMatcher is very slow on large inputs. Here is how the same thing can be done with diff-match-patch:

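    from diff_match_patch import diff_match_patch
    
    def compute_similarity_and_diff(text1, text2):
        dmp = diff_match_patch()
        dmp.Diff_Timeout = 0.0  # 0 disables the timeout, so the diff runs to completion
        diff = dmp.diff_main(text1, text2, False)  # False: skip the line-level speedup pass
    
        # Similarity: characters in unchanged (op == 0) chunks,
        # divided by the length of the longer input.
        common_text = sum(len(txt) for op, txt in diff if op == 0)
        text_length = max(len(text1), len(text2))
        # Two empty strings are identical; this also avoids dividing by zero.
        sim = common_text / text_length if text_length else 1.0
    
        return sim, diff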
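
For comparison, here is a minimal sketch of the standard-library approach this answer mentions, using `difflib.SequenceMatcher`. The function name `sequence_matcher_similarity` is made up for illustration. It may be a reasonable starting point for short strings, where speed is less of a concern.

```python
from difflib import SequenceMatcher

def sequence_matcher_similarity(text1, text2):
    """Return a similarity score between 0 and 1 using only the standard library."""
    # The first argument is an optional "junk" predicate; None disables it.
    return SequenceMatcher(None, text1, text2).ratio()

if __name__ == "__main__":
    print(sequence_matcher_similarity("abcd", "bcde"))
```

The two scores are not directly comparable: `ratio()` is twice the number of matching characters divided by the combined length of both strings, whereas the function above divides the matching characters by the length of the longer string.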
    
