Find the similarity metric between two strings

长情又很酷 2020-11-22 13:24

How do I get the probability of a string being similar to another string in Python?

I want to get a decimal value like 0.9 (meaning 90%) etc. Preferably with standar

11 answers
  •  悲&欢浪女
    2020-11-22 14:27

    Many metrics define similarity and distance between strings, as mentioned above. I will add my 5 cents with one example of Jaccard similarity on q-grams and one example of edit distance.

    The libraries

    from nltk.metrics.distance import edit_distance, jaccard_distance
    from nltk.util import ngrams
    

    Jaccard Similarity

    # Compare the sets of character bigrams (q = 2) of the two strings
    1 - jaccard_distance(set(ngrams('Apple', 2)), set(ngrams('Appel', 2)))
    

    and we get:

    0.33333333333333337
    

    And for Apple and Mango:

    1 - jaccard_distance(set(ngrams('Apple', 2)), set(ngrams('Mango', 2)))
    

    and we get:

    0.0
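
    For anyone who prefers to stay within the standard library, the same idea can be sketched without nltk. The helper names qgrams and jaccard_similarity are made up for this illustration, and the handling of strings shorter than q is an assumption, not something nltk prescribes:

```python
def qgrams(text, q=2):
    """Return the set of contiguous character q-grams of text."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard_similarity(a, b, q=2):
    """Size of the q-gram intersection divided by the size of the union."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    union = grams_a | grams_b
    if not union:
        # Assumption: two strings with no q-grams at all count as identical.
        return 1.0
    return len(grams_a & grams_b) / len(union)


print(jaccard_similarity('Apple', 'Appel'))  # 2 shared bigrams out of 6, about 0.33
print(jaccard_similarity('Apple', 'Mango'))  # no shared bigrams
```

    Apple and Appel share the bigrams Ap and pp out of six distinct bigrams in total, which is where the value of roughly one third comes from.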
    

    Edit Distance

    edit_distance('Apple', 'Appel')
    

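    and we get (the swapped l and e cost two substitutions, because edit_distance does not treat a transposition as a single edit by default):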

    2
    

    And finally, for Apple and Mango:

    edit_distance('Apple', 'Mango')
    

    and we get:

    5
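
    Edit distance is a count, not a value between 0 and 1 like the question asks for. One common way to rescale it, shown here as a sketch rather than as anything nltk does itself, is to divide by the length of the longer string. The function names levenshtein and edit_similarity are made up for this example:

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Rescale the distance to [0, 1] by the longer length (an assumed convention)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1 - levenshtein(a, b) / longest


print(levenshtein('Apple', 'Appel'), edit_similarity('Apple', 'Appel'))
print(levenshtein('Apple', 'Mango'), edit_similarity('Apple', 'Mango'))
```

    With this convention Apple and Appel come out at about 0.6, and Apple and Mango at 0.0, since every position differs.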
    

    Cosine Similarity on Q-Grams (q=2)

    Another option is the textdistance library. Here is an example of cosine similarity:

    import textdistance
    1 - textdistance.Cosine(qval=2).distance('Apple', 'Appel')
    

    and we get:

    0.5
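
    To see where the 0.5 comes from, here is a standard-library sketch of the usual vector form of cosine similarity over bigram counts. The names qgram_counts and cosine_similarity are made up for this example, and the sketch is not claimed to reproduce the internals of textdistance, so results may differ on strings with repeated q-grams:

```python
import math
from collections import Counter


def qgram_counts(text, q=2):
    """Count each contiguous character q-gram of text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def cosine_similarity(a, b, q=2):
    """Dot product of the two q-gram count vectors divided by their norms."""
    counts_a, counts_b = qgram_counts(a, q), qgram_counts(b, q)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[g] * counts_b[g] for g in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a string with no q-grams is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity('Apple', 'Appel'))
```

    Each string has four distinct bigrams, so both norms are 2, and the two shared bigrams give a dot product of 2, hence 2 / (2 * 2) = 0.5.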
    
