Find the similarity metric between two strings

长情又很酷 2020-11-22 13:24

How do I get the probability of a string being similar to another string in Python?

I want to get a decimal value like 0.9 (meaning 90%) etc. Preferably with standar

11 answers

悲&欢浪女 (OP)

2020-11-22 14:27
There are many metrics for defining similarity and distance between strings, as mentioned above. I will add my two cents by showing an example of Jaccard similarity on q-grams and an example with edit distance.

The libraries
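```
# jaccard_distance and edit_distance live in the same NLTK module
from nltk.metrics.distance import edit_distance, jaccard_distance
# ngrams yields the q-grams (here: character bigrams) of a sequence
from nltk.util import ngrams
```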
Jaccard Similarity
```
1-jaccard_distance(set(ngrams('Apple', 2)), set(ngrams('Appel', 2)))
```
and we get:
```
0.33333333333333337
```
And for 'Apple' and 'Mango':
```
1-jaccard_distance(set(ngrams('Apple', 2)), set(ngrams('Mango', 2)))
```
and we get:
```
0.0
```
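To make the numbers above easier to follow, here is a minimal sketch of the same idea in plain Python, without NLTK. The helper names `qgrams` and `jaccard_similarity` are my own, not part of any library, and treating two strings with no q-grams at all as identical is an assumption.

```python
def qgrams(text, q=2):
    """Return the set of contiguous q-character substrings of text."""
    return {text[i:i + q] for i in range(len(text) - q + 1)}


def jaccard_similarity(a, b, q=2):
    """Shared q-grams divided by all distinct q-grams of both strings."""
    grams_a, grams_b = qgrams(a, q), qgrams(b, q)
    union = grams_a | grams_b
    if not union:
        # Assumption: strings too short to have any q-gram count as identical.
        return 1.0
    return len(grams_a & grams_b) / len(union)


# 'Apple' -> {Ap, pp, pl, le}, 'Appel' -> {Ap, pp, pe, el}: 2 shared out of 6
print(jaccard_similarity('Apple', 'Appel'))
print(jaccard_similarity('Apple', 'Mango'))
```

The first call gives roughly 0.333 (two shared bigrams out of six distinct ones) and the second gives 0.0, which matches the NLTK results up to floating-point rounding.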
Edit Distance
```
edit_distance('Apple', 'Appel')
```
and we get:
```
2
```
And finally, for 'Apple' and 'Mango':
```
edit_distance('Apple', 'Mango')
```
and we get:
```
5
```
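Edit distance is an integer count, not the 0-to-1 decimal the question asks for. One common way to turn it into such a score is to divide by the length of the longer string and subtract from 1. The sketch below does this in plain Python; the function names `levenshtein` and `edit_similarity` are hypothetical, and the normalization is one reasonable choice rather than the only one.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (no transpositions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def edit_similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # assumption: two empty strings are identical
    return 1 - levenshtein(a, b) / longest


print(edit_similarity('Apple', 'Appel'))  # distance 2 over length 5
print(edit_similarity('Apple', 'Mango'))  # distance 5 over length 5
```

With this normalization, 'Apple' versus 'Appel' scores about 0.6 and 'Apple' versus 'Mango' scores 0.0.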
Cosine Similarity on Q-Grams (q=2)

Another solution is to work with the textdistance library. Here is an example of cosine similarity:
```
import textdistance
1-textdistance.Cosine(qval=2).distance('Apple', 'Appel')
```
and we get:
```
0.5
```
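For readers who want to see where the 0.5 comes from, here is a rough sketch of cosine similarity on bigram counts using only the standard library. The names `qgram_counts` and `cosine_similarity` are my own. This is the textbook vector cosine, so it may not match textdistance exactly for strings that contain repeated q-grams.

```python
import math
from collections import Counter


def qgram_counts(text, q=2):
    """Count each contiguous q-character substring of text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def cosine_similarity(a, b, q=2):
    """Cosine of the angle between the two q-gram count vectors."""
    counts_a, counts_b = qgram_counts(a, q), qgram_counts(b, q)
    dot = sum(count * counts_b[gram] for gram, count in counts_a.items())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a string with no q-grams matches nothing
    return dot / (norm_a * norm_b)


# Two shared bigrams, each vector has length sqrt(4) = 2: 2 / (2 * 2)
print(cosine_similarity('Apple', 'Appel'))
```

For 'Apple' and 'Appel' the dot product is 2 and both vectors have length 2, which gives the same 0.5 as above.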