Levenshtein distance with items in a list in Python

99封情书 submitted on 2019-12-08 11:42:58

Question


I have two lists, shown below, and I want to find the words that are similar, meaning their Levenshtein distance is less than 2. I have a function that computes the Levenshtein distance, but it takes the two words as its parameters. I can find which words are not in the other list, but that does not help. I could also compare the lists index by index, but in the case below, once I reach index 7 ("but" and "except") everything is thrown off: "infidelity" is at index 9 in the first list while "infedelity" is at index 8 in the second, and "wocp88" is at index 10 while "wcop88" is at index 9, so those words are never compared. Is there some way to say "if part of infidelity appears in some word in the other list, check those two"? Note that this won't always work: for "infidelity" and "infedellty" only the "in" and "ty" parts match, and many words could match that.
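The question does not show the Levenshtein function itself. As a hypothetical stand-in (not the asker's code), a standard dynamic-programming implementation in which every insertion, deletion and substitution costs 1 might look like this; it is named levenshtein only to match the call used in the answer:

```python
def levenshtein(a, b):
    """Return the Levenshtein edit distance between strings a and b.

    Hypothetical stand-in for the asker's function: classic dynamic
    programming where each insertion, deletion or substitution costs 1.
    """
    # Make b the shorter string so each DP row stays small
    # (the distance is symmetric, so swapping is safe).
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein(u'forgives', u'forgive'))  # 1
print(levenshtein(u'wocp88', u'wcop88'))     # 2
```

Note that under this definition swapping two adjacent letters, as in wocp88/wcop88, costs 2 edits.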

[u'rt', u'cuaimatizada', u's', u'cuaimaqueserespeta', u'forgives', u'any', u'mistake', u'but', u'the', u'infidelity', u'wocp88']
[u'rt', u'cuiamatizada', u's', u'cuimaqueserespeta', u'forgive', u'any', u'mistake', u'except', u'infedelity', u'wcop88']
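To make the misalignment concrete, here is a short sketch that pairs the words by position with zip(); the names list1 and list2 for the two lists above are assumptions:

```python
list1 = [u'rt', u'cuaimatizada', u's', u'cuaimaqueserespeta', u'forgives',
         u'any', u'mistake', u'but', u'the', u'infidelity', u'wocp88']
list2 = [u'rt', u'cuiamatizada', u's', u'cuimaqueserespeta', u'forgive',
         u'any', u'mistake', u'except', u'infedelity', u'wcop88']

# Pair words by position. zip() stops at the shorter list, so the last
# word of list1 (wocp88) is never compared with anything.
positional = list(zip(list1, list2))
for w1, w2 in positional[7:]:
    print(w1, w2)
# but except
# the infedelity
# infidelity wcop88
```

From index 7 on, every positional pair is wrong, which is why a per-index comparison cannot find the intended matches.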

Edit: So my goal is to be able to feed my Levenshtein function the two words that need to be checked. In this case, those are the following pairs:

u'cuaimatizada'        u'cuiamatizada'
u'cuaimaqueserespeta'  u'cuimaqueserespeta'
u'forgives'            u'forgive'
u'infedelity'          u'infidelity'
u'wocp88'              u'wcop88'

I do not know beforehand which words these will be.


Answer 1:


I think this is what you want, but note that it compares every word with every other word, not just words at matching indexes:

 wordpairs = [(w1, w2) for w1 in list1 for w2 in list2 if levenshtein(w1, w2) < 2]

>>> matches = [(w1, w2) for w1 in list1 for w2 in list2 if levenshtein(w1, w2) < 2]
>>> matches
[(u'rt', u'rt'), (u's', u's'), (u'cuaimaqueserespeta', u'cuimaqueserespeta'), (u'forgives', u'forgive'), (u'any', u'any'), (u'mistake', u'mistake'), (u'infidelity', u'infedelity')]
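Two of the pairs the asker listed are missing from that output: cuaimatizada/cuiamatizada and wocp88/wcop88 differ only by two swapped adjacent letters, which plain Levenshtein distance counts as 2 edits, so the "< 2" test rejects them. The output also contains identical words. One option, which swaps in a different metric rather than the asker's function, is the optimal string alignment (restricted Damerau-Levenshtein) distance, where an adjacent transposition counts as a single edit. A minimal sketch, assuming identical words should be skipped; osa_distance is a hypothetical helper name:

```python
def osa_distance(a, b):
    """Optimal string alignment distance: Levenshtein edits plus
    transposition of two adjacent characters, each costing 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            # Adjacent transposition, e.g. "oc" -> "co".
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


list1 = [u'rt', u'cuaimatizada', u's', u'cuaimaqueserespeta', u'forgives',
         u'any', u'mistake', u'but', u'the', u'infidelity', u'wocp88']
list2 = [u'rt', u'cuiamatizada', u's', u'cuimaqueserespeta', u'forgive',
         u'any', u'mistake', u'except', u'infedelity', u'wcop88']

# Compare every word with every word, skipping exact matches.
pairs = [(w1, w2) for w1 in list1 for w2 in list2
         if w1 != w2 and osa_distance(w1, w2) < 2]
print(pairs)
# Python 3 prints:
# [('cuaimatizada', 'cuiamatizada'), ('cuaimaqueserespeta', 'cuimaqueserespeta'),
#  ('forgives', 'forgive'), ('infidelity', 'infedelity'), ('wocp88', 'wcop88')]
```

For these two lists this yields exactly the five pairs from the question's edit. The all-pairs comparison is quadratic in the list lengths, which is fine for short tweet-sized lists like these.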


Source: https://stackoverflow.com/questions/11437121/levenshtein-distance-with-items-in-list-in-python
