Fuzzy String Comparison


情歌与酒 2020-11-29 17:01

I am trying to write a program that reads in a file and compares each sentence against the original sentence. The sentence which is a perfect match to

4 Answers

天命终不由人 (OP)

2020-11-29 17:08

fuzzyset is much faster than fuzzywuzzy (difflib) for both indexing and searching.

from fuzzyset import FuzzySet
corpus = """It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines
    It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines
    I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night.
    It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats."""
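# One candidate passage per line; strip the indentation left by the triple-quoted string.
corpus = [line.strip() for line in corpus.splitlines()]
fs = FuzzySet(corpus)  # index every candidate passage
query = "It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats."
# get() returns a list of (score, matched_string) tuples
fs.get(query)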
# [(0.873015873015873, 'It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines')]

Warning: Be careful not to mix unicode and bytes in your fuzzyset.
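
For comparison, here is a minimal sketch of the same kind of ranking done with difflib alone, since that is the standard-library module mentioned above. The helper name rank_by_similarity and the short sample sentences are made up for illustration, and the ratios it produces come from a different metric, so they need not match the fuzzyset scores.

```python
from difflib import SequenceMatcher


def rank_by_similarity(original, candidates):
    """Return (ratio, candidate) pairs sorted from most to least similar.

    rank_by_similarity is a hypothetical helper, not part of difflib.
    """
    scored = [
        (SequenceMatcher(None, original, candidate).ratio(), candidate)
        for candidate in candidates
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative sentences only; in practice these would be read from the file.
original = "It was a dark and stormy night."
candidates = [
    "It was a murky and stormy night.",
    "I had three cats.",
    "It was a dark and stormy night.",
]

ranked = rank_by_similarity(original, candidates)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

A ratio of 1.0 means the candidate is identical to the original. This approach compares the query against every candidate on each call, so it is likely to be slower than a prebuilt index on a large corpus.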
