Fuzzy String Comparison

情歌与酒 2020-11-29 17:01

I am trying to write a program that reads in a file and compares each sentence against the original sentence. The sentence which is a perfect match to

4 answers
  •  天命终不由人
    2020-11-29 17:08

    fuzzyset is much faster than fuzzywuzzy (difflib) for both indexing and searching.

    from fuzzyset import FuzzySet
    corpus = """It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines
        It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines
        I was all alone sitting on a crimson cathedra. I was not completely alone as I had three felines. It was a murky and tempestuous night.
        It was a dark and stormy night. I was not alone. I was not sitting on a red chair. I had three cats."""
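    # Split the corpus into one candidate per line and strip the leading indentation.
    corpus = [line.lstrip() for line in corpus.split("\n")]
    # Index every candidate sentence.
    fs = FuzzySet(corpus)
    query = "It was a dark and stormy night. I was all alone sitting on a red chair. I was not completely alone as I had three cats."
    # Look up the closest match; the result is a list of (score, matched string) tuples.
    print(fs.get(query))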
    # [(0.873015873015873, 'It was a murky and stormy night. I was all alone sitting on a crimson chair. I was not completely alone as I had three felines')]
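
For comparison, here is a minimal sketch of the same lookup using only the standard library's difflib, which the answer names as the slower alternative. The helper name rank_by_similarity is hypothetical and not part of any library, and the scores it produces will not equal fuzzyset's, because the two libraries measure similarity differently.

```python
from difflib import SequenceMatcher


def rank_by_similarity(query, candidates):
    """Return (ratio, candidate) pairs, most similar first.

    Hypothetical helper for illustration; ratio is difflib's 0.0-1.0 score.
    """
    scored = [(SequenceMatcher(None, query, c).ratio(), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


corpus = [
    "It was a murky and stormy night. I was all alone sitting on a crimson chair.",
    "It was a murky and tempestuous night. I was all alone sitting on a crimson cathedra.",
    "It was a dark and stormy night. I was not alone. I had three cats.",
]
query = "It was a dark and stormy night. I was all alone sitting on a red chair."

ranked = rank_by_similarity(query, corpus)
for score, sentence in ranked:
    print(f"{score:.3f}  {sentence}")
```

This compares the query against every candidate on each call, so it does no indexing at all; that is fine for a handful of sentences but is the part a prebuilt index such as FuzzySet is meant to avoid.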
    

    Warning: Be careful not to mix unicode and bytes in your fuzzyset.
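
One way to follow that warning is to decode everything to str before indexing. The sketch below is an assumption-laden illustration: to_text is a hypothetical helper, and it assumes any bytes entries are UTF-8 encoded, which may not hold for your file.

```python
def to_text(items, encoding="utf-8"):
    """Decode bytes entries so the returned list contains only str.

    Hypothetical helper; the UTF-8 default is an assumption about the input.
    """
    return [
        item.decode(encoding) if isinstance(item, bytes) else item
        for item in items
    ]


mixed = ["It was a dark and stormy night.", b"I had three cats."]
clean = to_text(mixed)
print(clean)
```

The resulting list can then be passed to FuzzySet in place of the mixed one, and the query string should be normalized the same way.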
