I\'m trying to find some sort of a good, fuzzy string matching algorithm. Direct matching doesn\'t work for me — this isn\'t too good because unless my strings are a 100% si
You can try FuzzySearchEngine from https://github.com/frazenshtein/fastcd/blob/master/search.py.
This fuzzy search supports only search for words and has a fixed admissible error for the word (only one substitution or transposition of two adjacent characters).
However, for example you can try something like:
import search
string = "Chapter I. The quick brown fox jumped over the lazy dog."
substr = "the qiuck broqn fox."
def fuzzy_search_for_sentences(substr, string):
start = None
pos = 0
for word in substr.split(" "):
if not word:
continue
match = search.FuzzySearchEngine(word).search(string, pos=pos)
if not match:
return None
if start is None:
start = match.start()
pos = match.end()
return start
print(fuzzy_search_for_sentences(substr, string))
11 will be printed