I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 and need to look for each sequence in the entire genome (toxoplasma gondii parasi
Python regex library supports fuzzy regular expression matching. One advantage over TRE is that it allows to find all matches of regular expression in the text (supports overlapping matches as well).
import regex
m=regex.findall("AA", "CAG")
>>> []
m=regex.findall("(AA){e<=1}", "CAAG") # means allow up to 1 error
m
>>> ['CA', 'AG']