Search for string allowing for one mismatch in any location of the string

后端未结

关注

 13  932

闹比i 2020-11-30 02:45

I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 and need to look for each sequence in the entire genome (toxoplasma gondii parasi

13条回答

萌比男神i (楼主)

2020-11-30 03:08

You can use regex matching library TRE, for "approximate matching". It also has bindings for Python, Perl and Haskell.

import tre

pt = tre.compile("Don(ald)?( Ervin)? Knuth", tre.EXTENDED)
data = """
In addition to fundamental contributions in several branches of
theoretical computer science, Donnald Erwin Kuth is the creator of
the TeX computer typesetting system, the related METAFONT font
definition language and rendering system, and the Computer Modern
family of typefaces.
"""

fz = tre.Fuzzyness(maxerr = 3)
print fz
m = pt.search(data, fz)

if m:
    print m.groups()
    print m[0]

which will output

tre.Fuzzyness(delcost=1,inscost=1,maxcost=2147483647,subcost=1, maxdel=2147483647,maxerr=3,maxins=2147483647,maxsub=2147483647)
((95, 113), (99, 108), (102, 108))
Donnald Erwin Kuth

http://en.wikipedia.org/wiki/TRE_%28computing%29

http://laurikari.net/tre/

0 讨论(0)

查看其它13个回答