Search for a string allowing one mismatch at any position in the string

闹比i 2020-11-30 02:45

I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 of them and need to look for each sequence in the entire genome (Toxoplasma gondii parasi

13 answers
  •  轻奢々 (original poster)
    2020-11-30 03:22

    The third-party Python regex module supports fuzzy (approximate) regular-expression matching. One advantage over TRE is that it can find all matches of a regular expression in the text, including overlapping ones.

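    >>> import regex  # third-party module: pip install regex
    >>> regex.findall("AA", "CAG")  # exact matching only: no hit
    []
    >>> regex.findall("(AA){e<=1}", "CAAG")  # allow up to 1 error (substitution, insertion or deletion)
    ['CA', 'AG']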
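
The question asks for one mismatch, i.e. one substituted base. In the regex module, {e<=1} also counts a single insertion or deletion as an error, while the {s<=1} constraint limits errors to substitutions, and passing overlapped=True to regex.findall reports overlapping hits. To check such results on small inputs, here is a minimal plain-Python sketch (the function name one_mismatch_hits is hypothetical, not part of any library). It slides a window of the query's length along the text and keeps every window that differs from the query by at most one substitution, ignoring insertions and deletions by design. Its cost grows with (number of queries) x (genome length) x (query length), so treat it as a reference for small tests, not as something to run for 230,000 queries against a whole genome.

```python
from typing import List, Tuple


def one_mismatch_hits(query: str, text: str, max_mismatches: int = 1) -> List[Tuple[int, str]]:
    """Hypothetical helper: brute-force Hamming-distance scan.

    Returns (start, window) for every window of len(query) in text that
    differs from query by at most max_mismatches substitutions.
    Windows may overlap; insertions and deletions are not considered.
    """
    k = len(query)
    hits = []
    # range() is empty when the query is longer than the text
    for start in range(len(text) - k + 1):
        window = text[start:start + k]
        mismatches = 0
        for a, b in zip(query, window):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer qualify
        if mismatches <= max_mismatches:
            hits.append((start, window))
    return hits
```

For example, one_mismatch_hits("AA", "CAAG") returns [(0, 'CA'), (1, 'AA'), (2, 'AG')]. It includes the exact "AA" at position 1, which the non-overlapping findall call above skips because its first match "CA" already consumed that position.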
    
