Search for string allowing for one mismatch in any location of the string

闹比i 2020-11-30 02:45

I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 and need to look for each sequence in the entire genome (toxoplasma gondii parasi

13 Answers
  •  一生所求
    2020-11-30 03:09

    This is quite old, but perhaps this simple solution could work. Loop through the sequence taking 25-character slices. Convert each slice to a NumPy array and compare it element-wise to the 25-character search string (also as a NumPy array). Sum the resulting boolean array, which counts the matching positions; if the sum is 24, exactly one character differs, so print the position in the loop and the mismatch.

    The next few lines show it working:

    import numpy as np

    a = ['A','B','C']

    b = np.array(a)

    b

    array(['A', 'B', 'C'], dtype='<U1')

    c = ['A','D','C']

    d = np.array(c)

    b==d

    array([ True, False, True])

    sum(b==d)

    2
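The loop described above could be sketched roughly as follows. This is only an illustration of the idea, not a tuned implementation: the function name `find_with_one_mismatch`, the `max_mismatches` parameter, and the toy sequences are all made up for this example, and it assumes the genome and the search string are plain ASCII strings. It reports windows with at most one mismatch (sum of 24 or 25 for a 25-mer) rather than exactly one.

```python
import numpy as np


def find_with_one_mismatch(genome, probe, max_mismatches=1):
    """Return (start, mismatch_offsets) for each window of `genome` that
    differs from `probe` in at most `max_mismatches` positions."""
    # View both strings as arrays of byte values so they compare element-wise.
    g = np.frombuffer(genome.encode("ascii"), dtype=np.uint8)
    p = np.frombuffer(probe.encode("ascii"), dtype=np.uint8)
    k = len(p)

    hits = []
    for start in range(len(g) - k + 1):
        window = g[start:start + k]
        # Offsets inside the window where the characters differ.
        mismatches = np.flatnonzero(window != p)
        if len(mismatches) <= max_mismatches:
            hits.append((start, mismatches.tolist()))
    return hits


# Toy example with a 4-character search string instead of a 25-mer.
for start, offsets in find_with_one_mismatch("ACGTACGTTACGAACGT", "ACGA"):
    print(start, offsets)
```

Each hit is the start position of the window plus the offsets within it that differ (an empty list means an exact match). A pure-Python loop like this does one array comparison per genome position, so for 230,000 search strings against a whole genome it would likely be slow; it is meant to show the logic rather than to be run at that scale.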
