Search for a string, allowing one mismatch at any position in the string
I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 of them and need to search for each sequence in the entire genome of the parasite Toxoplasma gondii. I am not sure how large the genome is, but it is much longer than my list of 230,000 sequences. I need to look for each of my 25-character sequences, for example AGCCTCCCATGATTGAACAGATCAT. The genome is formatted as a continuous string, i.e.