How do I decide which way to backtrack in the Smith–Waterman algorithm?

Submitted by 倾然丶 夕夏残阳落幕 on 2019-12-05 13:39:41

When you build the similarity matrix, you need to store not only the similarity score, but where that score came from. You currently have a line of code:

p[i][j]=max(0,vertical_score,horizontal_score,diagonal_score);

So here you need to remember not just the maximum score, but also which of these candidates was the maximum. Then, when you come to do the backtracking, you will know which direction to go.
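To make the idea concrete for a single cell, here is a minimal Python sketch. The function name `best_move` and the direction labels are hypothetical, chosen only for illustration; the three scores are assumed to be already computed, as in the line quoted above.

```python
# Hypothetical direction labels; the names are illustrative only.
STOP, UP, LEFT, DIAG = "stop", "up", "left", "diag"

def best_move(vertical_score, horizontal_score, diagonal_score):
    """Return (score, direction) for one cell of the matrix."""
    candidates = [
        (0, STOP),
        (vertical_score, UP),
        (horizontal_score, LEFT),
        (diagonal_score, DIAG),
    ]
    # key= compares scores only, so ties go to the earliest candidate.
    return max(candidates, key=lambda c: c[0])

print(best_move(1, 3, 2))     # (3, 'left')
print(best_move(-1, -2, -1))  # (0, 'stop')
```

Storing the second element of the returned pair in a matrix of the same shape as the score matrix is enough to drive the backtracking later.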

For example, you might try something like this:

import numpy

DELETION, INSERTION, MATCH = range(3)

def smith_waterman(seq1, seq2, insertion_penalty=-1, deletion_penalty=-1,
                   mismatch_penalty=-1, match_score=2):
    """
    Find the optimum local sequence alignment for the sequences `seq1`
    and `seq2` using the Smith-Waterman algorithm. Optional keyword
    arguments give the gap-scoring scheme:

    `insertion_penalty` penalty for an insertion (default: -1)
    `deletion_penalty`  penalty for a deletion (default: -1)
    `mismatch_penalty`  penalty for a mismatch (default: -1)
    `match_score`       score for a match (default: 2)

    See <http://en.wikipedia.org/wiki/Smith-Waterman_algorithm>.

    >>> for s in smith_waterman('AGCAGACT', 'ACACACTA'): print(s)
    ... 
    AGCAGACT
    A-CACACT
    """
    m, n = len(seq1), len(seq2)

    # Construct the similarity matrix in p[i][j], and remember how
    # we constructed it -- insertion, deletion or (mis)match -- in
    # q[i][j].
    p = numpy.zeros((m + 1, n + 1))
    q = numpy.zeros((m + 1, n + 1), dtype=int)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            deletion = (p[i - 1][j] + deletion_penalty, DELETION)
            insertion = (p[i][j - 1] + insertion_penalty, INSERTION)
            if seq1[i - 1] == seq2[j - 1]:
                match = (p[i - 1][j - 1] + match_score, MATCH)
            else:
                match = (p[i - 1][j - 1] + mismatch_penalty, MATCH)
            # Tuples compare by score first, so q[i][j] records which
            # candidate produced the maximum stored in p[i][j].
            p[i][j], q[i][j] = max((0, 0), deletion, insertion, match)

    # Yield the aligned sequences one character at a time in reverse
    # order. A local alignment ends at the highest-scoring cell, and
    # the backtracking stops at the first cell whose score is zero.
    def backtrack():
        i, j = numpy.unravel_index(numpy.argmax(p), p.shape)
        while p[i][j] > 0:
            if q[i][j] == MATCH:
                i -= 1
                j -= 1
                yield seq1[i], seq2[j]
            elif q[i][j] == INSERTION:
                j -= 1
                yield '-', seq2[j]
            elif q[i][j] == DELETION:
                i -= 1
                yield seq1[i], '-'
            else:
                assert False, "unknown direction code"

    return [''.join(reversed(s)) for s in zip(*backtrack())]
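
Because Smith-Waterman computes a local alignment, the backtracking conventionally starts at the highest-scoring cell and ends at the first cell with score zero. The following is a dependency-free sketch of that convention, not a definitive implementation. The name `local_align`, the direction codes `STOP`, `UP`, `LEFT`, `DIAG`, and the single `gap` penalty are assumptions made for illustration. An explicit `STOP` code keeps "no predecessor" from being confused with a deletion.

```python
STOP, UP, LEFT, DIAG = range(4)  # hypothetical direction codes

def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for a best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    move = [[STOP] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            candidates = [
                (0, STOP),
                (score[i - 1][j] + gap, UP),
                (score[i][j - 1] + gap, LEFT),
                (score[i - 1][j - 1] + sub, DIAG),
            ]
            # Compare on score only; on ties the earliest candidate wins,
            # so every zero-score cell is marked STOP.
            score[i][j], move[i][j] = max(candidates, key=lambda c: c[0])
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Walk back from the best cell, following the stored directions.
    i, j = best_pos
    out_a, out_b = [], []
    while move[i][j] != STOP:
        if move[i][j] == DIAG:
            i, j = i - 1, j - 1
            out_a.append(a[i])
            out_b.append(b[j])
        elif move[i][j] == UP:
            i -= 1
            out_a.append(a[i])
            out_b.append('-')
        else:  # LEFT
            j -= 1
            out_a.append('-')
            out_b.append(b[j])
    return best, ''.join(reversed(out_a)), ''.join(reversed(out_b))

print(local_align('AGCAGACT', 'ACACACTA'))  # (10, 'AGCAGACT', 'A-CACACT')
```

When several candidates tie, this sketch prefers `UP`, then `LEFT`, then `DIAG`; a different preference order can yield a different alignment with the same score.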