Find the similarity metric between two strings

长情又很酷 2020-11-22 13:24

How do I get the probability of a string being similar to another string in Python?

I want to get a decimal value like 0.9 (meaning 90%) etc. Preferably with standar

11 answers
  •  执笔经年
    2020-11-22 14:07

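    Note that difflib.SequenceMatcher scores two strings by their contiguous matching blocks, and by default (autojunk=True) it treats frequently repeated characters as junk once a sequence is at least 200 items long. The result is often not what you want, for example: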

    >>> from difflib import SequenceMatcher
    >>> a1 = "Apple"
    >>> a2 = "Appel"
    >>> a1 *= 50
    >>> a2 *= 50
    >>> SequenceMatcher(None, a1, a2).ratio()
    0.012  # very low
    >>> SequenceMatcher(None, a1, a2).get_matching_blocks()
    [Match(a=0, b=0, size=3), Match(a=250, b=250, size=0)]  # only the first block is recorded
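
As a side note, part of the very low score above appears to come from SequenceMatcher's automatic junk heuristic, which is enabled by default and only applies to sequences of 200 or more items. The following is a minimal sketch, not part of the original answer, that runs the same comparison with `autojunk=False`. The variable names `default_ratio` and `no_junk_ratio` are chosen here purely for illustration.

```python
from difflib import SequenceMatcher

a1 = "Apple" * 50
a2 = "Appel" * 50

# Default behaviour: autojunk=True, so characters that repeat very often
# in a long sequence are treated as junk when matching blocks are searched.
default_ratio = SequenceMatcher(None, a1, a2).ratio()

# Same comparison with the automatic junk heuristic switched off.
no_junk_ratio = SequenceMatcher(None, a1, a2, autojunk=False).ratio()

print("autojunk=True :", default_ratio)
print("autojunk=False:", no_junk_ratio)
```

With the heuristic disabled the ratio should come out far higher than with the default settings. Even so, SequenceMatcher is still built on contiguous blocks, so an alignment-based score may be a better fit for some inputs.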
    

    Finding the similarity between two strings is closely related to pairwise sequence alignment in bioinformatics. Many dedicated libraries exist for this, including Biopython. This example uses the Needleman-Wunsch algorithm:

    >>> from Bio.Align import PairwiseAligner
    >>> aligner = PairwiseAligner()
    >>> aligner.score(a1, a2)
    200.0
    >>> aligner.algorithm
    'Needleman-Wunsch'
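
The aligner returns a raw score rather than the 0-to-1 value the question asks for. Below is a minimal pure-Python sketch of Needleman-Wunsch scoring with a normalised result, for readers who want to see the recurrence without installing anything. It assumes a score of 1 per match and 0 for mismatches and gaps, and it divides by the length of the longer string. Both the scoring values and the normalisation are assumptions made here, and the function names `nw_score` and `nw_similarity` are hypothetical helpers, not Biopython APIs.

```python
def nw_score(a, b, match=1.0, mismatch=0.0, gap=0.0):
    """Global alignment (Needleman-Wunsch) score, keeping only two rows."""
    # Row 0: aligning an empty prefix of `a` against each prefix of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # Column 0: aligning the first i characters of `a` against nothing.
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


def nw_similarity(a, b):
    """Score divided by the longer length (an assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return nw_score(a, b) / longest


print(nw_similarity("Apple", "Appel"))
```

For the single words "Apple" and "Appel" this gives 0.8, since four of the five characters can be aligned. The loop is quadratic in the string lengths, so a compiled library is the better choice for long inputs.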
    

    Biopython, or another bioinformatics package, is more flexible than anything in the Python standard library because it offers many different scoring schemes and algorithms. You can also retrieve the aligned sequences to visualise what is happening:

    >>> alignment = next(aligner.align(a1, a2))
    >>> alignment.score
    200.0
    >>> print(alignment)
    Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-Apple-
    |||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-|||-|-
    App-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-elApp-el
    
