Count letter differences of two strings

既然无缘 2020-12-09 17:11

This is the behaviour I want:

a: IGADKYFHARGNYDAA
c: KGADKYFHARGNYEAA
2 difference(s).
11 answers
  • 2020-12-09 17:18

    Python has the excellent difflib module, which should provide the needed functionality.

    Here's sample usage from the documentation:

    >>> import difflib
    >>> s = difflib.SequenceMatcher(lambda x: x == " ",
    ...                             "private Thread currentThread;",
    ...                             "private volatile Thread currentThread;")
    >>> for block in s.get_matching_blocks():
    ...     print("a[%d] and b[%d] match for %d elements" % block)
    a[0] and b[0] match for 8 elements
    a[8] and b[17] match for 21 elements
    a[29] and b[38] match for 0 elements    
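
    The documentation sample above only lists matching blocks. As a rough sketch of how the same class could count differing letters for the strings in the question, one option is to walk `get_opcodes()` and add up the non-equal runs. The helper name `count_differences` and the choice to count each run by its longer side are assumptions made for illustration, not part of the original answer.

```python
import difflib

def count_differences(a, b):
    """Count letters covered by non-matching opcodes between a and b."""
    matcher = difflib.SequenceMatcher(None, a, b)
    total = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, inserted or deleted run is counted by its longer side.
            total += max(i2 - i1, j2 - j1)
    return total

a = "IGADKYFHARGNYDAA"
c = "KGADKYFHARGNYEAA"
print("%d difference(s)." % count_differences(a, c))
```

    Because SequenceMatcher aligns the strings before comparing them, this can report fewer differences than a strict position-by-position comparison when letters are inserted or deleted.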
    
  • 2020-12-09 17:20
    def diff_letters(a, b):
        # Assumes len(b) >= len(a); letters of b beyond len(a) are ignored.
        return sum(a[i] != b[i] for i in range(len(a)))
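
    The function above raises an IndexError when b is shorter than a. A position-by-position count like this is commonly called the Hamming distance, which is defined only for strings of equal length, so one possible variant checks the lengths first. The name `hamming_distance` and the decision to raise `ValueError` are assumptions for this sketch.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    # Each comparison is a bool, and bools sum as 0 or 1.
    return sum(x != y for x, y in zip(a, b))

print(hamming_distance("IGADKYFHARGNYDAA", "KGADKYFHARGNYEAA"))
```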
    
  • 2020-12-09 17:26

    I haven't seen anyone use the reduce function, so I'll include a piece of code I've been using:

    from functools import reduce  # needed on Python 3, where reduce is not a builtin

    reduce(lambda x, y: x + 1 if y[0] != y[1] else x, zip(source, target), 0)
    

    which gives the number of differing characters in source and target (zip stops at the end of the shorter string, so any extra trailing characters are not counted).
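
    For a complete run on the strings from the question, a minimal sketch could look like the following. The variable names `source`, `target` and `n_diff` are only illustrative, and the lambda is rewritten slightly to add the boolean comparison result directly.

```python
from functools import reduce  # reduce is not a builtin in Python 3

source = "IGADKYFHARGNYDAA"
target = "KGADKYFHARGNYEAA"

# Fold over the character pairs, adding 1 whenever a pair differs.
n_diff = reduce(lambda count, pair: count + (pair[0] != pair[1]),
                zip(source, target), 0)
print("%d difference(s)." % n_diff)
```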

  • 2020-12-09 17:27

    With difflib.ndiff you can do this in a one-liner that's still somewhat comprehensible:

    >>> import difflib
    >>> a = 'IGADKYFHARGNYDAA'
    >>> c = 'KGADKYFHARGNYEAA'
    >>> sum([i[0] != ' '  for i in difflib.ndiff(a, c)]) // 2
    2
    

    (sum works here because bool is a subclass of int, so True counts as 1 and False as 0.)

    The following makes it clear what's happening and why the division by 2 is needed:

    >>> [i for i in difflib.ndiff(a,c)]
    ['- I',
     '+ K',
     '  G',
     '  A',
     '  D',
     '  K',
     '  Y',
     '  F',
     '  H',
     '  A',
     '  R',
     '  G',
     '  N',
     '  Y',
     '- D',
     '+ E',
     '  A',
     '  A']
    

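    This also runs on strings of different lengths, but an unpaired insertion or deletion then produces only one marker line instead of two, so the halved count no longer corresponds to whole substitutions.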
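
    To see how the markers behave when the lengths differ, one option is to count the `- ` and `+ ` lines separately instead of halving their total. This is only a sketch, and the helper name `count_markers` is made up for illustration.

```python
import difflib

def count_markers(a, b):
    """Return (deleted, inserted) letter counts reported by difflib.ndiff."""
    deleted = inserted = 0
    for line in difflib.ndiff(a, b):
        if line.startswith("- "):
            deleted += 1    # letter present only in a
        elif line.startswith("+ "):
            inserted += 1   # letter present only in b
    return deleted, inserted

print(count_markers("IGADKYFHARGNYDAA", "KGADKYFHARGNYEAA"))
print(count_markers("abc", "abcd"))
```

    For two strings of equal length that differ only by substitutions, both numbers are the same and equal the halved total from the one-liner.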

  • 2020-12-09 17:30

    The Theory

    1. Iterate over both strings simultaneously and compare the characters.
    2. Store the result in a new string by appending either a | character (match) or a space (no match) for each position. Also, increment an integer counter, starting from zero, for each differing character.
    3. Output the result.

    Implementation

    You can iterate over both strings simultaneously with the built-in zip function or, in Python 2, with itertools.izip, which is lazy and therefore a little more efficient for huge inputs (in Python 3, zip is already lazy and itertools.izip no longer exists). If the strings differ in length, iteration covers only the length of the shorter one; you can then fill up the rest of the result with the no-match character.

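    def compare(string1, string2, no_match_c=' ', match_c='|'):
        # Make string1 the shorter one so the tail of string2 is the overhang.
        if len(string2) < len(string1):
            string1, string2 = string2, string1
        result = ''
        n_diff = 0
        # zip is lazy in Python 3 (itertools.izip existed only in Python 2).
        for c1, c2 in zip(string1, string2):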
            if c1 == c2:
                result += match_c
            else:
                result += no_match_c
                n_diff += 1
        delta = len(string2) - len(string1)
        result += delta * no_match_c
        n_diff += delta
        return (result, n_diff)
    

    Example

    Here's a simple test, with slightly different inputs from your example above. Note that I have used an underscore to indicate non-matching characters, to better demonstrate how the resulting string is expanded to the length of the longer string.

    def main():
        string1 = 'IGADKYFHARGNYDAA AWOOH'
        string2 = 'KGADKYFHARGNYEAA  W'
        result, n_diff = compare(string1, string2, no_match_c='_')
    
        print("%d difference(s)." % n_diff)
        print(string1)
        print(result)
        print(string2)
    
    main()
    

    Output:

    niklas@saphire:~/Desktop$ python foo.py 
    6 difference(s).
    IGADKYFHARGNYDAA AWOOH
    _||||||||||||_|||_|___
    KGADKYFHARGNYEAA  W
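
    As an alternative sketch of the same idea, `itertools.zip_longest` can pad the shorter string so that no swapping or tail handling is needed. The function name `compare_padded` is hypothetical, and the sketch assumes that `match_c` and `no_match_c` are different characters.

```python
from itertools import zip_longest

def compare_padded(s1, s2, no_match_c="_", match_c="|"):
    """Return (marker_string, n_diff), treating missing letters as mismatches."""
    # fillvalue=None never equals a real character, so the overhang mismatches.
    markers = [
        match_c if c1 == c2 else no_match_c
        for c1, c2 in zip_longest(s1, s2, fillvalue=None)
    ]
    return "".join(markers), markers.count(no_match_c)

result, n_diff = compare_padded("IGADKYFHARGNYDAA AWOOH", "KGADKYFHARGNYEAA  W")
print("%d difference(s)." % n_diff)
print(result)
```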
    
  • 2020-12-09 17:32
    a = "IGADKYFHARGNYDAA"
    b = "KGADKYFHARGNYEAAXXX"
    match_pattern = zip(a, b)                                   # pairs of letters at each index
    difference = sum(1 for e in match_pattern if e[0] != e[1])  # count pairs whose letters differ
    difference = difference + abs(len(a) - len(b))              # if the strings differ in length, add the length difference
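
    Wrapped in a function and printed in the format the question asks for, the same approach might look like this. The function name `count_differences` is an assumption made for this sketch.

```python
def count_differences(a, b):
    """Mismatched positions in the overlap plus the difference in length."""
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches + abs(len(a) - len(b))

a = "IGADKYFHARGNYDAA"
b = "KGADKYFHARGNYEAAXXX"
print("a: %s" % a)
print("b: %s" % b)
print("%d difference(s)." % count_differences(a, b))
```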
    