Count letter differences of two strings

既然无缘 2020-12-09 17:11

This is the behaviour I want:

a: IGADKYFHARGNYDAA
c: KGADKYFHARGNYEAA
2 difference(s).
  • 2020-12-09 17:33

    When looping through one string, keep a counter variable that tracks the position of the letter you are on at each iteration. Then use this counter as an index into the other string.

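    a = 'IGADKYFHARGNYDAA'
    b = 'KGADKYFHARGNYEAA'
    
    counter = 0      # index of the current letter in a
    differences = 0
    for i in a:
        if i != b[counter]:  # compare with the letter at the same position in b
            differences += 1
        counter += 1
    
    print("%d difference(s)." % differences)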
    

    Here, each time we come across a letter in sequence a that differs from the letter at the same position in sequence b, we add 1 to 'differences'. We then add 1 to the counter before moving on to the next letter.
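The same position-by-position comparison can be written without a manual counter by pairing the letters with zip and summing the mismatches; for equal-length strings this count is known as the Hamming distance. A minimal sketch, assuming both strings have the same length; `count_mismatches` is a hypothetical helper name:

```python
def count_mismatches(a, b):
    """Count the positions where a and b hold different letters.

    Assumes len(a) == len(b): zip() stops at the end of the shorter
    string, so extra letters in a longer string are not counted.
    """
    return sum(x != y for x, y in zip(a, b))


a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'
print("%d difference(s)." % count_mismatches(a, b))
```

The sum works because each `x != y` is a bool, and `True` counts as 1.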

  • 2020-12-09 17:35

    Here is my solution. It compares 2 strings, and it doesn't matter which one you put in a or b.

    #Declare Variables
    a='Here is my first string'
    b='Here is my second string'
    notTheSame=0
    count=0
    
    #Check which string is longer; put the longer string in c and the shorter one in d
    if len(a) >= len(b):
        c=a
        d=b
    else:
        c=b
        d=a
    
    #Compare letter by letter until the end of the shorter string is reached
    while count < len(c):
        if count == len(d):
            break
        if c[count] != d[count]:
            print(c[count] + " not equal to " + d[count])
            notTheSame = notTheSame + 1
        else:
            print(c[count] + " is equal to " + d[count])
        count=count+1
    
    #Total = mismatched letters + the difference in length between the 2 strings
    print("Number of Differences: " + str(len(c)-len(d)+notTheSame))
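The same total (mismatched letters plus the difference in length) can also be computed by letting itertools.zip_longest pad the shorter string. A minimal sketch, assuming you only need the count and not the per-letter printout; `count_differences` is a hypothetical name:

```python
from itertools import zip_longest


def count_differences(a, b):
    """Mismatched letters plus the length difference between a and b."""
    # zip_longest pads the shorter string with None; None never equals a
    # character, so each extra letter of the longer string counts once.
    return sum(x != y for x, y in zip_longest(a, b))


a = 'Here is my first string'
b = 'Here is my second string'
print("Number of Differences: " + str(count_differences(a, b)))
```

Because zip_longest handles the length difference itself, there is no need to work out which string is longer first.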
    
  • 2020-12-09 17:37

    I like the answer from Niklas R, but it has an issue (depending on your expectations). Try it with the following two test cases:

    print(compare('berry','peach'))
    print(compare('berry','cherry'))
    

    We might reasonably expect berry to be more similar to cherry than to peach. Yet we get a lower diff between berry and peach than between berry and cherry:

    (' |   ', 4)  # berry, peach
    ('   |  ', 5) # berry, cherry
    

    This happens when the strings line up better from the end than from the start. To extend the answer from Niklas R, we can add a helper function that returns the minimum of the normal (forward) diff and the diff of the reversed strings:

    def fuzzy_compare(string1, string2):
        (fwd_result, fwd_diff) = compare(string1, string2)
        (rev_result, rev_diff) = compare(string1[::-1], string2[::-1])
        diff = min(fwd_diff, rev_diff)
        return diff
    

    Use the following test cases again:

    print(fuzzy_compare('berry','peach'))
    print(fuzzy_compare('berry','cherry'))
    

    ...and we get

    4 # berry, peach
    2 # berry, cherry
    

    As I said, this really just extends, rather than modifies, the answer from Niklas R.

    If you're just looking for a simple diff function (taking into consideration the aforementioned gotcha), the following will do:

    def diff(a, b):
        delta = do_diff(a, b)
        delta_rev = do_diff(a[::-1], b[::-1])
        return min(delta, delta_rev)
    
    def do_diff(a,b):
        delta = 0
        i = 0
        while i < len(a) and i < len(b):
            delta += a[i] != b[i]
            i += 1
        delta += len(a[i:]) + len(b[i:])
        return delta
    

    Test cases:

    print(diff('berry','peach'))
    print(diff('berry','cherry'))
    

    One final consideration is how the diff function itself should handle words of different lengths. There are two options:

    1. Count the difference in length as differing characters.
    2. Ignore the difference in length and compare only up to the length of the shorter word.

    For example:

    • apple and apples have a difference of 1 when considering all characters.
    • apple and apples have a difference of 0 when considering only the shorter word.

    When considering only the shorter word, we can use:

    def do_diff_shortest(a,b):
        delta, i = 0, 0
        if len(a) > len(b):
            a, b = b, a
        for i in range(len(a)):
            delta += a[i] != b[i]
        return delta
    

    ...the number of iterations is dictated by the shorter word, and everything else is ignored. Alternatively, we can take the different lengths into account:

    def do_diff_both(a, b):
        delta, i = 0, 0
        while i < len(a) and i < len(b):
            delta += a[i] != b[i]
            i += 1
        delta += len(a[i:]) + len(b[i:])
        return delta
    

    In this example, any remaining characters are counted and added to the diff value. To test both functions:

    print(do_diff_shortest('apple','apples'))
    print(do_diff_both('apple','apples'))
    

    Will output:

    0 # Ignore extra characters belonging to the longer word.
    1 # Consider extra characters.
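The forward/reverse trick is a heuristic: it only lines the words up at the start or at the end, so an extra letter in the middle of a word still shifts everything after it. A standard alternative, swapped in here rather than taken from the answer, is the Levenshtein edit distance: the minimum number of single-letter insertions, deletions and substitutions that turn one string into the other. A minimal dynamic-programming sketch; the name `levenshtein` is my own:

```python
def levenshtein(a, b):
    """Minimum number of single-letter insertions, deletions and
    substitutions needed to turn a into b."""
    # prev[j] is the distance between the first i-1 letters of a and the
    # first j letters of b (the previous row of the DP table).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into an empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein('berry', 'peach'))
print(levenshtein('berry', 'cherry'))
```

For the two test cases above it gives the same 4 and 2 as fuzzy_compare, but it also handles a letter inserted mid-word: levenshtein('berry', 'bearry') is 1, whereas comparing position by position from either end reports several differences. The cost is O(len(a) * len(b)) time.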
    
  • 2020-12-09 17:38

    Here is my solution to a similar problem comparing two strings based on the solution presented here: https://stackoverflow.com/a/12226960/3542145 .

    Since itertools.izip no longer exists in Python 3, I used the built-in zip function instead, as in this answer: https://stackoverflow.com/a/32303142/3542145 .
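In Python 3 the built-in zip returns a lazy iterator, much like Python 2's itertools.izip, and it stops at the end of the shorter input. That truncation is why the compare function below adds the length difference (delta) on its own; itertools.zip_longest (izip_longest in Python 2) pads instead. A small sketch of the two behaviours:

```python
from itertools import zip_longest

a = 'apple'
b = 'apples'

# zip stops at the shorter string: the trailing 's' is never paired
print(list(zip(a, b)))
# zip_longest pads the shorter string with fillvalue (None unless given)
print(list(zip_longest(a, b, fillvalue='-')))
```

Because zip returns a one-shot iterator in Python 3, wrap it in list() if you need to loop over the pairs more than once.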

    The function to compare the two strings:

    def compare(string1, string2, no_match_c=' ', match_c='|'):
        if len(string2) < len(string1):
            string1, string2 = string2, string1
        result = ''
        n_diff = 0
        for c1, c2 in zip(string1, string2):
            if c1 == c2:
                result += match_c
            else:
                result += no_match_c
                n_diff += 1
        delta = len(string2) - len(string1)
        result += delta * no_match_c
        n_diff += delta
        return (result, n_diff)
    

    Set up the two strings for comparison and call the function:

    def main():
        string1 = 'AAUAAA'
        string2 = 'AAUCAA'
        result, n_diff = compare(string1, string2, no_match_c='_')
        print("%d difference(s)." % n_diff)
        print(string1)
        print(result)
        print(string2)
    
    main()
    

    Which prints:

    1 difference(s).
    AAUAAA
    |||_||
    AAUCAA
    
  • 2020-12-09 17:44

    I think this example will work for your specific case without too much hassle and without running into compatibility issues with your Python version (upgrade to 2.7, please):

    a='IGADKYFHARGNYDAA'
    b='KGADKYFHARGNYEAA'
    
    x=[]
    # Walk the two strings position by position
    for i,j in zip(a,b):
        if i==j:
            x.append('*')  # same letter at this position
        else:
            x.append(j)    # differing letter, taken from b
            
    print(x)
    

    Outputs: ['K', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', '*', 'E', '*', '*']


    With a few tweaks, you can get what you want. Tell me if it helps :-)


    Update

    You can also use this:

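    from __future__ import print_function  # print() as a function, so this also runs on Python 2.7
    
    a='IGADKYFHARGNYDAA'
    b='KGADKYFHARGNYEAA'
    
    u=list(zip(a,b))  # a list, so the pairs can be looped over more than once
    for i,j in u:
        if i==j:
            print(i,'--',j)
        else: 
            print(i,'  ',j)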
    

    Outputs:

    I    K
    G -- G
    A -- A
    D -- D
    K -- K
    Y -- Y
    F -- F
    H -- H
    A -- A
    R -- R
    G -- G
    N -- N
    Y -- Y
    D    E
    A -- A
    A -- A
    

    Update 2

    You may modify the code like this:

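    from __future__ import print_function  # print() as a function, so this also runs on Python 2.7
    
    a='IGADKYFHARGNYDAA'
    b='KGADKYFHARGNYEAA'
    
    y=[]  # letters from b at every position where the strings differ
    for i,j in zip(a,b):
        if i==j:
            print(i,'--',j)
        else: 
            y.append(j)
            print(i,'  ',j)
            
    print()
    print(y)
    print()
    print('Length =',len(y))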
    

    Outputs:

    I    K
    G -- G
    A -- A
    D -- D
    K -- K
    Y -- Y
    F -- F
    H -- H
    A -- A
    R -- R
    G -- G
    N -- N
    Y -- Y
    D    E
    A -- A
    A -- A
    
    ['K', 'E']
    
    Length = 2
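The lists above show which letters differ but not where. If the positions matter too, a minimal sketch (assuming equal-length strings, since zip stops at the shorter one) can record each mismatch with enumerate; `mismatches` is an illustrative name, not part of the answer above:

```python
a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'

# Collect (position, letter in a, letter in b) for every mismatch
mismatches = [(pos, i, j) for pos, (i, j) in enumerate(zip(a, b)) if i != j]

print(mismatches)
print("%d difference(s)." % len(mismatches))
```

Positions are 0-based, so the second mismatch is the 14th letter.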
    