This is the behaviour I want:
a: IGADKYFHARGNYDAA
c: KGADKYFHARGNYEAA
2 difference(s).
When looping through one string, make a counter object that identifies the letter you are on at each iteration. Then use this counter as an index to refer to the other sequence.
a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'
counter = 0
differences = 0
for i in a:
    if i != b[counter]:
        differences += 1
    counter += 1
Here, each time we come across a letter in sequence a that differs from the letter at the same position in sequence b, we add 1 to 'differences'. We then add 1 to the counter before we move onto the next letter.
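The same count can also be written without a manual counter. The following is only a sketch in Python 3, assuming both sequences have the same length; the function name count_differences is made up for illustration and does not come from the question.

```python
def count_differences(a, b):
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # zip pairs up the letters at the same position, so no counter is needed
    return sum(1 for x, y in zip(a, b) if x != y)


a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'
print("%d difference(s)." % count_differences(a, b))
```

The length check is there because zip silently stops at the shorter string, which would hide any trailing letters.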
Here is my solution. It compares two strings, and it doesn't matter which one you put in a or b.
#Declare Variables
a='Here is my first string'
b='Here is my second string'
notTheSame=0
count=0
#Check which string is bigger and put the bigger string in C and smaller string in D
if len(a) >= len(b):
    c=a
    d=b
if len(b) > len(a):
    d=a
    c=b
#While the counter is less than the length of the longest string, compare each letter.
while count < len(c):
    if count == len(d):
        break
    if c[count] != d[count]:
        print(c[count] + " not equal to " + d[count])
        notTheSame = notTheSame + 1
    else:
        print(c[count] + " is equal to " + d[count])
    count=count+1
#the below output is a count of all the differences + the difference between the 2 strings
print("Number of Differences: " + str(len(c)-len(d)+notTheSame))
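For comparison, the same idea (compare position by position, then count the leftover characters of the longer string) might be sketched with itertools.zip_longest from the Python 3 standard library. The helper name report_differences is hypothetical, and the sketch assumes that a missing character should simply count as a difference.

```python
from itertools import zip_longest


def report_differences(a, b):
    """Return (position, char_a, char_b) for every mismatching position.

    Past the end of the shorter string the missing character is None,
    so each extra character of the longer string counts as a difference.
    """
    return [
        (pos, x, y)
        for pos, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]


mismatches = report_differences('Here is my first string',
                                'Here is my second string')
print("Number of Differences: %d" % len(mismatches))
```

Because zip_longest pads the shorter input, there is no need to work out which string is longer first.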
I like the answer from Niklas R, but it has an issue (depending on your expectations). Using the answer with the following two test cases:
print compare('berry','peach')
print compare('berry','cherry')
We may reasonably expect cherry to be more similar to berry than peach is. Yet we get a lower diff between berry and peach than between berry and cherry:
(' | ', 4) # berry, peach
(' | ', 5) # berry, cherry
This occurs when strings are more similar backwards than forwards. To extend the answer from Niklas R, we can add a helper function which returns the minimum of the normal (forwards) diff and a diff of the reversed strings:
def fuzzy_compare(string1, string2):
    (fwd_result, fwd_diff) = compare(string1, string2)
    (rev_result, rev_diff) = compare(string1[::-1], string2[::-1])
    diff = min(fwd_diff, rev_diff)
    return diff
Use the following test cases again:
print fuzzy_compare('berry','peach')
print fuzzy_compare('berry','cherry')
...and we get
4 # berry, peach
2 # berry, cherry
As I said, this really just extends, rather than modifies the answer from Niklas R.
If you're just looking for a simple diff function (taking into consideration the aforementioned gotcha), the following will do:
def diff(a, b):
    delta = do_diff(a, b)
    delta_rev = do_diff(a[::-1], b[::-1])
    return min(delta, delta_rev)

def do_diff(a,b):
    delta = 0
    i = 0
    while i < len(a) and i < len(b):
        delta += a[i] != b[i]
        i += 1
    delta += len(a[i:]) + len(b[i:])
    return delta
Test cases:
print diff('berry','peach')
print diff('berry','cherry')
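Note that reversing only helps when the extra letters sit at one end of a word. As a different technique from the positional diff above (not part of this answer's approach), the standard library's difflib.SequenceMatcher aligns matching runs wherever they occur. A rough sketch, where the wrapper name similarity is just an illustrative choice:

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a ratio between 0 and 1; higher means more similar."""
    # The first argument (None) means no characters are treated as junk
    return SequenceMatcher(None, a, b).ratio()


print(similarity('berry', 'peach'))
print(similarity('berry', 'cherry'))
```

Here a higher value means more similar, so the scale is the opposite of the diff counts used elsewhere in this answer.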
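One final consideration is how the diff function itself handles words of different lengths. There are two options: compare only up to the length of the shortest word and ignore the rest, or also count the extra characters of the longer word.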
When considering only the shortest word we can use:
def do_diff_shortest(a,b):
    delta, i = 0, 0
    if len(a) > len(b):
        a, b = b, a
    for i in range(len(a)):
        delta += a[i] != b[i]
    return delta
...the number of iterations is dictated by the shortest word, everything else is ignored. Or we can take into consideration different lengths:
def do_diff_both(a, b):
    delta, i = 0, 0
    while i < len(a) and i < len(b):
        delta += a[i] != b[i]
        i += 1
    delta += len(a[i:]) + len(b[i:])
    return delta
In this example, any remaining characters are counted and added to the diff value. To test both functions:
print do_diff_shortest('apple','apples')
print do_diff_both('apple','apples')
Will output:
0 # Ignore extra characters belonging to longest word.
1 # Consider extra characters.
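Both behaviours could also be folded into one function with a switch. This is only a sketch in Python 3; the name positional_diff and the count_extra parameter are invented here and are not part of the functions above.

```python
def positional_diff(a, b, count_extra=True):
    """Count mismatching positions, optionally adding the length difference."""
    # zip stops at the end of the shorter word, so only the overlap is compared
    delta = sum(x != y for x, y in zip(a, b))
    if count_extra:
        # every character beyond the overlap counts as one difference
        delta += abs(len(a) - len(b))
    return delta


print(positional_diff('apple', 'apples', count_extra=False))  # 0
print(positional_diff('apple', 'apples'))                     # 1
```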
Here is my solution to a similar problem comparing two strings based on the solution presented here: https://stackoverflow.com/a/12226960/3542145 .
Since itertools.izip did not work for me in Python3, I found the solution which simply uses the zip function instead: https://stackoverflow.com/a/32303142/3542145 .
The function to compare the two strings:
def compare(string1, string2, no_match_c=' ', match_c='|'):
    if len(string2) < len(string1):
        string1, string2 = string2, string1
    result = ''
    n_diff = 0
    for c1, c2 in zip(string1, string2):
        if c1 == c2:
            result += match_c
        else:
            result += no_match_c
            n_diff += 1
    delta = len(string2) - len(string1)
    result += delta * no_match_c
    n_diff += delta
    return (result, n_diff)
Set up the two strings for comparison and call the function:
def main():
    string1 = 'AAUAAA'
    string2 = 'AAUCAA'
    result, n_diff = compare(string1, string2, no_match_c='_')
    print("%d difference(s)." % n_diff)
    print(string1)
    print(result)
    print(string2)
main()
Which returns:
1 difference(s).
AAUAAA
|||_||
AAUCAA
I think this example will work for your specific case without too much hassle and without hitting interoperability issues with your python software version (upgrade to 2.7 please):
a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'
u=zip(a,b)
d=dict(u)
x=[]
for i,j in d.items():
    if i==j:
        x.append('*')
    else:
        x.append(j)
print x
Outputs: ['*', 'E', '*', '*', 'K', '*', '*', '*', '*', '*']
With a few tweaks, you can get what you want. Tell me if it helps :-)
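One such tweak, sketched here in Python 3 syntax: dict(zip(a, b)) keeps only one entry per distinct letter of a, which is why the list above has 10 items rather than 16. Iterating over the zipped pairs directly keeps every position. The variable name marked is just an illustrative choice.

```python
a = 'IGADKYFHARGNYDAA'
b = 'KGADKYFHARGNYEAA'

# '*' where the letters agree, otherwise the letter from b;
# one entry per position, so repeated letters are not collapsed
marked = ['*' if i == j else j for i, j in zip(a, b)]

print(''.join(marked))
print(len(marked) - marked.count('*'), 'difference(s).')
```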
Update
You can also use this:
a='IGADKYFHARGNYDAA'
b='KGADKYFHARGNYEAA'
u=zip(a,b)
for i,j in u:
    if i==j:
        print i,'--',j
    else:
        print i,' ',j
Outputs:
I K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D E
A -- A
A -- A
Update 2
You may modify the code like this:
y=[]
counter=0
for i,j in u:
    if i==j:
        print i,'--',j
    else:
        y.append(j)
        print i,' ',j
print '\n', y
print '\n Length = ',len(y)
Outputs:
I K
G -- G
A -- A
D -- D
K -- K
Y -- Y
F -- F
H -- H
A -- A
R -- R
G -- G
N -- N
Y -- Y
D E
A -- A
A X
['K', 'E', 'X']
Length = 3