Suppose you take the strings \'a\' and \'z\' and list all the strings that come between them in alphabetical order: [\'a\',\'b\',\'c\' ... \'x\',\'y\',\'z\']. Take the midpoint
If you define an alphabet of characters, you can just convert to base 10, do an average, and convert back to base-N where N is the size of the alphabet.
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def enbase(x):
n = len(alphabet)
if x < n:
return alphabet[x]
return enbase(x/n) + alphabet[x%n]
def debase(x):
n = len(alphabet)
result = 0
for i, c in enumerate(reversed(x)):
result += alphabet.index(c) * (n**i)
return result
def average(a, b):
a = debase(a)
b = debase(b)
return enbase((a + b) / 2)
print average('a', 'z') #m
print average('aa', 'zz') #mz
print average('cat', 'doggie') #budeel
print average('google', 'microsoft') #gebmbqkil
print average('microsoft', 'google') #gebmbqkil
Edit: Based on comments and other answers, you might want to handle strings of different lengths by appending the first letter of the alphabet to the shorter word until they're the same length. This will result in the "average" falling between the two inputs in a lexicographical sort. Code changes and new outputs below.
def pad(x, n):
p = alphabet[0] * (n - len(x))
return '%s%s' % (x, p)
def average(a, b):
n = max(len(a), len(b))
a = debase(pad(a, n))
b = debase(pad(b, n))
return enbase((a + b) / 2)
print average('a', 'z') #m
print average('aa', 'zz') #mz
print average('aa', 'az') #m (equivalent to ma)
print average('cat', 'doggie') #cumqec
print average('google', 'microsoft') #jlilzyhcw
print average('microsoft', 'google') #jlilzyhcw