I have two lists of unicode strings, one containing words picked up from a text file, another containing a list of sound file names from a directory, stripped from their ext
Some Unicode characters can be specified different ways, as you've discovered, either as a single codepoint or as a regular codepoint plus a combining codepoint. The character \u0300 is a COMBINING GRAVE ACCENT, which adds an accent mark to the preceding character.
The process of fixing a string to a common representation is called normalization. You can use the unicodedata module to do this:
def n(str):
return unicodedata.normalize('NFKC', str)
>>> n(u'ch\xe0o') == n(u'cha\u0300o')
True