I'm having a bit of a struggle with Unicode file names in OS X and Python. I am trying to use filenames as input for a regular expression later in the code, but the encoding used
sys.getfilesystemencoding() is giving you the correct response (the encoding), but it does not tell you the Unicode normalisation form.
In particular, the HFS+ filesystem uses UTF-8 encoding, and a normalisation form close to "D" (which requires composed characters like ö to be decomposed into o followed by a combining diaeresis, o¨). HFS+ is also tied to the normalisation form as it existed in Unicode version 3.2, as detailed in Apple's documentation for the HFS+ format.
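To make the difference between the two forms concrete, here is a small sketch (written for Python 3, which is an assumption on my part) that decomposes a single ö and compares it with the composed original. The variable names are just for illustration.

```python
import unicodedata

# "ö" as one precomposed code point, U+00F6 (this is what NFC produces)
composed = "\u00f6"

# NFD splits it into "o" followed by U+0308 COMBINING DIAERESIS
decomposed = unicodedata.normalize("NFD", composed)

print(len(composed), len(decomposed))   # 1 2
print(composed == decomposed)           # False
print(composed.encode("utf-8"))         # b'\xc3\xb6'
print(decomposed.encode("utf-8"))       # b'o\xcc\x88'
```

Both strings render identically, but they compare unequal and encode to different bytes, which is why a pattern typed with a composed ö can fail to match a decomposed filename.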
Python's unicodedata.normalize function converts between forms, and if you call normalize on the unicodedata.ucd_3_2_0 object instead, you can constrain it to Unicode version 3.2. For example, to convert a decomposed filename back to the composed form "C":
import unicodedata
filename = unicodedata.ucd_3_2_0.normalize('NFC', unicode(filename, 'utf-8')).encode('utf-8')
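The line above is Python 2 (it relies on the unicode built-in). As a rough sketch of the same idea in Python 3, where filenames normally arrive as str already, something like the following should work. The helper name normalize_filename and the sample filename are made up for illustration; only unicodedata.ucd_3_2_0.normalize and re come from the standard library.

```python
import re
import unicodedata


def normalize_filename(name, form="NFC"):
    """Hypothetical helper: normalise a filename using the Unicode 3.2 tables."""
    # Accept raw bytes as well, assuming they are UTF-8 encoded
    if isinstance(name, bytes):
        name = name.decode("utf-8")
    return unicodedata.ucd_3_2_0.normalize(form, name)


# A name in decomposed form, as HFS+ might hand it back (illustrative value)
decomposed_name = "Bjo\u0308rk.txt"

# A pattern written with the composed ö (U+00F6)
pattern = re.compile("Bj\u00f6rk")

print(pattern.search(decomposed_name))                            # None
print(bool(pattern.search(normalize_filename(decomposed_name))))  # True
```

Normalising both the filename and the pattern to the same form before matching is probably the safest approach, since you then do not depend on how either string was originally typed or stored.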