Unicode encoding for filesystem in Mac OS X not correct in Python?

暖寄归人 2020-12-06 01:01

I'm having a bit of a struggle with Unicode file names on OS X in Python. I am trying to use filenames as input for a regular expression later in the code, but the encoding used

2 Answers
  •  生来不讨喜
    2020-12-06 01:31

    Mac OS X stores filenames in a decomposed form of Unicode (a variant of normalization form D, NFD), encoded as UTF-8. If you need to, for example, read in filenames and write them to a "normal" (precomposed) UTF-8 file, you must normalize them first. In Python 2:

    import unicodedata
    filename = unicodedata.normalize('NFC', unicode(filename, 'utf-8')).encode('utf-8')

    from here: https://web.archive.org/web/20120423075412/http://boodebr.org/main/python/all-about-python-and-unicode
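
For the regular-expression use case in the question, here is a minimal Python 3 sketch of the same idea. It assumes file names are already `str` objects (as `os.listdir` returns them for a `str` path in Python 3), so no explicit decode or encode step is needed. The helper name `nfc` and the sample file name are hypothetical, chosen only for illustration.

```python
import re
import unicodedata


def nfc(name: str) -> str:
    """Return name with combining sequences composed (Unicode NFC)."""
    return unicodedata.normalize("NFC", name)


# "é" as a decomposing file system might return it:
# "e" followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301.txt"

# The same name as it is usually typed: precomposed U+00E9.
pattern = re.compile("caf\u00e9\\.txt")

print(pattern.fullmatch(decomposed) is not None)       # False
print(pattern.fullmatch(nfc(decomposed)) is not None)  # True
```

Normalizing both the file names and any text they are compared against to the same form (NFC here) before matching avoids mismatches between strings that look identical on screen.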
