UTF8 Filenames in PHP and Different Unicode Encodings

日久生厌 2020-12-04 02:06

I have a file containing Unicode characters on a server running Linux. If I SSH into the server and use tab-completion to navigate to the file/folder containing unicode char

3 Answers
  •  -上瘾入骨i
    2020-12-04 02:45

    Thanks to the tips given in the two answers, I was able to poke around and find some methods for normalizing the different Unicode decompositions of a given character. In the situation I faced, I was accessing files created by an OS X Carbon application. It is a fairly popular application, and its file names seemed to adhere to a specific Unicode decomposition.

    In PHP 5.3 a new set of functions was introduced (the Normalizer class in the intl extension) that lets you normalize a Unicode string to a particular form. There are four normalization forms you can convert a Unicode string into: NFC, NFD, NFKC, and NFKD. Python has had Unicode normalization capabilities since version 2.3 via unicodedata.normalize. This article on Python's handling of Unicode strings was helpful in understanding encoding and string handling a bit better.

    Here is a quick example of normalizing a Unicode file path in Python:

    import unicodedata
    filePath = unicodedata.normalize('NFD', filePath)
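
    To see what the four forms actually do to a single character, here is a minimal sketch, assuming Python 3 (where every str is Unicode). The sample character and the variable names are only for illustration:

```python
import unicodedata

# "é" written as one precomposed code point (U+00E9).
composed = "\u00e9"

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    normalized = unicodedata.normalize(form, composed)
    # Print the code points so composed and decomposed spellings are visible.
    print(form, [hex(ord(ch)) for ch in normalized])

# NFD splits the character into a base letter plus a combining accent,
# so the two spellings compare unequal until both are normalized.
decomposed = unicodedata.normalize("NFD", composed)
print(composed == decomposed)
print(unicodedata.normalize("NFC", decomposed) == composed)
```

    The two strings render identically, which is why a file name typed one way can fail to match the same-looking name stored the other way.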
    

    I found that the NFD format worked for all my purposes. I wonder if this is the standard decomposition for Unicode filenames.
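
    If you do not know in advance which form the names on disk use, one option is to normalize both sides before comparing. This is only a sketch, again assuming Python 3; match_normalized is a hypothetical helper name, not part of any library:

```python
import unicodedata

def match_normalized(names, wanted, form="NFD"):
    """Return the first entry of `names` equal to `wanted` once both are
    normalized to `form`, or None if nothing matches."""
    target = unicodedata.normalize(form, wanted)
    for name in names:
        if unicodedata.normalize(form, name) == target:
            # Return the name exactly as stored, so it can be used to open the file.
            return name
    return None

# Names as they might come back from a directory listing (decomposed "é").
on_disk = ["readme.txt", "cafe\u0301.txt"]

# The same name as a user might type it (precomposed "é").
found = match_normalized(on_disk, "caf\u00e9.txt")
print(found == "cafe\u0301.txt")
```

    In practice you would pass in something like os.listdir(some_dir) as the list of names, where some_dir is whatever directory you are searching.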
