Reading Japanese filenames in Windows, using Python and glob not working

Submitted by 孤街醉人 on 2020-06-25 07:05:27

Question


I just set up PortablePython on my system so that I can run Python scripts from PHP. I have some very basic code (below) that lists all the files in a directory, but it doesn't work with Japanese filenames. It works fine with English filenames, but it throws errors (below) as soon as the directory contains any file with Japanese characters in its name.

import os, glob

path = r'G:\path'
for infile in glob.glob(os.path.join(path, '*')):
    print("current file is: ", infile)

It works fine in 'PyScripter-Portable.exe'; however, when I run 'PortablePython\App\python.exe "test.py"' from the command prompt or from PHP, it produces the following errors:

current file is:  Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print("current file is: ", infile)
  File "PortablePython\App\lib\io.py", line 1494, in write
    b = encoder.encode(s)
  File "PortablePython\App\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 37-40: character maps to <undefined>



I'm very new to Python and am only using it to get around a PHP limitation: PHP cannot read Unicode filenames on Windows. So I really need this to work, and any help you can give me would be great.


Answer 1:


Assuming you're using Python 2.x, try changing your strings to Unicode strings, like this:

path = u'G:\\path'
for infile in glob.glob(os.path.join(path, u'*')):
    # Concatenate instead of passing two arguments: in Python 2, print(a, b) prints a tuple
    print(u"current file is: " + infile)

That should let Python's filesystem-related functions know that you want to work with Unicode filenames.
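For comparison, on Python 3 every `str` is already Unicode, so no `u` prefix is needed there. The `io.py` frame in the question's traceback suggests Python 3 is in use, though that is only an inference. Below is a minimal sketch that uses a temporary directory and a made-up file name rather than the asker's `G:\path`. It only shows that `glob` hands back Unicode names and deliberately does not print them, since printing is where the console encoding comes into play.

```python
import glob
import os
import tempfile

# Hypothetical file name, created in a throwaway directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "日本語.txt"), "w").close()
    # A str pattern makes glob return str (Unicode) paths in Python 3.
    found = [os.path.basename(p) for p in glob.glob(os.path.join(tmp, "*"))]
```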




Answer 2:


The problem is probably that whatever output destination you're printing to doesn't use the same encoding as the file system. The general rule is that you should get text into Unicode as soon as possible, and then convert to whatever byte encoding you need upon output (e.g. utf-8).
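The "Unicode inside, bytes at the edge" rule can be sketched as follows, assuming Python 3 and assuming the consumer (for example the PHP script) expects UTF-8. The helper name `write_line` and the sample file name are made up for illustration, and an in-memory buffer stands in for the real output stream.

```python
import io

def write_line(stream, text, encoding="utf-8"):
    """Encode text explicitly and write it, newline-terminated, to a binary stream."""
    stream.write((text + "\n").encode(encoding))

# An in-memory binary buffer stands in for the real output destination.
buf = io.BytesIO()
write_line(buf, "current file is: G:\\path\\日本語.txt")
data = buf.getvalue()
```

In a real script, passing `sys.stdout.buffer` instead of `buf` would bypass the console codec (cp437 in the traceback) and emit the UTF-8 bytes directly.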

Since you're dealing with filenames, they should be in the system encoding.

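import glob, os, sys

path = 'G:\\path'
fse = sys.getfilesystemencoding()
# Python 2: glob returns byte strings for a byte-string pattern; decode them to unicode
filenames = [unicode(x, fse) for x in glob.glob(os.path.join(path, '*'))]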

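Now all your filenames are Unicode, and you need to figure out the correct encoding for output to the command prompt or whatever else consumes it. Note that launching the prompt with the /U flag ("cmd /u") only makes cmd's internal commands emit Unicode when their output goes to a pipe or file; it does not change the encoding Python uses for the console.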
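One way to cope with an output encoding that cannot represent the names is to escape whatever the target codec lacks instead of letting `print` raise. This is a sketch for Python 3. The helper `safe_for_console` is hypothetical, and `cp437` is used only because it is the codec named in the question's traceback.

```python
import sys

def safe_for_console(text, encoding=None):
    """Return text with characters the target codec cannot encode turned into \\uXXXX escapes."""
    encoding = encoding or sys.stdout.encoding or "ascii"
    # backslashreplace substitutes an escape sequence instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

name = "日本語.txt"  # made-up file name for the demo
escaped = safe_for_console(name, encoding="cp437")
```

The escaped form is lossy for display but safe to print anywhere. If the caller needs the real characters, writing explicitly encoded bytes to a binary stream is the better route.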



Source: https://stackoverflow.com/questions/3077752/reading-japanese-filenames-in-windows-using-python-and-glob-not-working
