Python - how to convert unicode filename to CP437?

Submitted by 魔方 西西 on 2019-12-10 00:12:51

Question


I have a file with a Unicode name, say 'קובץ.txt'. I want to zip it, and I'm using Python's zipfile.

I can zip the files and open them later without a problem, except that the file names are garbled when viewed in the Windows 7 file explorer (7zip works great).

According to the docs, this is a common problem, and there are instructions on how to deal with that:

From ZipFile.write

Note

There is no official file name encoding for ZIP files. If you have unicode file names, you must convert them to byte strings in your desired encoding before passing them to write(). WinZip interprets all file names as encoded in CP437, also known as DOS Latin.

Sorry, but I can't work out what exactly I'm supposed to do with the filename. I've tried .encode('CP437') and .decode('CP437').


Answer 1:


You'd have to encode your Unicode string to CP437. However, you can't encode your specific example because the CP437 codec does not support Hebrew:

>>> u'קובץ.txt'.encode('cp437')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mjpieters/Development/venvs/stackoverflow-2.7/lib/python2.7/encodings/cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-3: character maps to <undefined>

The above error tells you that the first 4 characters (קובץ) cannot be encoded because the target character set has no such characters. CP437 only supports the western alphabet (A-Z, and accented characters like ç and é), IBM line drawing characters (such as ╚ and ┤) and a few Greek symbols, mainly for math equations (such as Σ and φ).

You'll either have to generate a different filename that only uses characters supported by the CP437 codec, or live with the fact that WinZip will never be able to show Hebrew filenames properly and simply stick with the character set that did work for you with 7zip.
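
If you go the "different filename" route, here is a minimal sketch of how you could check a name against CP437 and build a fallback. The helper names is_cp437_encodable and cp437_fallback_name, and the '_' placeholder, are my own for illustration and are not part of zipfile:

```python
def is_cp437_encodable(name):
    """Return True if every character of name exists in CP437."""
    try:
        name.encode('cp437')
        return True
    except UnicodeEncodeError:
        return False


def cp437_fallback_name(name, placeholder='_'):
    """Replace each character CP437 cannot represent with a placeholder.

    Hypothetical helper: the substitution policy is an assumption,
    pick whatever renaming scheme suits your archive.
    """
    return ''.join(
        ch if is_cp437_encodable(ch) else placeholder
        for ch in name
    )


print(is_cp437_encodable('קובץ.txt'))   # the Hebrew letters are not in CP437
print(cp437_fallback_name('קובץ.txt'))  # every Hebrew letter becomes '_'
```

Note that distinct names can collapse to the same fallback, so you may want to add a counter or a hash if the archive holds several such files.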




Answer 2:


Try this:

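import zipfile

# The file name as a text string; equivalently: p = 'קובץ.txt'
p = b'\xd7\xa7\xd7\x95\xd7\x91\xd7\xa5.txt'.decode('utf8')

# ZipFile.open(..., 'w') needs Python 3.6 or later
with zipfile.ZipFile('test.zip', 'w') as z:
    # Reinterpret the name's UTF-8 bytes as CP437 text before handing it to zipfile
    with z.open(p.encode('utf8').decode('cp437'), 'w') as f:
        f.write(b'hello world')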

I tested this on macOS, using utf8 in place of cp437 in the code above, and it works.

I hope this works on Windows.

I've tested reading Chinese filenames with the "gbk" or "gb18030" encoding using similar code, and it works well.
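
For comparison, on Python 3 you can also pass the name to zipfile as an ordinary str. A non-ASCII name is then stored as UTF-8 with general-purpose flag bit 11 (0x800) set. The sketch below only shows how to inspect that flag; write_and_inspect is a made-up helper name, and I have not checked which Windows Explorer versions honor the flag:

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: "file name is UTF-8"


def write_and_inspect(name, data=b'hello world'):
    """Write one member to an in-memory zip, then report how it was stored.

    Returns (name as read back, whether the UTF-8 flag is set).
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as z:
        z.writestr(name, data)
    with zipfile.ZipFile(buf) as z:
        info = z.infolist()[0]
        return info.filename, bool(info.flag_bits & UTF8_FLAG)


print(write_and_inspect('קובץ.txt'))
print(write_and_inspect('plain.txt'))
```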

When you have a zip archive from (or need to send one to) Mac/Linux, change cp437 in the code to utf8 and everything works.

When you have a zip archive from (or need to send one to) Windows, leave cp437 unchanged.
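
The reading side can be sketched like this. When the UTF-8 flag is not set, Python 3's zipfile decodes the stored name as CP437, so encoding it back to CP437 recovers the raw bytes, which you can then decode with whatever encoding the archive's creator used. The names recover_name and list_names are hypothetical, and the right encoding ('utf8', 'gbk', ...) is something you have to know or guess for each archive:

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: "file name is UTF-8"


def recover_name(filename, flag_bits, encoding):
    """Undo zipfile's CP437 decoding for names stored in another encoding."""
    if flag_bits & UTF8_FLAG:
        # zipfile already decoded this name as UTF-8; leave it alone.
        return filename
    # Back to the raw stored bytes, then decode with the assumed encoding.
    # Raises UnicodeDecodeError if the guess is wrong for these bytes.
    return filename.encode('cp437').decode(encoding)


def list_names(path, encoding='utf8'):
    """List member names of a zip, reinterpreting unflagged names."""
    with zipfile.ZipFile(path) as z:
        return [
            recover_name(info.filename, info.flag_bits, encoding)
            for info in z.infolist()
        ]
```

For example, list_names('test.zip', 'gbk') would be the call for an archive whose names were written as GBK bytes.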



Source: https://stackoverflow.com/questions/33941838/python-how-to-convert-unicode-filename-to-cp437
