Correctly decoding zip entry file names — CP437, UTF-8 or?

無奈伤痛 2020-12-19 01:43

I recently wrote a zip file I/O library called zipzap, but I'm struggling with correctly decoding zip entry file names from arbitrary zip files.

Now, the PKWARE spe

2 answers
  •  情深已故
    2020-12-19 02:28

    At the moment the situation is as follows:

    • most Windows implementations use the DOS (OEM) encoding
    • the Mac OS zip utility uses UTF-8, but it doesn't set the UTF-8 bit flag
    • *nix zip utilities silently use the system encoding

    So the only way is to check whether the file name looks like UTF-8. For a 2-byte encoded character, the first byte must have the form 110xxxxx and the second 10xxxxxx (see the description of the UTF-8 encoding for longer sequences). If the name is a valid UTF-8 string, decode it as UTF-8. If not, fall back to the OEM/DOS encoding.
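
The heuristic above can be sketched in Python roughly as follows. This is a minimal illustration, not zipzap's code: the function name `decode_zip_name` and its parameters are hypothetical, CP437 is assumed as the OEM fallback (the real OEM code page depends on the system that wrote the archive), and the check for general purpose bit 11 (0x0800, the UTF-8 flag in the zip format) is added here on the assumption that a set flag can be trusted. Instead of testing bit patterns by hand, the sketch lets a strict UTF-8 decode do the validation, which also covers 3- and 4-byte sequences.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11: name is declared to be UTF-8


def decode_zip_name(raw: bytes, flags: int = 0, fallback: str = "cp437") -> str:
    """Decode a raw zip entry name (hypothetical helper, for illustration).

    raw      -- file name bytes exactly as stored in the zip header
    flags    -- the entry's general purpose bit flags
    fallback -- assumed OEM/DOS code page used when the name is not UTF-8
    """
    if flags & UTF8_FLAG:
        # The writer declared UTF-8; replace stray bytes rather than fail.
        return raw.decode("utf-8", errors="replace")
    try:
        # Strict decode: succeeds only if every byte sequence is valid UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8, so treat the bytes as the OEM/DOS encoding.
        return raw.decode(fallback)


if __name__ == "__main__":
    print(decode_zip_name("café.txt".encode("utf-8")))  # UTF-8 without the flag
    print(decode_zip_name(b"caf\x82.txt"))              # CP437 byte 0x82
```

Pure ASCII names decode identically either way. The heuristic can still guess wrong: a short OEM-encoded name whose bytes happen to form a valid UTF-8 sequence would be decoded as UTF-8, and a name written in a system encoding other than the assumed fallback would come out garbled.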
