Add non-ASCII file names to zip in Java

时光说笑 2020-12-14 08:38

What is the best way to add non-ASCII file names to a zip file using Java, in such a way that the files can be properly re

7 answers
  • 2020-12-14 09:14

    Did it actually fail, or was it just a font issue (e.g. the font having different glyphs for those character codes)? I've seen similar issues on Windows where rendering "broke" because the font didn't support the charset, but the data was actually intact and correct.

  • 2020-12-14 09:15

    From a quick look at the TrueZIP manual - they recommend the JAR format:

    It uses UTF-8 for file name encoding and comments - unlike ZIP, which only uses IBM437.

    This probably means that the API is using the java.util.zip package for its implementation; that documentation states that it is still using a ZIP format from 1996. Unicode support wasn't added to the PKWARE .ZIP File Format Specification until 2006.

  • 2020-12-14 09:19

    Miracles do happen: Sun/Oracle really did fix the long-standing bug/RFE.

    It is now possible to set the file name encoding when creating the zip file/stream (requires Java 7).
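
    A minimal sketch of what that looks like, assuming Java 7 or later: the `ZipOutputStream(OutputStream, Charset)` and `ZipInputStream(InputStream, Charset)` constructors take the charset used for entry names. The class name, helper names and sample file name below are made up for illustration, and the archive is kept in memory only so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class Utf8ZipSketch {

    // Writes one entry whose name is encoded as UTF-8 and returns the archive bytes.
    static byte[] writeZip(String entryName, byte[] content) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, StandardCharsets.UTF_8)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content);
            zip.closeEntry();
        }
        // The stream is closed at this point, so the archive is complete.
        return buffer.toByteArray();
    }

    // Reads the name of the first entry back, decoding it as UTF-8.
    static String firstEntryName(byte[] archive) throws IOException {
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(archive), StandardCharsets.UTF_8)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the source file itself pure ASCII.
        String name = "r\u00e9sum\u00e9-\u65e5\u672c\u8a9e.txt";
        byte[] archive = writeZip(name, "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(name.equals(firstEntryName(archive)));
    }
}
```

    Whether other unzip tools display such names correctly still depends on their own UTF-8 support, so it is worth testing with the clients you care about.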

  • 2020-12-14 09:22

    Non-ASCII file names are not reliable across ZIP implementations and are best avoided. There is no provision for storing a charset setting in ZIP files; clients tend to guess with 'the current system codepage', which is unlikely to be what you want. Many combinations of client and codepage can result in inaccessible files.

    Sorry!

  • 2020-12-14 09:26

    In ZIP files, according to the spec owned by PKWARE, file names and file comments are encoded in IBM437. In 2006 PKWARE extended the spec to also allow UTF-8. This says nothing about the encoding of the files contained within the zip, only the encoding of the file names.
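
    To illustrate why the name encoding matters, here is a small sketch of what happens when a reader decodes UTF-8 name bytes as IBM437. The class and method names are hypothetical, and it assumes the runtime ships the IBM437 charset (a full JDK normally does; `Charset.forName` throws otherwise).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not part of any library.
final class NameEncodingMismatchSketch {

    // Decodes the UTF-8 bytes of a name as if they were IBM437,
    // the way a reader assuming the classic ZIP encoding would.
    static String misreadAsIbm437(String name) {
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, Charset.forName("IBM437"));
    }

    public static void main(String[] args) {
        String original = "caf\u00e9.txt"; // 8 characters, one of them non-ASCII
        String garbled = misreadAsIbm437(original);

        // The accented letter is two bytes in UTF-8, so it turns into two
        // unrelated IBM437 characters and the name grows by one character.
        System.out.println(original.equals(garbled));                    // false
        System.out.println(original.length() + " -> " + garbled.length()); // 8 -> 9

        // Pure ASCII names are unaffected by the mismatch.
        System.out.println("plain.txt".equals(misreadAsIbm437("plain.txt"))); // true
    }
}
```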

    I think all tools and libraries (Java and non-Java) support IBM437 (which is a superset of ASCII), and fewer tools and libraries support UTF-8. Some tools and libraries support other code pages. For example, if you zip something using WinRAR on a computer running in Shanghai, you will typically get the system's Chinese code page (GBK on a Simplified Chinese system). This is not "allowed" by the zip spec, but it happens anyway.
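
    As a rough sketch of how such an archive could be handled from Java 7 onward: the `ZipInputStream(InputStream, Charset)` constructor lets the caller state the code page explicitly instead of guessing. GBK is only an assumed example of a legacy code page here, the class and helper names are hypothetical, and the archive is produced in memory just to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not part of any library.
final class LegacyCodePageZipSketch {

    // Simulates a tool that writes entry names in a legacy code page.
    static byte[] writeZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer, nameCharset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Reads the first entry name, decoding it with the charset the caller supplies.
    static String firstEntryName(byte[] archive, Charset nameCharset) throws IOException {
        try (ZipInputStream zip =
                 new ZipInputStream(new ByteArrayInputStream(archive), nameCharset)) {
            ZipEntry entry = zip.getNextEntry();
            return entry == null ? null : entry.getName();
        }
    }

    public static void main(String[] args) throws IOException {
        Charset gbk = Charset.forName("GBK"); // assumed code page of the archive
        String name = "\u4e0a\u6d77.txt";     // Unicode escapes keep the source ASCII
        byte[] archive = writeZip(name, gbk);

        // Decoding with the same code page recovers the original name.
        System.out.println(name.equals(firstEntryName(archive, gbk)));
    }
}
```

    The reader has to know, or be told, which code page the archive was written with; nothing in such an archive records it.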

    The DotNetZip library for .NET does Unicode, but of course that doesn't help you if you are using Java!

    Using Java's built-in ZIP support before Java 7, you cannot choose the file name encoding. If you need a specific one, use a third-party library, or create a JAR.

  • 2020-12-14 09:31

    You can still use the Apache Commons Compress implementation of the zip stream: http://commons.apache.org/compress/apidocs/org/apache/commons/compress/archivers/zip/ZipArchiveOutputStream.html#setEncoding%28java.lang.String%29

    Calling setEncoding("UTF-8") on your stream should be enough.
