Determining binary/text file type in Java?

心在旅途 2020-12-02 16:46

Namely, how would you tell an archive (jar/rar/etc.) file from a textual (xml/txt, encoding-independent) one?

10 answers

遥遥无期 (original poster)

2020-12-02 17:40

If the file consists only of the bytes 0x09 (tab), 0x0A (line feed), 0x0C (form feed), 0x0D (carriage return), and 0x20 through 0x7E, then it's probably ASCII text.

If the file contains any other ASCII control character (0x00 through 0x1F, excluding the four above), then it's probably binary data.
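The two byte-range rules above could be sketched in Java roughly as follows. This is only an illustration: the class name `AsciiSniffer`, the method name `looksLikeAsciiText`, and the idea of checking just a leading sample of the file are my own assumptions, not part of any library. The `Path` overload relies on `InputStream.readNBytes(int)`, which needs Java 11 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of any library.
final class AsciiSniffer {

    /** True if every byte is tab, LF, FF, CR, or printable ASCII (0x20-0x7E). */
    static boolean looksLikeAsciiText(byte[] data) {
        for (byte raw : data) {
            int b = raw & 0xFF; // Java bytes are signed; work with 0..255
            boolean allowedControl = b == 0x09 || b == 0x0A || b == 0x0C || b == 0x0D;
            boolean printable = b >= 0x20 && b <= 0x7E;
            if (!allowedControl && !printable) {
                return false; // other control character or high-bit byte
            }
        }
        return true; // note: an empty input also counts as text here
    }

    /** Applies the same check to the first sampleSize bytes of a file (assumed sampling strategy). */
    static boolean looksLikeAsciiText(Path file, int sampleSize) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return looksLikeAsciiText(in.readNBytes(sampleSize));
        }
    }
}
```

For example, a jar file starts with the ZIP signature bytes 0x50 0x4B 0x03 0x04, so the 0x03 alone is enough for this check to return false. Text in UTF-8 or ISO-8859-1 with non-ASCII characters is also rejected, which is why the encoding-specific checks still matter.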

UTF-8 text follows a very specific pattern for any bytes with the high-order bit set, but fixed-width single-byte encodings like ISO-8859-1 do not. UTF-16 text can frequently contain the null byte (0x00), but only in every other position.
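One way to test for the UTF-8 pattern without hand-writing the bit checks is to let the JDK's strict decoder do it. The sketch below is an assumption about how you might wire this up, not the only approach; `Utf8Sniffer` and `isValidUtf8` are made-up names. It uses `CharsetDecoder` with `CodingErrorAction.REPORT`, so malformed input raises an exception instead of being silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class Utf8Sniffer {

    /** True if the whole array decodes as well-formed UTF-8. */
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data)); // throws on any invalid sequence
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Two caveats: if you only pass a leading sample of the file, a multi-byte character cut off at the end of the sample will be reported as malformed, and well-formed UTF-8 can still contain control characters such as 0x00, so this is best combined with the control-character check rather than used instead of it.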

You'd need a weaker heuristic for anything else.
