Value of CRLF CR chunk in png

Submitted by 放肆的年华 on 2020-01-04 03:18:30

Question


In the "Designing File Formats" link that I got from this website, I noticed that PNG has a CR LF \x1A LF chunk that is meant for "testing" carriage return and line feed conversion.

I am building custom binary structures for a project, and I am wondering why this is useful and in which scenarios I should think about adding it.


Answer 1:


For historical reasons, different OSes use distinct sequences to mark line endings in text files:

  • Unix and relatives: \n (line feed)
  • DOS and Windows: \r\n (carriage return, line feed)
  • Mac OS (before Mac OS X): \r (carriage return). Mac OS X, which got a BSD Unix kernel, might support both; see "A Line Break Is a Line Break".

This is all a mess, e.g.:

  • Sometimes Windows text files look a bit strange in XEmacs, with every line decorated with a ^M at its end.
  • Windows Notepad (the included plain text editor) shows Linux text files on a single line.

Once you switch regularly between different OSes, you get used to the fact that line endings have to be fixed from time to time. There are numerous helper tools for this, e.g. unix2dos and dos2unix in Cygwin, special commands in Notepad++, prompts in Visual Studio, etc.

In C, a line ending is always represented by \n, even on DOS and Windows. (I have no experience with Mac OS, but I would be surprised if it weren't the same there.) To make this work seamlessly, MS decided to "fix" file contents "under the hood" on reading and writing. While reading a file, all occurrences of \r\n are silently replaced by \n, while writing a file inserts a \r before each written \n.

This has some annoying drawbacks:

  1. If a file of a certain size is read, the "received" contents might be some bytes smaller. (I once stumbled over this when I tried to reserve space before loading a file and then read the whole contents at once. I wondered why some bytes seemed to be missing after loading.)

  2. This may break loading of binary files, where \n simply represents the binary value 10, which can have any meaning other than a line break.
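Both drawbacks can be reproduced without Windows by applying the same replacement to a buffer in memory. The following is only a sketch of the effect described above, not the actual runtime library code; the function name emulate_text_mode_read is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: emulates, on an in-memory buffer, the replacement of
 * "\r\n" by "\n" that a text-mode read is described to perform.
 * The buffer is modified in place; the new (possibly smaller) length is
 * returned. */
size_t emulate_text_mode_read(unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t out = 0;
    for (size_t i = 0; i < len; ++i) {
        /* Drop a '\r' that is immediately followed by '\n'. */
        if (buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue;
        buf[out++] = buf[i];
    }
    return out;
}
```

If the buffer holds binary data in which the byte values 13 and 10 happen to be adjacent, the returned length is smaller than the original one (drawback 1) and every byte after the dropped 13 has moved one position forward (drawback 2).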

To fix this, the C API provides additional modes for file I/O. E.g. fopen() supports, in addition to r, w, and a, an extra character to indicate the file type:

  • b denotes binary I/O (don't touch contents)
  • t denotes text I/O (fix line-endings).

Without any of them, the default is text I/O.

On Windows, as well as in any code that is meant to be portable, the mode should always be given explicitly. (On Linux, it simply has no effect, and in particular does no harm.)
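As a minimal sketch of what "always give the mode" looks like in practice, here is a helper that loads a whole file with "rb". The name read_file_binary is made up for this example, and determining the size with fseek()/ftell() is an assumption that holds on the common desktop platforms, although the C standard does not strictly guarantee SEEK_END for binary streams.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file in binary mode.
 * Returns a malloc'ed buffer (the caller frees it) and stores its size in
 * *out_len, or returns NULL on failure. */
unsigned char *read_file_binary(const char *path, size_t *out_len)
{
    assert(path != NULL && out_len != NULL);

    FILE *f = fopen(path, "rb");   /* "b": don't touch the contents */
    if (!f) return NULL;

    /* Determine the file size in bytes. */
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    /* Allocate at least one byte so that an empty file is not an error. */
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }

    /* In binary mode the number of bytes read matches the file size. */
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }

    *out_len = got;
    return buf;
}
```

With "r" instead of "rb", the same code could fail on Windows for any file that contains \r\n, because fewer bytes than the reported size would arrive.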

I once wrote an answer to SO: Copying a bmp in c, where a dump of a broken BMP file nicely illustrated the effect of file output done in the wrong mode.

After this long story about text and binary file I/O, it might be obvious that this is always a potential issue for developers dealing with image data (which is usually binary encoded).

Hence, I can imagine that the \r\n\032\n sequence (\032 is the octal notation for \x1A) is simply a test pattern for this. If these 4 bytes don't have exactly these values, chances are good that

  • the file was opened in the wrong mode (on a platform where this is relevant) or
  • a previous tool damaged the contents of the file.

To cite PeteBlackerThe3rd:

It will allow the decoder to throw useful error messages in that case as opposed to failing mysteriously.
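To make this concrete, here is a sketch of such a check for the 8-byte PNG signature (\x89 P N G \r \n \x1A \n). The enum, its values, and the function name are made up for this example, and the two "translated" patterns are my assumption about what the conversions described above would leave behind in the first 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this illustration. */
typedef enum {
    SIG_OK,          /* all 8 bytes match */
    SIG_TOO_SHORT,   /* fewer than 8 bytes available */
    SIG_NOT_PNG,     /* the first 4 bytes are not \x89 P N G */
    SIG_CRLF_TO_LF,  /* "\r\n" looks like it was replaced by "\n" */
    SIG_LF_TO_CRLF,  /* "\n" looks like it was replaced by "\r\n" */
    SIG_DAMAGED      /* some other damage in bytes 4..7 */
} sig_status;

sig_status check_png_signature(const unsigned char *p, size_t len)
{
    static const unsigned char sig[8] =
        { 0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n' };

    assert(p != NULL || len == 0);

    if (len < 8) return SIG_TOO_SHORT;
    if (memcmp(p, sig, 4) != 0) return SIG_NOT_PNG;
    if (memcmp(p, sig, 8) == 0) return SIG_OK;

    /* Assumed pattern after a text-mode read: \r\n became \n. */
    if (p[4] == '\n' && p[5] == 0x1A && p[6] == '\n')
        return SIG_CRLF_TO_LF;

    /* Assumed pattern after a text-mode write: a \r was inserted before
     * each \n, so the signature starts with ... \r \r \n \x1A. */
    if (p[4] == '\r' && p[5] == '\r' && p[6] == '\n' && p[7] == 0x1A)
        return SIG_LF_TO_CRLF;

    return SIG_DAMAGED;
}
```

A decoder could map SIG_CRLF_TO_LF and SIG_LF_TO_CRLF to a message such as "file looks like it was transferred or opened in text mode" instead of a generic "not a PNG". For a custom binary format, the same idea would work with your own magic bytes followed by a short sequence that contains both \r\n and a lone \n.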



Source: https://stackoverflow.com/questions/56951452/value-of-crlf-cr-chunk-in-png
