Question
In the Designing File Formats link that I got from this website, I noticed that PNG has a CRLF \x1A LF chunk that is meant for "testing" carriage return and line feed conversion.
I am building custom binary structures for a project, and I am wondering why this is useful and in which scenarios I should think about adding it.
Answer 1:
For historical reasons, different OSes use distinct sequences to mark line endings in text files:
- Unix and companions: \n (linefeed)
- DOS and Windows: \r\n (carriage-return, linefeed)
- Mac OS (before Mac OS X): \r (carriage-return)
(Mac OS X, which got a BSD Unix kernel, might support both: A Line Break Is a Line Break.)
This is all a mess, e.g.:
- Sometimes Windows text files look a bit strange in Xemacs, with every line decorated with a ^M at its end.
- Windows Notepad (the included plain text editor) shows Linux text files as a single line.
Once you switch regularly between different OSes, you get used to the fact that line endings have to be fixed from time to time. There are numerous helper tools for this, e.g. unix2dos and dos2unix in cygwin, special commands in Notepad++, prompts in VisualStudio, etc.
In C, a line ending is always marked by \n, even in DOS and Windows. (I have no experience with Mac OS, but I would be surprised if it weren't the same there.) To make this work seamlessly, MS decided to "fix" file contents during reading and writing "under the hood". When a file is read, all occurrences of \r\n are silently replaced by \n, while file writing inserts a \r before each written \n.
This has some annoying drawbacks:
- If a file of a certain size is read, the "received" contents might be some bytes smaller. (I once stumbled over this when I tried to reserve space before loading a file and then read the whole contents at once. I wondered why some bytes seemed to be missing after loading.)
- This may break loading of binary files, where \n simply represents a binary value of 10 with some meaning other than a line break.
To fix this, the C API provides additional modes for file I/O. E.g. fopen() supports, beyond r, w, and a, an extra character to indicate the file type:
- b denotes binary I/O (don't touch contents)
- t denotes text I/O (fix line endings); note that t is a Microsoft extension rather than part of standard C.
Without any of them, the default is text I/O.
On Windows, as well as for portable file I/O, the file type should always be given explicitly, which means b for binary data. (On Linux, it simply has no effect, and in particular does no harm.)
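To make the difference between the modes tangible, here is a minimal sketch. The helper names write_bytes and bytes_on_disk are made up for this illustration; only fopen(), fwrite(), fgetc() and fclose() come from the C standard library.

```c
#include <stdio.h>

/* Hypothetical helper: write len bytes to path using the given fopen()
 * mode ("w" = text, "wb" = binary). Returns 0 on success, -1 on error. */
int write_bytes(const char *path, const char *mode,
                const unsigned char *data, size_t len)
{
    FILE *f = fopen(path, mode);
    if (f == NULL)
        return -1;
    size_t written = fwrite(data, 1, len, f);
    if (fclose(f) != 0 || written != len)
        return -1;
    return 0;
}

/* Hypothetical helper: count the bytes actually stored in the file.
 * The file is opened with "rb" so that nothing is translated. */
long bytes_on_disk(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    long count = 0;
    while (fgetc(f) != EOF)
        count++;
    fclose(f);
    return count;
}
```

Writing the four bytes 0x01 0x0A 0x02 0x0A with mode "wb" should leave exactly four bytes on disk on any platform. With mode "w", I would expect six bytes on Windows (a 0x0D inserted before each 0x0A) and still four on Linux.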
I once wrote an answer to SO: Copying a bmp in c, where a dump of a broken BMP file nicely illustrated the effect of file output done in the wrong mode.
After this long story about text and binary file I/O, it might be obvious that this is always a potential issue for developers dealing with image data (which is usually binary-encoded).
Hence, I can imagine that the \r\n\032\n sequence is simply a test pattern for this. If these 4 bytes don't have exactly these values, chances are good that
- the file was opened with the wrong mode (on a platform where this is relevant) or
- a previous tool damaged the contents of the file.
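A decoder could use the test pattern roughly like this. It is only a sketch: the enum and the function name check_png_signature are my own invention, not taken from libpng or any other library. The 8 signature bytes are those of the PNG format, where \032 is octal for 0x1A.

```c
#include <stddef.h>
#include <string.h>

/* The 8-byte PNG signature; bytes 4..7 are the \r\n\032\n test pattern. */
static const unsigned char PNG_SIGNATURE[8] = {
    0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'
};

/* Hypothetical result codes for this illustration. */
enum sig_status {
    SIG_OK,
    SIG_TOO_SHORT,
    SIG_NOT_PNG,
    SIG_LINE_ENDINGS_DAMAGED
};

/* Check the first bytes of a buffer that was read from a file. */
enum sig_status check_png_signature(const unsigned char *buf, size_t len)
{
    if (len < sizeof PNG_SIGNATURE)
        return SIG_TOO_SHORT;
    /* The first 4 bytes identify the format. */
    if (memcmp(buf, PNG_SIGNATURE, 4) != 0)
        return SIG_NOT_PNG;
    /* The format matches but the test pattern does not: most likely a
     * text-mode read or a tool that "fixed" the line endings. */
    if (memcmp(buf + 4, PNG_SIGNATURE + 4, 4) != 0)
        return SIG_LINE_ENDINGS_DAMAGED;
    return SIG_OK;
}
```

With a separate result for the damaged pattern, the caller can report something like "file was probably opened in text mode" instead of a generic decoding failure. A custom binary format could place the same four bytes right after its own magic number, assuming its readers actually check them.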
To cite PeteBlackerThe3rd:
It will allow the decoder to throw useful error messages in that case as opposed to failing mysteriously.
Source: https://stackoverflow.com/questions/56951452/value-of-crlf-cr-chunk-in-png