Why does writing \n to a file convert it into a \r\n combination?

Submitted by 二次信任 on 2019-12-12 17:25:20

Question


I read in a book that when we attempt to write \n to a file using fputs(), fputs() converts the \n to a \r\n combination, and that if we then read the same line back using fgets(), the reverse conversion happens: \r\n is converted back to \n. I don't understand the purpose behind this.


Answer 1:


It is because Windows (and MS-DOS) text files are supposed to have lines ending in \r\n, and portable C programs are supposed to simply use \n because C was originally defined on Unix.

And it's not just fputs and fgets that do it - any I/O function on a text file, even getc and fread, will do the same conversion.
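As a rough illustration of the translation described above, the sketch below writes a line through a text-mode stream and then inspects the result two ways. The function names `raw_size_after_text_write` and `text_round_trip` and any file path passed to them are made up for this example. On Windows the raw size of "hello\n" would typically be 7 bytes (with \r\n on disk); on Unix-like systems it would be 6, since no translation takes place there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `text` through a text-mode stream, then count
   the bytes actually stored by reopening the file in binary mode.
   Returns -1 on error. */
long raw_size_after_text_write(const char *path, const char *text)
{
    FILE *fp = fopen(path, "w");          /* text mode: '\n' may be translated */
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");               /* binary mode: bytes as stored */
    if (fp == NULL)
        return -1;
    long count = 0;
    while (getc(fp) != EOF)
        count++;
    fclose(fp);
    return count;
}

/* Hypothetical helper: write one line in text mode and read it back in
   text mode. Returns 1 if the line read equals the line written, else 0. */
int text_round_trip(const char *path, const char *line)
{
    char buf[256];
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return 0;
    if (fputs(line, fp) == EOF) {
        fclose(fp);
        return 0;
    }
    if (fclose(fp) != 0)
        return 0;

    fp = fopen(path, "r");                /* text mode: native ending -> '\n' */
    if (fp == NULL)
        return 0;
    char *got = fgets(buf, sizeof buf, fp);
    fclose(fp);
    return got != NULL && strcmp(buf, line) == 0;
}
```

Whatever the platform stores on disk, `text_round_trip` should report a match for an ordinary line ending in \n, because the program only ever sees the single \n character.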




Answer 2:


Succinctly, DOS is the reason for this.

Different systems have different conventions for line endings. Unix reckons one character, '\n', is sufficient to mark the end of a line. DOS decided that it needed two characters, '\r' and '\n', though other systems also used that convention. The versions of Mac OS 1-9 (prior to Mac OS X) used just '\r' instead. Other systems could use a count and the line data instead of a line ending, or could simulate punched cards with blanks up to a fixed length (72 or 80). Unix also doesn't distinguish between binary and text files; DOS does. (DOS also uses Control-Z to mark EOF in a text file. Unix doesn't have an EOF marker; it knows exactly how big the file is and uses that length to determine when it has reached EOF.)

C originated on Unix, but to make it easier to migrate code between systems, the standard I/O package defined that when it was working on text files, the input side would convert a native line ending to the single '\n' character for uniform input, and the output side would convert a '\n' to the native line ending.

However, the mention of text files also meant that there needed to be binary files, where these mappings do not occur.
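A minimal sketch of the binary side, assuming a hypothetical helper named `binary_round_trip`: opening with "wb" and "rb" asks the library to leave every byte alone, so data containing '\n', '\r' or Control-Z (0x1A) should come back exactly as written on any platform.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write `n` bytes (at most 256) in binary mode, read
   them back in binary mode, and report whether they are identical.
   Returns 1 on an exact match, 0 on mismatch or I/O error. */
int binary_round_trip(const char *path, const unsigned char *data, size_t n)
{
    unsigned char buf[256];
    if (n > sizeof buf)
        return 0;

    FILE *fp = fopen(path, "wb");         /* "b": no line-ending mapping */
    if (fp == NULL)
        return 0;
    size_t written = fwrite(data, 1, n, fp);
    if (fclose(fp) != 0 || written != n)
        return 0;

    fp = fopen(path, "rb");
    if (fp == NULL)
        return 0;
    size_t got = fread(buf, 1, sizeof buf, fp);
    fclose(fp);

    /* Same length and same bytes means nothing was added or removed. */
    return got == n && memcmp(buf, data, n) == 0;
}
```

On Unix-like systems the "b" flag has no effect, which is why the distinction is easy to forget there; it matters mainly when the same code is built on Windows.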

You might note that most of the internet protocols (HTTP, for example) mandate CRLF (carriage return, line feed, or '\r', '\n') for the end of line markers.

(Actually, blaming DOS, as in MS-DOS or PC-DOS, is a little unfair. There were other systems that used the CRLF line end convention before DOS existed, and they may have been more influential on the Internet. However, almost all those ancestral systems are substantially defunct, and Windows is the environment that you'll run into these days where the distinction between binary and text files matters, and where you'll encounter CRLF line endings.)

Note that the C standard has this to say about text files:

ISO/IEC 9899:2011 §7.21.2 Files

¶2 A text stream is an ordered sequence of characters composed into lines, each line consisting of zero or more characters plus a terminating new-line character. Whether the last line requires a terminating new-line character is implementation-defined. Characters may have to be added, altered, or deleted on input and output to conform to differing conventions for representing text in the host environment. Thus, there need not be a one-to-one correspondence between the characters in a stream and those in the external representation. Data read in from a text stream will necessarily compare equal to the data that were earlier written out to that stream only if: the data consist only of printing characters and the control characters horizontal tab and new-line; no new-line character is immediately preceded by space characters; and the last character is a new-line character. Whether space characters that are written out immediately before a new-line character appear when read in is implementation-defined.

That's a lot of things that might or might not happen. Note, in particular, that trailing blanks written to a file might, or might not, appear in the input — according to the standard. That allows the systems that support punched card images or fixed length records to comply with the standard.

Note, too (as pointed out by Giacomo Degli Eposti), that this all means that if you open a file in binary mode that was originally written as a text file, you may very well get a significantly different list of bytes back from the I/O system. You'll see two characters per newline; you might see a Control-Z followed by other characters (possibly null bytes) up to a 'block' boundary that might be a multiple of 256 bytes, etc.
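If a program does read a CRLF text file in binary mode, it has to deal with the extra '\r' bytes itself. The sketch below shows one possible approach; `crlf_to_lf` is a hypothetical helper, not a standard library function, and it only handles the CRLF case (it does not treat Control-Z or block padding specially).

```c
#include <stddef.h>

/* Hypothetical helper: collapse every "\r\n" pair in buf[0..n) to a single
   '\n', in place. A '\r' that is not followed by '\n' is kept unchanged.
   Returns the new length of the data in buf. */
size_t crlf_to_lf(char *buf, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        if (buf[i] == '\r' && i + 1 < n && buf[i + 1] == '\n')
            continue;                     /* drop the CR; the LF is copied next */
        buf[out++] = buf[i];
    }
    return out;
}
```

For example, the 10 bytes of "one\r\ntwo\r\n" would shrink to the 8 bytes of "one\ntwo\n", which is what a text-mode read on Windows would have delivered.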



Source: https://stackoverflow.com/questions/19839883/why-when-we-write-n-in-the-file-it-converts-into-r-n-combination
