Why does getchar() recognize EOF only in the beginning of a line?

问题

This example is from the K&R book

#include<stdio.h>


main()
{
    long nc;

    nc = 0;
    while(getchar() != EOF)
        ++nc;
    printf("%ld\n", nc);
}

Could you explain me why it works that way. Thanks.

^Z^Z doesn't work either (unless it's in the beginning of a line)

回答1:

Traditional UNIX interpretation of tty EOF character is to make blocking read return after reading whatever is buffered inside a cooked tty line buffer. In the start of a new line, it means read returning 0 (reading zero bytes), and incidentally, 0-sized read is how the end of file condition on ordinary files is detected.

That's why the first EOF in the middle of a line just forces the beginning of the line to be read, not making C runtime library detect an end of file. Two EOF characters in a row produce 0-sized read, because the second one forces an empty buffer to be read by an application.

$ cat
foo[press ^D]foo <=== after ^D, input printed back before EOL, despite cooked mode. No EOF detected
foo[press ^D]foo[press ^D] <=== after first ^D, input printed back, and on second ^D, cat detects EOF

$ cat
Some first line<CR> <=== input
Some first line <=== the line is read and printed
[press ^D] <=== at line start, ^D forces 0-sized read to happen, cat detects EOF

I assume that your C runtime library imitates the semantics described above (there is no special handling of ^Z at the level of kernel32 calls, let alone system calls, on Windows). That's why it would probably detect EOF after ^Z^Z even in the middle of an input line.

回答2:

The program will read EOF only at the actual end of the input. If your terminal/OS/whatever only permit files to end at the start of a line then that's where you'll find them. I believe this is a throw-back to old-fashioned terminals where data was only transmitted a line at a time (for all I know it goes back to punched card readers).

Try reading your data from a file that you've preprepared with an EOF mid-line. You may even find that some editors make this difficult! Your program should work fine with that as input.

回答3:

EOF indicates "end of file". A newline (which is what happens when you press enter) isn't the end of a file, it's the end of a line, so a newline doesn't terminate this loop.

Depending on the operating system, EOF character will only work if it's the first character on a line, i.e. the first character after an Enter. Since console input is often line-oriented, the system may also not recognize the EOF character until after you've followed it up with an Enter.

回答4:

I happened to have the same question as you. When I want to end the function getchar(), I have to enter 2 EOF or enter a <ENTER> plus a EOF.

And here's an easier answer I searched about this question:

If there is characters entering in the terminal, EOF will play the role as stopping this entering, which will arouse a new turn of entering; while, if there is no entering happening, or in another word, when the getchar() is waiting for a new enter(such as you've just finished entering or a EOF), the EOF you are about to enter now equals "end of file", which will lead the program stop executing the function getchar().

PS: the question happens when you are using getchar(). I think this answer is easier to understand, but maybe not for you since it is translated from Chinese...

来源：https://stackoverflow.com/questions/14436596/why-does-getchar-recognize-eof-only-in-the-beginning-of-a-line

标签

eof