Why does an fread loop require an extra Ctrl+D to signal EOF with glibc?

Posted by 感情迁移 on 2021-02-18 10:13:37

Question


Normally, to indicate EOF to a program attached to standard input on a Linux terminal, I need to press Ctrl+D once if I just pressed Enter, or twice otherwise. I noticed that the patch command is different, though. With it, I need to press Ctrl+D twice if I just pressed Enter, or three times otherwise. (Running cat | patch instead doesn't have this oddity. Also, if I press Ctrl+D before typing any real input at all, the oddity doesn't occur.) Digging into patch's source code, I traced this back to the way it loops on fread. Here's a minimal program that does the same thing:

#include <stdio.h>

int main(void) {
    char buf[4096];
    size_t charsread;
    while((charsread = fread(buf, 1, sizeof(buf), stdin)) != 0) {
        printf("Read %zu bytes. EOF: %d. Error: %d.\n", charsread, feof(stdin), ferror(stdin));
    }
    printf("Read zero bytes. EOF: %d. Error: %d. Exiting.\n", feof(stdin), ferror(stdin));
    return 0;
}

When compiling and running the above program exactly as-is, here's a timeline of events:

  1. My program calls fread.
  2. fread calls the read system call.
  3. I type "asdf".
  4. I press Enter.
  5. The read system call returns 5.
  6. fread calls the read system call again.
  7. I press Ctrl+D.
  8. The read system call returns 0.
  9. fread returns 5.
  10. My program prints Read 5 bytes. EOF: 1. Error: 0.
  11. My program calls fread again.
  12. fread calls the read system call.
  13. I press Ctrl+D again.
  14. The read system call returns 0.
  15. fread returns 0.
  16. My program prints Read zero bytes. EOF: 1. Error: 0. Exiting.

Why does this means of reading stdin have this behavior, unlike the way that every other program seems to read it? Is this a bug in patch? How should this kind of loop be written to avoid this behavior?

UPDATE: This seems to be related to libc. I originally experienced it on glibc 2.23-0ubuntu3 from Ubuntu 16.04. @Barmar noted in the comments that it doesn't happen on macOS. After hearing this, I tried compiling the same program against musl 1.1.9-1, also from Ubuntu 16.04, and it didn't have this problem. On musl, the sequence of events has steps 12 through 14 removed, which is why it doesn't have the problem, but is otherwise the same (except for the irrelevant detail of readv in place of read).

Now, the question becomes: is glibc wrong in its behavior, or is patch wrong in assuming that its libc won't have this behavior?


Answer 1:


I've managed to confirm that this is due to an unambiguous bug in glibc versions prior to 2.28 (it was fixed by commit 2cc7bad). Relevant quotes from the C standard:

The byte input/output functions — those functions described in this subclause that perform input/output: [...], fread

The byte input functions read characters from the stream as if by successive calls to the fgetc function.

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream.

(emphasis on "or" mine)

The following program demonstrates the bug with fgetc:

#include <stdio.h>

int main(void) {
    while(fgetc(stdin) != EOF) {
        puts("Read and discarded a character from stdin");
    }
    puts("fgetc(stdin) returned EOF");
    if(!feof(stdin)) {
        /* Included only for completeness. Doesn't occur in my testing. */
        puts("Standard violation! After fgetc returned EOF, the end-of-file indicator wasn't set");
        return 1;
    }
    if(fgetc(stdin) != EOF) {
        /* This happens with glibc in my testing. */
        puts("Standard violation! When fgetc was called with the end-of-file indicator set, it didn't return EOF");
        return 1;
    }
    /* This happens with musl in my testing. */
    puts("No standard violation detected");
    return 0;
}

To demonstrate the bug:

  1. Compile the program and execute it
  2. Press Ctrl+D
  3. Press Enter

The exact bug is that if the end-of-file indicator for the stream is set, but the stream is not at end-of-file, glibc's fgetc will return the next character from the stream, rather than EOF as the standard requires.

Since fread is defined in terms of fgetc, this is the cause of what I originally saw. It's previously been reported as glibc bug #1190 and has been fixed since commit 2cc7bad in February 2018, which landed in glibc 2.28 in August 2018.
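
As for how to write the loop so that it doesn't depend on the fix: one possible approach, given here only as a sketch and not as what patch actually does, is to stop as soon as fread reports end-of-file or an error, instead of calling fread once more and waiting for it to return 0. The helper name drain_stream is hypothetical; it just counts the bytes it reads.

```c
#include <stdio.h>

/* Hypothetical helper (not taken from patch or glibc): reads `in` until
   end-of-file or an error and returns the total number of bytes read. */
size_t drain_stream(FILE *in) {
    char buf[4096];
    size_t total = 0;

    for (;;) {
        size_t n = fread(buf, 1, sizeof buf, in);
        total += n;   /* a real program would process buf[0..n) here */

        /* Once the end-of-file or error indicator is set, stop right away.
           Calling fread again is what triggers the extra read on the
           affected glibc versions. */
        if (n == 0 || feof(in) || ferror(in)) {
            break;
        }
    }
    return total;
}
```

Called as drain_stream(stdin), this should end the timeline from the question after step 10 even on glibc before 2.28: fread returns 5 with the end-of-file indicator already set, so the loop exits without issuing another read. The caller can still check ferror(stdin) afterwards to tell an error apart from a normal end of input.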



Source: https://stackoverflow.com/questions/52674057/why-does-an-fread-loop-require-an-extra-ctrld-to-signal-eof-with-glibc
