*Might* an unsigned char be equal to EOF? [duplicate]

我们两清 submitted on 2021-02-07 05:44:13

Question


When using fgetc to read the next character of a stream, you usually check that end-of-file has not been reached with

if ((c = fgetc (stream)) != EOF)

where c is of type int. Then either end-of-file has been reached and the condition fails, or c is an unsigned char converted to int, which is expected to differ from EOF, since EOF is guaranteed to be negative. Fine... apparently.

But there is a small problem... Usually the char type has no more than 8 bits, while int must have at least 16 bits, so every unsigned char value is representable as an int. Nevertheless, if char had 16 or 32 bits (I know, this is never the case in practice...), nothing would prevent sizeof(int) == 1, so it would be (theoretically!) possible for fgetc (stream) to return EOF (or another negative value) even though end-of-file has not been reached...

Am I mistaken? Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached? (If so, I could not find it!) Or is the if ((c = fgetc (stream)) != EOF) idiom not fully portable?...

EDIT: Indeed, this was a duplicate of Question #3860943. I did not find that question on my first search. Thanks for your help! :-)


Answer 1:


You asked:

Is there something in the C standard that prevents fgetc from returning EOF if end-of-file has not been reached?

On the contrary, the standard explicitly allows EOF to be returned when an error occurs.

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

In the footnotes, I see:

An end-of-file and a read error can be distinguished by use of the feof and ferror functions.

You also asked:

Or is the if ((c = fgetc (stream)) != EOF) syntax not fully portable?

On a theoretical platform where sizeof(int) == 1 (which requires CHAR_BIT to be at least 16, since int must have at least 16 bits), that is not a reliable way to check that end-of-file has been reached. There, you have to resort to feof and ferror.

c = fgetc (stream);
if ( !feof(stream) && !ferror(stream) )
{
  // Got valid input in c.
}
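As a rough sketch of how that check might sit inside a complete read loop: the helper name count_chars below is hypothetical, and the code assumes that neither stream indicator was already set before the first call to fgetc.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: counts the characters that can be read from stream.
   Returns the count, or -1 if a read error occurred.
   Assumes the end-of-file and error indicators are clear on entry. */
long count_chars(FILE *stream)
{
    long count = 0;

    assert(stream != NULL);
    for (;;) {
        int c = fgetc(stream);

        if (feof(stream))
            return count;   /* genuine end-of-file */
        if (ferror(stream))
            return -1;      /* read error */

        (void)c;            /* c holds valid input, even if it equals EOF */
        count++;
    }
}
```

The loop never compares c against EOF at all; it relies only on the stream indicators, so it should behave the same whether or not a valid character can collide with EOF.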



Answer 2:


I think you need to rely on stream error.

ch = fgetc(stream);
if (ferror(stream) && (ch == EOF)) /* read error */;

From the standard

If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.


Edit: a better version

ch = fgetc(stream);
if (ch == EOF) {
    if (ferror(stream)) /* error reading */;
    else if (feof(stream)) /* end of file */;
    else /* read valid character with value equal to EOF */;
}
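One possible way to package those three branches is a small wrapper that reports a status and hands back the character separately. This is only a sketch: the enum read_status and the function read_char are made-up names, not anything from the standard library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical status codes, not part of the standard library. */
enum read_status { READ_OK, READ_EOF, READ_ERROR };

/* Reads one character from stream into *out and reports what happened.
   *out is written only when READ_OK is returned. */
enum read_status read_char(FILE *stream, int *out)
{
    int ch;

    assert(stream != NULL && out != NULL);
    ch = fgetc(stream);
    if (ch == EOF) {
        if (ferror(stream))
            return READ_ERROR;  /* error reading */
        if (feof(stream))
            return READ_EOF;    /* end of file */
        /* Otherwise: a valid character whose value equals EOF. */
    }
    *out = ch;
    return READ_OK;
}
```

A caller would loop while read_char returns READ_OK and inspect the final status afterwards to tell end-of-file from a read error.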



Answer 3:


If you are reading a stream that is standard ASCII only, there's no risk of receiving the char equivalent of EOF before the real end-of-file, because valid ASCII codes only go up to 127. But it could happen when reading a binary file. The byte would need to be 255 (unsigned), which corresponds to -1 as a signed char (the usual value of EOF), and nothing prevents it from appearing in a binary file.

As for your specific question (whether there's something in the standard): not exactly... but notice that fgetc returns the character as an unsigned char converted to int, so on ordinary platforms it is never negative anyway. The only risk there is narrowing the return value, explicitly or implicitly, to signed char (for instance, if your c variable were a signed char).
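To illustrate that narrowing risk, here is a small sketch. The helper name is hypothetical, and the behaviour described assumes the common case of an 8-bit char and EOF defined as -1, neither of which the standard guarantees.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: takes a character value as fgetc would return it
   (0..UCHAR_MAX) and returns 1 if that value, once stored in a signed char,
   compares equal to EOF, or 0 otherwise. */
int looks_like_eof_in_signed_char(int value)
{
    signed char narrowed;

    assert(value >= 0 && value <= UCHAR_MAX);
    /* Out-of-range conversion to signed char is implementation-defined;
       on typical two's-complement platforms 255 becomes -1. */
    narrowed = (signed char)value;
    return narrowed == EOF;
}
```

On such a typical platform, the int value 255 returned by fgetc is distinct from EOF, but the same value stored in a signed char is not, which is why the result of fgetc should be kept in an int until it has been compared with EOF.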

NOTE: as @Ulfalizer mentioned in the comments, there's one rare case in which you may need to worry: if sizeof(int)==1 and you're reading a file that contains non-ASCII characters, you may get a return value equal to EOF that is not the real end-of-file. Environments in which this happens are quite rare: because int must have at least 16 bits, sizeof(int)==1 requires a char of at least 16 bits, which to my knowledge is found only on some DSP toolchains, not on ordinary 8-bit microcontrollers. In such a case, the safe option is to test feof() as @pmg suggested.




Answer 4:


I agree with your reading.

The C Standard says (C11, 7.21.7.1 The fgetc function, p3):

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

There is nothing in the Standard (assuming UCHAR_MAX > INT_MAX) that prevents fgetc in a hosted implementation from returning a value equal to EOF that indicates neither end-of-file nor an error condition.
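If code depends on the plain != EOF idiom, one option is to make that assumption explicit at compile time. The following is a sketch assuming a C11 compiler; the function eof_idiom_is_safe is a hypothetical name that only mirrors the same condition for use at run time.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* C11: refuse to compile where an unsigned char value might not fit in an
   int, i.e. where fgetc could return a valid character equal to EOF. */
static_assert(UCHAR_MAX <= INT_MAX,
              "fgetc() != EOF check is ambiguous on this platform");

/* Hypothetical helper: reports the same condition at run time. */
int eof_idiom_is_safe(void)
{
    return UCHAR_MAX <= INT_MAX;
}
```

On mainstream platforms with an 8-bit char the assertion holds, so this costs nothing there; it would only fire on the exotic implementations discussed in this thread.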



Source: https://stackoverflow.com/questions/29975874/might-an-unsigned-char-be-equal-to-eof
