Question
From The C Programming Language:
int c;
while ((c = getchar()) != EOF)
    putchar(c);
"... The solution is that getchar returns a distinctive value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file." We must declare c to be a type big enough to hold any value that getchar returns. We can't use char since c must be big enough to hold EOF in addition to any possible char."
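Here is a self-contained sketch of the book's loop for experimenting. It swaps getchar/putchar for their stream counterparts getc/putc so it can run on any FILE *. The name copy_stream and the byte-count return value are illustrative additions, not the book's code.

```c
#include <stdio.h>

/* Copy every byte from `in` to `out`, K&R style.
   `c` is an int so it can hold all 256 byte values (returned as 0..255)
   plus the distinct negative value EOF. Returns the number of bytes copied. */
long copy_stream(FILE *in, FILE *out)
{
    int c;
    long n = 0;

    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

Calling copy_stream(stdin, stdout) from main reproduces the original program.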
I checked stdio.h and printed the value of EOF on my system, and it's set to -1. On my system chars are signed, although I understand that this is system dependent. So EOF can fit in a char on my system. I rewrote the small routine above with c declared as a char, and the program works as intended. There's also an entry in the ASCII character table I was looking at: code 255 appears to be a blank character, and it appears to act like EOF.
So, why does it appear that ASCII has a character (255) designated for EOF? This seems to contradict what is said in The C Programming Language.
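The asker's check can be reproduced with a sketch like the one below, which prints EOF and the range of plain char from <limits.h>. The helper names plain_char_is_signed and print_char_facts are hypothetical. Even when EOF lies inside char's range, that does not rescue a char variable: with 8-bit chars it would have to hold 256 byte values plus EOF, which is 257 distinct values.

```c
#include <limits.h>
#include <stdio.h>

/* Plain char is signed on some platforms and unsigned on others;
   CHAR_MIN is negative exactly when it is signed. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Print the facts the question is about. */
void print_char_facts(void)
{
    printf("EOF      = %d\n", EOF);
    printf("CHAR_BIT = %d\n", CHAR_BIT);
    printf("CHAR_MIN = %d, CHAR_MAX = %d\n", CHAR_MIN, CHAR_MAX);
    printf("plain char is %s\n", plain_char_is_signed() ? "signed" : "unsigned");
}
```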
Answer 1:
So, why does it appear that ASCII has a character (255) designated for EOF?
It doesn't. More precisely, 255 is not the EOF "character".
The trick is that getchar() always returns a non-negative value (the byte read, as an unsigned char converted to int) when it has something to read. It returns -1 (which is how EOF appears to be defined on your implementation) only when it hits end-of-file or a read error.
The fact that char is:
- 8 bits wide,
- signed, and
- uses a two's complement representation
is just a quirk of your implementation (although an overwhelmingly common one nowadays). So if you use a char to store the return value of getchar(), reading may stop prematurely: a byte with value 255 is converted to -1 and mistaken for EOF, which is an error. That is exactly what you saw when character 255 seemed to act like EOF. Your char version did not really work; it is broken. (Where char is unsigned it fails differently: c can never equal -1, so the loop never ends.)
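To see what getchar()-style functions actually return, the sketch below records every result of getc, including the final EOF. record_getc is a hypothetical helper written for illustration; it takes a FILE * so it can be tried on a file instead of stdin.

```c
#include <stdio.h>

/* Store successive getc() results in out[0..max-1], stopping after EOF
   has been stored (or when out is full). Returns how many were stored. */
size_t record_getc(FILE *f, int *out, size_t max)
{
    size_t n = 0;

    while (n < max) {
        int c = getc(f);  /* int, not char: 0..255 for data, EOF at the end */
        out[n++] = c;
        if (c == EOF)
            break;
    }
    return n;
}
```

Reading a file that contains 'A' followed by the byte 0xFF yields 65, 255 and then EOF. Storing that 255 in a signed 8-bit char would turn it into -1, the same value as EOF on this implementation.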
Answer 2:
When getchar() reads the byte 255, it returns 255. When getchar() finds that there is no more input, it returns EOF, which is -1 on your system.
If you store the result in a char, you cannot distinguish the two. But when you store it in an int, you can. (This holds whether char is signed or unsigned.)
Only once you know that the result is a valid character (not EOF) can you convert it to char and get the usual C-style character type.
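A minimal sketch of that advice: compare the int with EOF first, and narrow it to char only afterwards. read_char is a hypothetical wrapper, and its bool-plus-output-parameter interface (using C99 <stdbool.h>) is just one possible design.

```c
#include <stdbool.h>
#include <stdio.h>

/* Read one character from f. On success store it in *out and return true;
   at end of file or on a read error return false and leave *out untouched.
   The EOF test happens on the int, before any conversion to char. */
bool read_char(FILE *f, char *out)
{
    int c = getc(f);

    if (c == EOF)
        return false;
    *out = (char)c;  /* c is a real byte here, so narrowing it cannot be confused with EOF */
    return true;
}
```

For example, with char ch; the loop while (read_char(stdin, &ch)) putchar(ch); copies input correctly even when it contains the byte 255.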
Answer 3:
According to the getchar() manual page, it always returns an int:
#include <stdio.h>
...
int getchar(void);
...
RETURN VALUE
fgetc(), getc() and getchar() return the character read as
an unsigned char cast to an int or EOF on end of file or error.
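Thus, using a char instead of an int truncates the result and may cause errors: with a 32-bit int, EOF (int -1, 0xffffffff) and the byte 255 (int 255, 0x000000ff) both become the char 0xff, which is -1 when char is signed.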
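The truncation can be made visible by keeping only the low-order byte, which is all a char can hold. low_byte and show_truncation are hypothetical helpers; the sketch assumes 8-bit chars and EOF == -1, as on the asker's system.

```c
#include <stdio.h>

/* Keep only the low-order byte of v, i.e. what survives in a char.
   Conversion to unsigned char is well defined: the value is reduced
   modulo 256 when CHAR_BIT is 8. */
unsigned int low_byte(int v)
{
    return (unsigned char)v;
}

void show_truncation(void)
{
    printf("255 -> %#x\n", low_byte(255));  /* 0xff */
    printf("EOF -> %#x\n", low_byte(EOF));  /* also 0xff when EOF == -1 */
}
```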
Answer 4:
To understand how this works, imagine what the person writing getchar was thinking. You need to read a file, so you start by writing a routine, for example:
unsigned char get_me_a_byte(FILE *file) { ... } // returns 0..255
Now you want to read all the bytes from a file:
unsigned char c;
while ((c = get_me_a_byte(file))) // same as: while ((c = get_me_a_byte(file)) != 0)
{
    /* ... do something with c ... */
}
The problem is that the loop stops when a zero byte is encountered, but you want to stop only once everything has been read. Now you get smarter: you know a file can be thought of as a sequence of bytes. What if get_me_a_byte could return a 16- or 32-bit type? Then you could use some value that a byte cannot hold as the end-of-file marker.
Bingo!
Since the decision is yours, you may have:
int get_me_a_byte_U(FILE *file) ... // returns bytes as 0..255
int get_me_a_byte_S(FILE *file) ... // returns bytes as -128..127
Now you can do:
int c;
while ((c = get_me_a_byte_U(file)) != UUU) ...
where UUU could be any int value a byte cannot take: anything from 256 to INT_MAX on your platform, or any negative value.
Similarly:
int c;
while ((c = get_me_a_byte_S(file)) != SSS) ...
where SSS could be anything from INT_MIN to -129 or from 128 to INT_MAX.
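A minimal sketch of the two designs, built on top of fgetc. The function names come from the answer; the FILE * parameter, the helper count_bytes_U, and the concrete sentinels UUU = 256 and SSS = 128 are illustrative assumptions (any int a byte cannot take would do).

```c
#include <stdio.h>

#define UUU 256  /* hypothetical end marker: outside 0..255    */
#define SSS 128  /* hypothetical end marker: outside -128..127 */

/* Returns the next byte as 0..255, or UUU at end of file or on error. */
int get_me_a_byte_U(FILE *file)
{
    int c = fgetc(file);  /* 0..255, or EOF */
    return c == EOF ? UUU : c;
}

/* Returns the next byte as -128..127 (two's complement reading),
   or SSS at end of file or on error. */
int get_me_a_byte_S(FILE *file)
{
    int c = fgetc(file);
    if (c == EOF)
        return SSS;
    return c > 127 ? c - 256 : c;
}

/* Count bytes with the unsigned variant. Note the parentheses:
   assign first, then compare with the end marker. Zero bytes
   no longer stop the loop. */
long count_bytes_U(FILE *file)
{
    int c;
    long n = 0;

    while ((c = get_me_a_byte_U(file)) != UUU)
        n++;
    return n;
}
```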
If you choose the first method, the question is: what should the value of UUU (your EOF) be?
-1 is a good choice for EOF because it remains -1 whatever the width of the signed variable you assign it to. By "remains -1" I mean that it is always the all-ones bit pattern (in two's complement).
char  c = -1; // c = 11111111b / 0xFF (255 if read as unsigned), assuming an 8-bit signed char
short s = -1; // s = 1111111111111111b / 0xFFFF (65535 if read as unsigned), assuming a 16-bit short
int   i = -1; // i = 11111111111111111111111111111111b / 0xFFFFFFFF (4294967295 if read as unsigned), assuming a 32-bit int
Now it should be obvious why EOF is -1.
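This is easy to check portably, because C defines the conversion of -1 to an unsigned type as that type's maximum value, i.e. the all-ones pattern, whatever the width. The helper names below are illustrative.

```c
#include <limits.h>
#include <stdio.h>

/* Converting -1 to an unsigned type yields that type's maximum value
   (the conversion is defined modulo 2^width), i.e. all bits set,
   whatever the width of the type. */
unsigned int minus_one_as_uchar(void)  { return (unsigned char)-1; }
unsigned int minus_one_as_ushort(void) { return (unsigned short)-1; }
unsigned int minus_one_as_uint(void)   { return (unsigned int)-1; }

void print_minus_one_patterns(void)
{
    printf("unsigned char : %#x\n", minus_one_as_uchar());
    printf("unsigned short: %#x\n", minus_one_as_ushort());
    printf("unsigned int  : %#x\n", minus_one_as_uint());
}
```

With 8-bit char, 16-bit short and 32-bit int, print_minus_one_patterns() prints 0xff, 0xffff and 0xffffffff.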
Answer 5:
There is no contradiction.
- EOF is NOT a character, just a condition found when reading a file.
- In some extended-ASCII code pages, the byte value 255 corresponds to a non-breaking space, a.k.a. the HTML entity &nbsp;.
As noted in the comments, ASCII encodes only 128 characters, so beyond that you'll find different encodings.
From the table you linked to, I would just say that 255 is a non-printable character.
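Since EOF signals a condition rather than a stored character, a program that needs to know why input stopped asks the stream itself, using the standard ferror and feof functions. The enum stop_reason and the function drain below are hypothetical names used for illustration.

```c
#include <stdio.h>

enum stop_reason { STOPPED_AT_END, STOPPED_ON_ERROR };

/* Consume f to the end, store the number of bytes read in *count, and
   report why reading stopped. getc returns EOF in both cases, so the
   stream's error indicator tells them apart (otherwise feof(f) is set). */
enum stop_reason drain(FILE *f, long *count)
{
    long n = 0;

    while (getc(f) != EOF)
        n++;
    *count = n;
    return ferror(f) ? STOPPED_ON_ERROR : STOPPED_AT_END;
}
```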
Source: https://stackoverflow.com/questions/19715850/eof-symbolic-constant