How is it so that these character constants have negative values?

空扰寡人 submitted on 2021-02-07 10:56:18

Question


K&R, Chapter 2, page 44, first paragraph, says: "The definition of C guarantees that any character in the machine's standard printing set will never be negative, so these characters will always be positive quantities in expressions."

Fair enough, but when I run the following code:

#include <stdio.h>

int main(void)
{
    printf("%d", '£');

    return 0;
}

I get -93 as the output. I will just cite some of the negative values I get along with the corresponding characters: ÿ = -1, þ = -2, ÷ = -9. I don't understand: if it is true that C guarantees that these values are positive in expressions, how is it that the values are negative?


Answer 1:


K&R is somewhat informal, but apparently “£” is not in your implementation’s “standard printing set.” The C standard is more formal. It specifies that members of the basic execution character set are nonnegative when stored in a char, and it defines that set to contain A-Z, a-z, 0-9, !, ", #, %, &, ', (, ), *, +, comma, -, period, /, :, ;, <, =, >, ?, [, \, ], ^, _, {, |, }, ~, space, horizontal tab, vertical tab, form feed, alert, backspace, carriage return, new line, and a null character. “£” is not among these, so the C standard does not require its value to be nonnegative.
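To make the guarantee concrete, here is a minimal sketch that stores every member of the basic execution character set in a plain char and checks that none is negative. The names basic_chars and all_nonnegative are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>

/* Every member of the basic execution character set listed above.
   The string literal's terminating null character is part of the array. */
static const char basic_chars[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    " \t\v\f\a\b\r\n";

/* Hypothetical helper: returns 1 if all n chars starting at s are
   nonnegative when read as plain char, 0 otherwise. */
int all_nonnegative(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] < 0) {
            return 0;
        }
    }
    return 1;
}
```

Calling all_nonnegative(basic_chars, sizeof basic_chars) should return 1 on any conforming implementation, whether plain char is signed or unsigned. No such promise covers a character like “£”.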




Answer 2:


The following affect the value of an int expressed as a character constant containing a single character:

  1. the actual character set and character encoding of the source file;
  2. the assumed (by the compiler) character set and character encoding of the source file;
  3. whether the character in the character constant is encoded as a multibyte character or a single-byte character;
  4. if encoded as a single-byte character, whether the character code falls within the range of the char type or not.

Ideally, you want the assumed character set and encoding of the source to match the actual character set and encoding.

The value of a character constant containing a multibyte sequence (more than one byte) is implementation defined.

If the char type is signed, there may be single-byte characters in the source that cannot be represented as positive char values. Such characters will be represented as negative char values.

In OP's example,

printf("%d", '£');

printed the value -93. Since the '£' character has decimal code 163 in the ISO-8859-1 and ISO-8859-15 character sets, the following seems the most likely deduction:

  1. The source character set is actually ISO-8859-1 or ISO-8859-15 or possibly some variant such as Windows CP-1252.
  2. The source character set assumed by the compiler is ISO-8859-1 or ISO-8859-15 or possibly some variant such as Windows CP-1252.
  3. Due to 1 and 2 above, all characters in the source are encoded as single bytes.
  4. The char type on OP's system is an 8-bit, 2's-complement, signed integer type. (N.B. 163 - 256 = -93.)
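The arithmetic in point 4 can be sketched as a small function. This assumes an 8-bit, two's-complement char, as deduced above; signed8_value is a hypothetical name used only for illustration.

```c
/* Hypothetical helper: interpret an 8-bit character code (0..255) as the
   value a signed, two's-complement, 8-bit char would hold.
   Codes 128..255 have the top bit set and map to code - 256. */
int signed8_value(int code)
{
    return code >= 128 ? code - 256 : code;
}
```

With the ISO-8859-1 codes 163 (£), 255 (ÿ), 254 (þ) and 247 (÷), this gives -93, -1, -2 and -9, which matches the values reported in the question.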

According to C11 section 6.4.4.4 paragraph 10:

If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.

Since int can represent all the values of char if char is signed, and the int constant produced by '£' was -93 on OP's system, then as long as '£' really is a single-byte character constant on OP's system, it can be deduced that the char value is also -93. If '£' is actually a multibyte (more than one byte) character constant on OP's system, then its value is implementation defined and no such deduction can be made.
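The quoted rule can be checked without depending on the source file's encoding by writing the byte as a hex escape. The following is a sketch only; the function names are hypothetical, and the expected values assume an 8-bit, two's-complement char.

```c
#include <limits.h>

/* Value of the integer character constant, which has type int. */
int pound_constant(void)
{
    return '\xA3';
}

/* The same value stored in a char object, then converted to int. */
int pound_via_char(void)
{
    char c = '\xA3';
    return (int)c;
}

/* 1 if plain char is a signed type on this implementation, else 0. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Both functions should return the same number: -93 where plain char is signed, 163 where it is unsigned.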




Answer 3:


The characters you are concerned about may not be single-byte characters. If the source file is encoded as UTF-8, for example, '£' occupies two bytes, so the statement printf("%d", '£'); contains a multi-character constant and will typically cause the compiler to output a warning message. A better way to write that statement is printf("%d", L'£'); (notice the 'L' before the character), which makes it a wide character constant.

A character constant has type int, and its value is obtained by converting a char value to int.

When the uppermost bit of that char is set and char is signed, the conversion sees the char as negative, so the resulting value is negative (via sign extension).

However, the code of this character, 163, fits in the wide character type without reaching its uppermost bit, so when the wide form is used, printf() outputs: 163
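Two ways of getting 163 instead of -93 can be sketched as follows. The byte is written as a hex escape so the result does not depend on how the source file is encoded; the function names are hypothetical, and an 8-bit char is assumed.

```c
#include <stddef.h>

/* Value of the wide character constant with code 0xA3.
   Cast to long so the result does not depend on the width of wchar_t. */
long wide_pound_value(void)
{
    wchar_t w = L'\xA3';
    return (long)w;
}

/* Nonnegative view of a narrow char: converting to unsigned char first
   maps a negative value such as -93 back to 163 (modulo 256). */
int narrow_as_unsigned(char c)
{
    return (unsigned char)c;
}
```

Both wide_pound_value() and narrow_as_unsigned('\xA3') should yield 163, regardless of whether plain char is signed.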



Source: https://stackoverflow.com/questions/57577986/how-is-it-so-that-these-character-constants-have-negative-values
