How to change wchar.h to make wchar_t the same type as wint_t?

心在旅途 2020-12-07 05:07

wchar_t is defined in wchar.h

Currently, if developers want to use only wchar_t, they cannot do so without getting

2 answers
  • 2020-12-07 05:27

    Note that wint_t was introduced because wchar_t might be a type subject to 'default promotion' rules when passed to printf() et al. This matters, for example, when calling printf():

    wchar_t wc = …;
    printf("%lc", wc);
    

    The value of wc might be converted to wint_t. If you're writing a function like printf() which needs to use the va_arg() macro from <stdarg.h>, then you should use the type wint_t to get the value.
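
A minimal sketch of the va_arg() point, assuming a conforming implementation where wint_t is unchanged by the default argument promotions. The helper name collect_wide is hypothetical and only serves the illustration:

```c
#include <stdarg.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies `count` wide characters passed as variadic
   arguments into `out` and null-terminates it.
   `out` must have room for count + 1 elements. */
static size_t collect_wide(wchar_t *out, size_t count, ...)
{
    va_list ap;
    va_start(ap, count);
    for (size_t i = 0; i < count; i++) {
        /* A wchar_t argument has already gone through the default
           argument promotions, so it is fetched as wint_t. */
        wint_t promoted = va_arg(ap, wint_t);
        out[i] = (wchar_t)promoted;
    }
    va_end(ap);
    out[count] = L'\0';
    return count;
}
```

A call such as collect_wide(buf, 3, L'a', L'b', L'c') passes wchar_t values, and the function reads them back through wint_t.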

    The standard notes that wint_t might be the same type as wchar_t, but if wchar_t is a (16-bit) short (or unsigned short), wint_t might be (32-bit) int. To a first approximation, wint_t only matters when wchar_t is a 16-bit type. The full rules are, of course, more complex. For example, int could be a 16-bit type — but this is rarely a problem.
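
To see which choice a given platform made, the two sizes can simply be printed. This is only a sketch; the function names are made up for the example and the output differs between implementations:

```c
#include <stdio.h>
#include <wchar.h>

/* Returns nonzero when wint_t is wider than wchar_t on this implementation. */
static int wint_is_wider(void)
{
    return sizeof(wint_t) > sizeof(wchar_t);
}

/* Prints the sizes of both types so they can be compared by eye. */
static void print_wide_type_sizes(void)
{
    printf("sizeof(wchar_t) = %zu\n", sizeof(wchar_t));
    printf("sizeof(wint_t)  = %zu\n", sizeof(wint_t));
    printf("wint_t is %s than wchar_t\n",
           wint_is_wider() ? "wider" : "not wider");
}
```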

    ISO/IEC 9899:2011

    7.29 Extended multibyte and wide character utilities <wchar.h>

    7.29.1 Introduction

    ¶1 The header <wchar.h> defines four macros, and declares four data types, one tag, and many functions. (footnote 326)

    ¶2 The types declared are wchar_t and size_t (both described in 7.19);

    mbstate_t
    

    which is a complete object type other than an array type that can hold the conversion state information necessary to convert between sequences of multibyte characters and wide characters;

    wint_t
    

    which is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the extended character set, as well as at least one value that does not correspond to any member of the extended character set (see WEOF below); (footnote 327)

    326) See ‘‘future library directions’’ (7.31.16).
    327) wchar_t and wint_t can be the same integer type.

    §7.19 Common definitions <stddef.h>

    ¶2 … and

    wchar_t
    

    which is an integer type whose range of values can represent distinct codes for all members of the largest extended character set specified among the supported locales; the null character shall have the code value zero. Each member of the basic character set shall have a code value equal to its value when used as the lone character in an integer character constant if an implementation does not define __STDC_MB_MIGHT_NEQ_WC__.

    See "Why the argument type of putchar(), fputc(), and putc() is not char" for one place where the 'default promotion' rules from the C standard are quoted. There are probably other questions where the information is available too.

  • 2020-12-07 05:28

    To avoid type conversion warnings when the -Wconversion compiler option is used, change wint_t to wchar_t in the prototypes of all library functions, and put '#define WEOF (-1)' at the beginning of wchar.h and wctype.h.

    For wchar.h the command is:

    sudo perl -i -pe 'print qq(#define WEOF (-1)\n) if $.==1; next unless /Copy SRC to DEST\./..eof; s/\bwint_t\b/wchar_t/g' /usr/include/wchar.h
    

    For wctype.h the command is:

    sudo perl -i -pe 'print qq(#define WEOF (-1)\n) if $.==1; next unless /Wide-character classification functions/..eof; s/\bwint_t\b/wchar_t/g' /usr/include/wctype.h
    

    Similarly, if you use other header files that mention wint_t, simply change wint_t to wchar_t in the prototypes in those header files.

    Explanation follows.

    Some Unix systems define wchar_t as a 16-bit type and thereby follow Unicode very strictly. This definition is perfectly fine with the standard, but it also means that to represent all characters from Unicode and ISO 10646 one has to use UTF-16 surrogate characters, which is in fact a multi-wide-character encoding. But resorting to multi-wide-character encoding contradicts the purpose of the wchar_t type.

    Nowadays UTF-8 is the dominant encoding for data exchange, and as originally designed it can carry at most 31 data bits (the six-byte form):

    1111110x    10xxxxxx    10xxxxxx    10xxxxxx    10xxxxxx    10xxxxxx
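
The 31-bit figure can be checked by counting the x positions: the lead byte contributes one bit and each of the five continuation bytes contributes six. A small sketch of that arithmetic follows, with a hypothetical function name. It covers the original 1-to-6-byte scheme shown above; current UTF-8 (RFC 3629) stops at four bytes.

```c
/* Payload bits carried by an n-byte sequence in the original UTF-8
   scheme (n = 1..6). Returns -1 for any other length. */
static int utf8_payload_bits(int nbytes)
{
    if (nbytes < 1 || nbytes > 6)
        return -1;
    if (nbytes == 1)
        return 7;                             /* 0xxxxxxx */
    int lead_bits = 7 - nbytes;               /* 1111110x leaves 1 bit */
    int continuation_bits = 6 * (nbytes - 1); /* each 10xxxxxx carries 6 */
    return lead_bits + continuation_bits;
}
```

For the six-byte pattern this gives 1 + 5 * 6 = 31, and for the four-byte form 3 + 3 * 6 = 21.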
    

    So in practice it is not necessary to have wint_t as a separate type, because 4-byte (i.e., 32-bit) data types are used to store Unicode code points anyway. Maybe it serves "backward compatibility" or something similar, but in new code it is pointless. Once again, a wchar_t too narrow for every code point defeats the purpose of having wide characters at all, and wide characters that cannot hold everything UTF-8 can encode make little sense nowadays.

    Note that, de facto, wint_t is frequently ignored anyway. For example, see the example in man mbstowcs: there a variable of type wchar_t is passed to iswlower() and other functions from wctype.h, which take wint_t.
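
The pattern described here, a wchar_t handed to a function declared to take wint_t, looks roughly like the sketch below. The helper count_lower is hypothetical. The explicit (wint_t) cast is an addition for this illustration: it spells out the conversion that otherwise happens implicitly and that -Wconversion may report in C.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: counts the lowercase letters in a wide string. */
static size_t count_lower(const wchar_t *s)
{
    size_t n = 0;
    for (; *s != L'\0'; s++) {
        /* iswlower() is declared to take wint_t; the cast makes the
           wchar_t-to-wint_t conversion explicit. */
        if (iswlower((wint_t)*s))
            n++;
    }
    return n;
}
```

Casting at the call site leaves the system headers untouched, at the cost of repeating the cast wherever a wchar_t is classified.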
