Why is © (the copyright symbol) replaced with (C) when using wprintf?

两盒软妹~` submitted on 2020-04-10 09:09:47

Question


When I try to print the copyright symbol © with printf or write, it works just fine:

#include <stdio.h>

int main(void)
{
    printf("©\n");
}

#include <unistd.h>

int main(void)
{
    write(1, "©\n", 3);
}

Output:

©

But when I try to print it with wprintf, I get (C):

#include <stdio.h>
#include <wchar.h>

int main(void)
{
    wprintf(L"©\n");
}

Output:

(C)

It's fixed when I add a call to setlocale, though:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main(void)
{
    setlocale(LC_ALL, "");
    wprintf(L"©\n");
}

Output:

©

Why does the original behavior occur, and why does calling setlocale fix it? Additionally, where does this conversion take place? And how can I make the behavior after setlocale the default?

compilation command:

gcc test.c

locale:

LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

echo $LC_CTYPE:


uname -a:

Linux penguin 4.19.79-07511-ge32b3719f26b #1 SMP PREEMPT Mon Nov 18 17:41:41 PST 2019 x86_64 GNU/Linux

file test.c (same on all of the examples):

test.c: C source, UTF-8 Unicode text

gcc --version:

gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

/lib/x86_64-linux-gnu/libc-2.24.so (glibc version):

GNU C Library (Debian GLIBC 2.24-11+deb9u4) stable release version 2.24, by Roland McGrath et al.
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 6.3.0 20170516.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.

cat /etc/debian_version:

9.12

Answer 1:


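A new process does not automatically inherit the locale of the process that started it. The locale environment variables are passed along, but the C library does not act on them until the program calls setlocale.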

When the program first starts up, it is in the C locale. The man page for setlocale(3) says the following:

On startup of the main program, the portable "C" locale is selected as default. A program may be made portable to all locales by calling:

setlocale(LC_ALL, "");

...

The locale "C" or "POSIX" is a portable locale; its LC_CTYPE part corresponds to the 7-bit ASCII character set.

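In the C locale, wprintf has to convert each wide character to the locale's 7-bit ASCII encoding before writing it. © has no ASCII equivalent, so the library substitutes the ASCII approximation (C), as the output shows. printf and write do no such conversion: they pass the UTF-8 bytes of the string literal through unchanged, which is why they print © even without setlocale.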

The locale can be set as follows:

setlocale(LC_ALL, "");

The LC_ALL category selects every locale category at once. An empty string for the locale name tells setlocale to take the locale from the relevant environment variables. Once this is done, wprintf converts wide characters to the encoding of your shell's locale (UTF-8 here), so © is printed as-is.

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

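int main(void)
{
    /* Copy the name: a later setlocale call may overwrite the returned string. */
    char before[64];
    snprintf(before, sizeof before, "%s", setlocale(LC_ALL, NULL));
    setlocale(LC_ALL, "");
    const char *after = setlocale(LC_ALL, NULL);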

    wprintf(L"before locale: %s\n", before);
    wprintf(L"after locale: %s\n", after);
    wprintf(L"©\n");
    wprintf(L"\u00A9\n");
    return 0;
}

Output:

before locale: C
after locale: en_US.utf8
©
©


Source: https://stackoverflow.com/questions/60458832/why-is-the-copyright-symbol-replaced-with-c-when-using-wprintf
