How does ncurses output non-ascii characters?

问题

I'd like to know how ncurses (a c library) manages to put characters like ├, despite them not (to the best of my knowledge) being part of ASCII.

I would have assumed it was just drawing them pixel by pixel, but you can copy/paste them out of the terminal (in MacOS).

回答1:

ncurses puts characters such as ├ on the screen by assuming that your locale environment variables (LC_ALL and/or LC_CTYPE) match the terminal on which you are displaying. The environment variables indicate the encoding (e.g., UTF-8). There are other encodings and terminals which support those encodings, but generally speaking you'll mostly see UTF-8. If the environment and terminal cooperate, things "just work":

at startup, ncurses checks for the locale which a program has initialized, via setlocale, and determines if that uses UTF-8. It uses that information later.
when a program adds character strings, e.g., using addstr, ncurses uses the character-type information (set as a side-effect of calling setlocale), and uses standard C library functions for combining sequences of bytes which make up a multi-byte character, and converting those into wide characters. It stores those wide characters internally, and
when writing to the terminal, ncurses reverses the process, converting from wide characters to use the encoding assumed to be supported by the terminal (assuming that your locale environment matches the terminal).

However —

The character indicated ├ happens to be a special case. That is one of the graphic characters used for line-drawing, which predate Unicode and UTF-8. curses has names for these graphic characters, making it simple to refer to them, e.g., ACS_LTEE (the ├ is a left-tee):

Before UTF-8 came along to complicate things, developers came up with a scheme using a table of these graphic characters by adapting the escape sequences used for the VT100 (late 1970s) and the AT&T 4410 and 5410 terminals (apparently the early 1980s since the latter were in use by 1984) for drawing their graphic characters.
AT&T SystemV curses provided support for these graphic characters from the mid-1980s. BSD curses never did that...
Unicode (roughly 1990 and later) provided most of the same glyphs using a different encoding. There are a few omissions (the most noticeable are the scan lines above/below the one used for horizontal lines), but once UTF-8 got into use in the early 2000s, it was logical to extend ncurses to use these characters.
ncurses looks at the locale settings, but prefers using the terminal description for these graphic characters except for cases where that is known to not work — and will assume that the terminal can display the Unicode equivalents for these characters if the terminal is assumed to use UTF-8. It uses a table for this purpose (SystemV curses and its successor X/Open Curses didn't do any of this — NetBSD curses adapted the table from ncurses sometime after 2010).

回答2:

There is more than one version of ncurses, for more than one platform, and if you really want to know, check the source. However, none of them would draw a character pixel-by-pixel; that isn’t something a library running inside a terminal emulator does.

Modern versions of the C standard library, POSIX and ncurses all support writing wide characters to the console and conversion between wide and multibyte strings. Today, wide characters are normally UTF-16 or UTF-32 and multibyte strings are normally UTF-8. You can see the documentation for <wchar.h> and ncursesw for more information.

Note that C11 does have support for UTF-8 literals, through the u8 prefix.

A program that’s concerned about portability with systems where the local multibyte encoding is something other than UTF-8 can use another library such as the C++ standard library or ICU to convert between UTF-8 and wide-character strings, then display those with curses.

You might need to #define _XOPEN_SOURCE 700, or the appropriate value for the version of the standard you are targeting, and with some versions of the libraries, also #define _XOPEN_SOURCE_EXTENDED 1, to get your system libraries to let you use functions such as addwstr().

However, many programs might simply send strings of char encoded in UTF-8 to the console and assume it can handle them. I don’t recommend this approach, but it works on most Linux systems in 2017.

来源：https://stackoverflow.com/questions/43671649/how-does-ncurses-output-non-ascii-characters

标签

ncurses