I have a program that outputs a textual table using UTF-8 strings, and I need to measure the number of monospaced character cells used by a string so I can align it properly.
I'm shocked that no one mentioned this, so here it goes for the record:
If you want to align text in a terminal, you need to use the POSIX functions wcwidth and wcswidth. Here's a correct program to find the on-screen width of a string.
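#define _XOPEN_SOURCE
#include <wchar.h>
#include <stdio.h>
#include <locale.h>
#include <stdlib.h>

int measure(const char *string) {
    // find the length of the wide string; (size_t)-1 means invalid input
    size_t length = mbstowcs(NULL, string, 0);
    if (length == (size_t)-1) return -2;

    // allocate enough memory to hold the wide string and its terminator
    size_t needed = length + 1;
    wchar_t *wcstring = malloc(needed * sizeof *wcstring);
    if (!wcstring) return -1;

    // change encodings
    if (mbstowcs(wcstring, string, needed) == (size_t)-1) {
        free(wcstring);
        return -2;
    }

    // measure width (wcswidth itself returns -1 for non-printable characters)
    int width = wcswidth(wcstring, needed);

    free(wcstring);
    return width;
}

int main(int argc, char **argv) {
    // use the environment's locale so mbstowcs can decode UTF-8
    setlocale(LC_ALL, "");
    for (int i = 1; i < argc; i++) {
        printf("%s: %d\n", argv[i], measure(argv[i]));
    }
}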
Here's an example of it running:
$ ./measure hello 莊子 cＡb
hello: 5
莊子: 4
cＡb: 4
Note how the two characters "莊子" and the three characters "cＡb" (note the double-width Ａ) are both 4 columns wide.
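For the table-alignment case in the question, the measured width can be turned into padding. printf field widths such as %-10s count bytes, not columns, so the padding has to be computed by hand. The following is only a sketch: padding_for and print_cell are hypothetical helper names, not part of POSIX, and the caller is assumed to have already called setlocale(LC_ALL, "") in a UTF-8 locale.

```c
#define _XOPEN_SOURCE
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: number of spaces needed to pad `s` out to
   `columns` terminal cells, or -1 if `s` cannot be measured
   (invalid multibyte sequence, allocation failure, or a
   non-printable character). */
int padding_for(const char *s, int columns) {
    assert(columns >= 0);

    size_t length = mbstowcs(NULL, s, 0);
    if (length == (size_t)-1) return -1;

    wchar_t *wide = malloc((length + 1) * sizeof *wide);
    if (!wide) return -1;

    /* Cannot fail here: the same input was validated just above. */
    mbstowcs(wide, s, length + 1);

    int width = wcswidth(wide, length);
    free(wide);
    if (width < 0) return -1;

    /* Strings wider than the cell get no padding (no truncation here). */
    return width >= columns ? 0 : columns - width;
}

/* Hypothetical helper: print `s` left-aligned in a cell that is
   `columns` cells wide. Unmeasurable strings are printed unpadded. */
void print_cell(const char *s, int columns) {
    int pad = padding_for(s, columns);
    printf("%s%*s", s, pad > 0 ? pad : 0, "");
}
```

With this, print_cell("莊子", 8) and print_cell("hello", 8) should each occupy 8 columns, provided the terminal agrees with the C library about character widths.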
As utf8everywhere.org puts it,
The size of the string as it appears on the screen is unrelated to the number of code points in the string. One has to communicate with the rendering engine for this. Code points do not occupy one column even in monospace fonts and terminals. POSIX takes this into account.
Windows does not have any built-in wcwidth function for console output; if you want to support multi-column characters in the Windows console, you need to find a portable implementation of wcwidth or give up, because the Windows console doesn't support Unicode without crazy hacks.
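To give an idea of what a portable wcwidth does internally, here is a deliberately simplified sketch. sketch_wcwidth is a hypothetical name, and the range table is an assumption covering only a few well-known East Asian wide blocks; it ignores combining marks, emoji, and everything else a real implementation derives from the Unicode data files, so treat it as an illustration rather than a replacement. It takes a code point as a uint32_t because wchar_t is only 16 bits wide on Windows.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for wcwidth(). NOT complete:
   returns 0 for NUL, -1 for C0/C1 control characters, 2 for code
   points in a handful of East Asian wide blocks, and 1 otherwise.
   Zero-width and combining characters are not handled. */
int sketch_wcwidth(uint32_t cp) {
    if (cp == 0) return 0;
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return -1;

    if ((cp >= 0x1100 && cp <= 0x115F) ||   /* Hangul Jamo initial consonants */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) || /* CJK radicals through Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||   /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||   /* CJK compatibility ideographs */
        (cp >= 0xFE30 && cp <= 0xFE6F) ||   /* CJK compatibility forms */
        (cp >= 0xFF00 && cp <= 0xFF60) ||   /* fullwidth forms */
        (cp >= 0xFFE0 && cp <= 0xFFE6) ||   /* fullwidth signs */
        (cp >= 0x20000 && cp <= 0x3FFFD))   /* supplementary ideographic planes */
        return 2;

    return 1;
}
```

Summing this over the decoded code points of a string gives an approximate column count; a production program would be better served by a maintained implementation generated from current Unicode tables.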