Number of character cells used by string

离开以前 2020-12-02 14:56

I have a program that outputs a textual table using UTF-8 strings, and I need to measure the number of monospaced character cells used by a string so I can align it properly.

6 answers
  •  天命终不由人
    2020-12-02 15:16

    I'm shocked that no one mentioned this, so here it goes for the record:

    If you want to align text in a terminal, you need to use the POSIX functions wcwidth and wcswidth. Here's a correct program to find the on-screen length of a string.

    #define _XOPEN_SOURCE
    #include <wchar.h>
    #include <stdio.h>
    #include <locale.h>
    #include <stdlib.h>
    
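    int measure(char *string) {
        // find the wide-string length; (size_t)-1 means invalid multibyte input
        size_t length = mbstowcs(NULL, string, 0);
        if (length == (size_t)-1) return -2;
    
        // allocate enough memory to hold the wide string and its terminator
        size_t needed = length + 1;
        wchar_t *wcstring = malloc(needed * sizeof *wcstring);
        if (!wcstring) return -1;
    
        // change encodings, freeing the buffer if the conversion fails
        if (mbstowcs(wcstring, string, needed) == (size_t)-1) {
            free(wcstring);
            return -2;
        }
    
        // measure width; wcswidth returns -1 if any character is non-printable
        int width = wcswidth(wcstring, needed);
    
        free(wcstring);
        return width;
    }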
    
    int main(int argc, char **argv) {
        setlocale(LC_ALL, "");
    
        for (int i = 1; i < argc; i++) {
            printf("%s: %d\n", argv[i], measure(argv[i]));
        }
    }
    

    Here's an example of it running:

    $ ./measure hello 莊子 cＡb
    hello: 5
    莊子: 4
    cＡb: 4
    

    Note how the two characters "莊子" and the three characters "cＡb" (note the double-width Ａ) are both 4 columns wide.
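
    Since the goal in the question is table alignment, here is a minimal sketch of padding a cell by display width rather than by byte count. The names display_width and pad_cell are hypothetical helpers, not part of POSIX, and the sketch assumes setlocale(LC_ALL, "") has already been called in a UTF-8 locale.

```c
#define _XOPEN_SOURCE 700
#include <wchar.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: display width of a multibyte string in the
   current locale, or -1 if it cannot be decoded or measured. */
int display_width(const char *s) {
    size_t len = mbstowcs(NULL, s, 0);
    if (len == (size_t)-1) return -1;      /* invalid multibyte sequence */

    wchar_t *ws = malloc((len + 1) * sizeof *ws);
    if (!ws) return -1;

    mbstowcs(ws, s, len + 1);
    int width = wcswidth(ws, len);         /* -1 if a character is non-printable */
    free(ws);
    return width;
}

/* Hypothetical helper: copy s into out and append spaces until the
   text fills `columns` cells. Returns 0 on success, -1 if s cannot
   be measured or out is too small. Strings wider than `columns`
   are copied unchanged (no truncation). */
int pad_cell(char *out, size_t outsz, const char *s, int columns) {
    int width = display_width(s);
    if (width < 0) return -1;

    int pad = columns > width ? columns - width : 0;
    size_t bytes = strlen(s);
    if (bytes + (size_t)pad + 1 > outsz) return -1;

    memcpy(out, s, bytes);                 /* the text itself, byte for byte */
    memset(out + bytes, ' ', (size_t)pad); /* one space per missing cell */
    out[bytes + (size_t)pad] = '\0';
    return 0;
}
```

    Padding by hand like this is needed because the field width in printf("%-8s", s) counts bytes, so multibyte or double-width text would come out misaligned.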

    As utf8everywhere.org puts it,

    The size of the string as it appears on the screen is unrelated to the number of code points in the string. One has to communicate with the rendering engine for this. Code points do not occupy one column even in monospace fonts and terminals. POSIX takes this into account.
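
    To make the quoted point concrete, the following sketch reports bytes, decoded characters and columns side by side, walking the string with mbrtowc and wcwidth instead of converting it in one go. The struct text_counts and the function count_text are hypothetical names, and treating one wchar_t as one code point is an assumption that holds where wchar_t is 32 bits wide (as on typical Linux systems).

```c
#define _XOPEN_SOURCE 700
#include <wchar.h>
#include <string.h>

/* Hypothetical result type for illustration. */
struct text_counts {
    size_t bytes;        /* storage size, as strlen reports it */
    size_t code_points;  /* decoded wide characters (assumes 32-bit wchar_t) */
    int columns;         /* terminal cells, or -1 if any character is non-printable */
};

/* Walk s one multibyte character at a time in the current locale.
   Returns 0 on success, -1 on an invalid or truncated sequence. */
int count_text(const char *s, struct text_counts *out) {
    mbstate_t state;
    memset(&state, 0, sizeof state);       /* start in the initial shift state */

    size_t remaining = strlen(s);
    out->bytes = remaining;
    out->code_points = 0;
    out->columns = 0;

    while (remaining > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);
        if (n == (size_t)-1 || n == (size_t)-2) return -1;
        if (n == 0) break;                 /* decoded a terminating null */

        int w = wcwidth(wc);
        if (w < 0) out->columns = -1;      /* width is undefined for this character */
        else if (out->columns >= 0) out->columns += w;

        out->code_points++;
        s += n;
        remaining -= n;
    }
    return 0;
}
```

    In a UTF-8 locale, "莊子" should come back as 6 bytes, 2 code points and 4 columns, so none of the three numbers can stand in for another.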

    Windows does not have any built-in wcwidth function for console output; if you want to support multi-column characters in the Windows console, you need to either find a portable implementation of wcwidth or give up, because the Windows console doesn't support Unicode without crazy hacks.
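
    For a sense of what a portable implementation involves, here is a deliberately incomplete sketch. The function approx_wcwidth is a hypothetical name, and the ranges are an assumption: they cover only the best-known East Asian wide blocks and a single block of combining marks, whereas a real implementation would carry full tables generated from the Unicode data files.

```c
#include <stdint.h>

/* Hypothetical helper, NOT a complete wcwidth replacement.
   Takes a Unicode code point (not a wchar_t, which is only 16 bits
   on Windows) and returns -1, 0, 1 or 2 columns. */
int approx_wcwidth(uint32_t cp) {
    if (cp == 0) return 0;

    /* C0 and C1 control characters have no defined width */
    if (cp < 0x20 || (cp >= 0x7F && cp < 0xA0)) return -1;

    /* combining diacritical marks occupy no cell of their own */
    if (cp >= 0x0300 && cp <= 0x036F) return 0;

    /* a few well-known double-width ranges (assumed, incomplete) */
    if ((cp >= 0x1100 && cp <= 0x115F) ||                   /* Hangul Jamo initial consonants */
        (cp >= 0x2E80 && cp <= 0xA4CF && cp != 0x303F) ||   /* CJK radicals through Yi */
        (cp >= 0xAC00 && cp <= 0xD7A3) ||                   /* Hangul syllables */
        (cp >= 0xF900 && cp <= 0xFAFF) ||                   /* CJK compatibility ideographs */
        (cp >= 0xFE30 && cp <= 0xFE6F) ||                   /* CJK compatibility forms */
        (cp >= 0xFF00 && cp <= 0xFF60) ||                   /* fullwidth forms */
        (cp >= 0xFFE0 && cp <= 0xFFE6) ||                   /* fullwidth signs */
        (cp >= 0x20000 && cp <= 0x2FFFD) ||                 /* supplementary ideographic plane */
        (cp >= 0x30000 && cp <= 0x3FFFD))                   /* tertiary ideographic plane */
        return 2;

    return 1;   /* everything else is assumed to take one cell */
}
```

    Even with complete tables, the result only helps if the console actually renders those characters in the width the table predicts.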
