What is the range of Unicode Printable Characters?

南旧 2020-11-27 18:00

Can anybody please tell me what the range of Unicode printable characters is? [e.g. ASCII printable character range is \u0020 - \u007f]

5 Answers
  •  无人及你
    2020-11-27 18:19

    This is an old question, but it is still valid, and I think there is more that can usefully, if briefly, be said on the subject than the existing answers cover.

    Unicode

    Unicode defines properties for characters.

    One of these properties is "General Category", which has major classes and subclasses. The major classes are Letter, Mark, Number, Punctuation, Symbol, Separator, and Other.

    By knowing the properties of your characters, you can decide whether you consider them printable in your particular context.

    You must always remember that terms like "character" and "printable" are often hard to define and have interesting edge cases.
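
    As a rough illustration of deciding by property, here is a minimal Go sketch that maps a rune to its major class using the range tables exported by the standard "unicode" package. The helper name majorCategory and the "Unknown" fallback are my own, not part of any library.

```go
package main

import (
	"fmt"
	"unicode"
)

// majorCategory names the General Category major class of r, using the
// range tables exported by Go's unicode package. (Hypothetical helper.)
func majorCategory(r rune) string {
	switch {
	case unicode.Is(unicode.L, r):
		return "Letter"
	case unicode.Is(unicode.M, r):
		return "Mark"
	case unicode.Is(unicode.N, r):
		return "Number"
	case unicode.Is(unicode.P, r):
		return "Punctuation"
	case unicode.Is(unicode.S, r):
		return "Symbol"
	case unicode.Is(unicode.Z, r):
		return "Separator"
	case unicode.Is(unicode.C, r):
		return "Other"
	}
	// Not in any table of this Go version (for example, an unassigned code point).
	return "Unknown"
}

func main() {
	for _, r := range []rune{'A', '\u0301', '7', '!', '+', ' ', '\t'} {
		fmt.Printf("%U %s\n", r, majorCategory(r))
	}
}
```

    Once you have the class, "printable" becomes a policy decision: you choose which classes count in your context.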


    Programming Language support

    Some programming languages assist with this problem.

    For example, the Go language has a "unicode" package which provides many useful Unicode-related functions including these two:

    func IsGraphic(r rune) bool
    
    IsGraphic reports whether the rune is defined as a Graphic by Unicode. Such  
    characters include letters, marks, numbers, punctuation, symbols, and spaces, 
    from categories L, M, N, P, S, Zs. 
    
    func IsPrint(r rune) bool
    
    IsPrint reports whether the rune is defined as printable by Go. Such  
    characters include letters, marks, numbers, punctuation, symbols, and  
    the ASCII space character, from categories L, M, N, P, S and the ASCII  
    space character. This categorization is the same as IsGraphic except  
    that the only spacing character is ASCII space, U+0020.
    

    Notice that it says "defined as printable by Go", not "defined as printable by Unicode". It is almost as if there are some depths the wizards at Unicode dare not plumb.
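
    To make the difference between the two functions concrete, here is a small sketch. The helper graphicButNotPrint is a name I made up; it only combines the two documented calls. According to the descriptions quoted above, it should be true exactly for Zs spaces other than U+0020.

```go
package main

import (
	"fmt"
	"unicode"
)

// graphicButNotPrint reports whether r is a Graphic by Unicode's
// definition but not printable by Go's definition. (Hypothetical helper.)
func graphicButNotPrint(r rune) bool {
	return unicode.IsGraphic(r) && !unicode.IsPrint(r)
}

func main() {
	samples := []rune{
		' ',      // U+0020 ASCII space
		'\u00A0', // NO-BREAK SPACE
		'\u2003', // EM SPACE
		'\u3000', // IDEOGRAPHIC SPACE
		'a',
		'\n',
	}
	for _, r := range samples {
		fmt.Printf("%U graphic=%t print=%t\n",
			r, unicode.IsGraphic(r), unicode.IsPrint(r))
	}
}
```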


    Printable

    The more you learn about Unicode, the more you realise how unexpectedly diverse and unfathomably weird human writing systems are.

    In particular whether a particular "character" is printable is not always obvious.

    Is a zero-width space printable? When is a hyphenation point printable? Are there characters whose printability depends on their position in a word or on what characters are adjacent to them? Is a combining-character always printable?
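
    For what it is worth, here is how Go's category-based rule happens to answer a few of these questions. This is only a sketch: the verdict type and inspect function are names I invented, and the results say what Go reports, not what is right for your context.

```go
package main

import (
	"fmt"
	"unicode"
)

// verdict collects a few facts about one rune. (Hypothetical type.)
type verdict struct {
	Format bool // General Category Cf (invisible format character)
	Mark   bool // General Category M (includes combining marks)
	Print  bool // printable by Go's definition
}

// inspect gathers those facts using only the standard unicode package.
func inspect(r rune) verdict {
	return verdict{
		Format: unicode.Is(unicode.Cf, r),
		Mark:   unicode.Is(unicode.M, r),
		Print:  unicode.IsPrint(r),
	}
}

func main() {
	samples := []rune{
		'\u200B', // ZERO WIDTH SPACE
		'\u00AD', // SOFT HYPHEN
		'\u0301', // COMBINING ACUTE ACCENT
	}
	for _, r := range samples {
		fmt.Printf("%U %+v\n", r, inspect(r))
	}
}
```

    Go treats the zero-width space and the soft hyphen as format characters and therefore not printable, while a combining accent counts as printable even though it draws nothing on its own.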


    Footnotes

    ASCII printable character range is \u0020 - \u007f

    No, it isn't. \u007f is DEL, which is not normally considered a printable character, so the ASCII printable range is \u0020 - \u007e. DEL is, for example, associated with the keyboard key labelled "DEL", whose earliest purpose was to command the deletion of a character from some medium (display, file, etc.).

    In fact, many 8-bit character sets contain several non-consecutive ranges of non-printable characters. See, for example, the C0 controls (0x00 to 0x1F) and the C1 controls (0x80 to 0x9F).
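
    If you want to see the gaps for yourself, this sketch lists the consecutive runs of code points that Go's unicode.IsPrint accepts within a range. The function printableRuns is hypothetical, and "printable" here means Go's definition, not one given by Unicode.

```go
package main

import (
	"fmt"
	"unicode"
)

// printableRuns returns the inclusive [first, last] runs of consecutive
// code points in lo..hi for which unicode.IsPrint is true.
// (Hypothetical helper.)
func printableRuns(lo, hi rune) [][2]rune {
	var runs [][2]rune
	start := rune(-1) // -1 means "not currently inside a run"
	for r := lo; r <= hi; r++ {
		if unicode.IsPrint(r) {
			if start < 0 {
				start = r
			}
			continue
		}
		if start >= 0 {
			runs = append(runs, [2]rune{start, r - 1})
			start = -1
		}
	}
	if start >= 0 {
		runs = append(runs, [2]rune{start, hi})
	}
	return runs
}

func main() {
	// Scan the 8-bit range U+0000..U+00FF.
	for _, run := range printableRuns(0x00, 0xFF) {
		fmt.Printf("%U..%U\n", run[0], run[1])
	}
}
```

    Restricted to 0x00..0x7F, this yields the single run U+0020..U+007E. Over the full 8-bit range the printable characters are no longer one contiguous block.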
