What's the complete range for Chinese characters in Unicode?

深忆病人 2020-11-22 09:09

U+4E00..U+9FFF is part of the complete set, but not all of it.

6 answers
  •  时光取名叫无心
    2020-11-22 09:43

    Unicode version 11.0.0

    In Unicode, the Chinese, Japanese and Korean (CJK) scripts share a common background and are collectively known as CJK characters.

    These ranges often contain unassigned or reserved code points (such as U+2E9A and U+2EF4..U+2EFF).
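    Because of those gaps, a range check alone also accepts unassigned code points. A minimal sketch that filters them out with Python's standard `unicodedata` module, assuming that general category `Cn` (unassigned) is what you want to exclude; `assigned_code_points` is a hypothetical helper name. The result is only as current as `unicodedata.unidata_version`, the Unicode version bundled with the interpreter, which may differ from 11.0.0.

```python
from __future__ import annotations

import unicodedata


def assigned_code_points(start: int, end: int) -> list[int]:
    """Return the code points in [start, end] that are assigned in the
    Unicode database bundled with this Python build (category != 'Cn')."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) != "Cn"]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    # Reserved code points mentioned above, inside CJK Radicals Supplement.
    for cp in (0x2E9A, 0x2EF4, 0x2EFF):
        print(f"U+{cp:04X}", unicodedata.category(chr(cp)))
    radicals = assigned_code_points(0x2E80, 0x2EFF)
    print(len(radicals), "assigned code points in U+2E80..U+2EFF")
```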

    Chinese characters

    bottom  top     reference (also see the Wikipedia page)    block name
    4E00    9FEF    http://www.unicode.org/charts/PDF/U4E00.pdf CJK Unified Ideographs
    3400    4DBF    http://www.unicode.org/charts/PDF/U3400.pdf CJK Unified Ideographs Extension A
    20000   2A6DF   http://www.unicode.org/charts/PDF/U20000.pdf    CJK Unified Ideographs Extension B
    2A700   2B73F   http://www.unicode.org/charts/PDF/U2A700.pdf    CJK Unified Ideographs Extension C
    2B740   2B81F   http://www.unicode.org/charts/PDF/U2B740.pdf    CJK Unified Ideographs Extension D
    2B820   2CEAF   http://www.unicode.org/charts/PDF/U2B820.pdf    CJK Unified Ideographs Extension E
    2CEB0   2EBEF   https://www.unicode.org/charts/PDF/U2CEB0.pdf   CJK Unified Ideographs Extension F
    3007    3007    https://zh.wiktionary.org/wiki/%E3%80%87    in block CJK Symbols and Punctuation
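    The table can be turned into a lookup from code point to block. A minimal sketch in Python, assuming the Unicode 11.0.0 boundaries from the table (later Unicode versions extend some of these blocks and add new extensions); `CHINESE_BLOCKS` and `block_of` are hypothetical names. Keeping the rows sorted by start and searching with `bisect` makes each lookup logarithmic in the number of rows.

```python
from __future__ import annotations

import bisect

# (start, end, block name), sorted by start; boundaries copied from the table above.
CHINESE_BLOCKS = [
    (0x3007, 0x3007, "CJK Symbols and Punctuation (U+3007 only)"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FEF, "CJK Unified Ideographs"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
    (0x2A700, 0x2B73F, "CJK Unified Ideographs Extension C"),
    (0x2B740, 0x2B81F, "CJK Unified Ideographs Extension D"),
    (0x2B820, 0x2CEAF, "CJK Unified Ideographs Extension E"),
    (0x2CEB0, 0x2EBEF, "CJK Unified Ideographs Extension F"),
]
_STARTS = [start for start, _, _ in CHINESE_BLOCKS]


def block_of(ch: str) -> str | None:
    """Return the name of the table row containing ch, or None if no row does."""
    cp = ord(ch)
    i = bisect.bisect_right(_STARTS, cp) - 1  # last row whose start <= cp
    if i >= 0 and cp <= CHINESE_BLOCKS[i][1]:
        return CHINESE_BLOCKS[i][2]
    return None


print(block_of("中"))          # CJK Unified Ideographs
print(block_of("\U00020000"))  # CJK Unified Ideographs Extension B
print(block_of("A"))           # None
```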
    
    • In the CJK Unified Ideographs block, I notice many answers use 9FCC as the upper bound, but U+9FCD (鿍) is also a Chinese character. All characters in this block are Chinese characters (also used in Japanese, Korean, etc.).
    • Most characters in the CJK Unified Ideographs extensions are traditional Chinese characters, which are rarely used in China. Ext F is the exception: only 17% of its characters are Chinese characters.
    • 〇 (U+3007) is the Chinese character form of zero and is still in use today.

    Therefore the range is

    [0x3007,0x3007],[0x3400,0x4DBF],[0x4E00,0x9FEF],[0x20000,0x2EBEF]
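    The same ranges can be expressed as a regular-expression character class. A minimal sketch using Python's `re` module, taking U+2EBEF (the end of Extension F in the table) as the upper bound; `HAN_RE` and `han_only` are hypothetical names. Like the ranges themselves, the class also matches unassigned code points inside them, such as U+2A6E0..U+2A6FF between Extensions B and C.

```python
import re

# Character class built from the ranges: U+3007, U+3400..U+4DBF,
# U+4E00..U+9FEF and U+20000..U+2EBEF (Extensions B..F merged, gaps included).
# Python turns the \u / \U escapes into real characters before re sees them.
HAN_RE = re.compile("[\u3007\u3400-\u4DBF\u4E00-\u9FEF\U00020000-\U0002EBEF]")


def han_only(text: str) -> str:
    """Keep only the characters that fall inside the ranges above."""
    return "".join(HAN_RE.findall(text))


print(han_only("Unicode 11.0：中文\u3007，数字！"))  # 中文〇数字
```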

    CJK characters never used in Chinese

    These are Han characters encoded only for compatibility.

    You will almost never see them in any Chinese book, article, or other writing.

    Every character here has a corresponding glyph-identical Chinese character. For example, 金 (U+F90A) and 金 (U+91D1) have identical glyphs.

    F900    FAFF   https://www.unicode.org/charts/PDF/UF900.pdf  CJK Compatibility Ideographs
    2F800   2FA1F   https://www.unicode.org/charts/PDF/U2F800.pdf CJK Compatibility Ideographs Supplement
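    Compatibility ideographs such as U+F90A have canonical (singleton) decompositions to their unified counterparts, so Unicode normalization folds the glyph-identical pair together. A minimal sketch using Python's standard `unicodedata` module: even NFC, not only NFKC, rewrites U+F90A to U+91D1, which matters if text is normalized before it is stored or compared. A few code points in U+F900..U+FAFF may not decompose, so checking with `unicodedata.decomposition()` is safer than assuming.

```python
import unicodedata

compat = "\uF90A"   # 金 in CJK Compatibility Ideographs
unified = "\u91D1"  # 金 in CJK Unified Ideographs

print(compat == unified)                                # False: different code points
print(unicodedata.decomposition(compat))                # 91D1: canonical singleton mapping
print(unicodedata.normalize("NFC", compat) == unified)  # True: NFC already folds it
```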
    

    CJK-related symbols

    2E80    2EFF    http://www.unicode.org/charts/PDF/U2E80.pdf CJK Radicals Supplement
    2F00    2FDF    http://www.unicode.org/charts/PDF/U2F00.pdf Kangxi Radicals
    2FF0    2FFF    https://unicode.org/charts/PDF/U2FF0.pdf    Ideographic Description Characters
    3000    303F    https://www.unicode.org/charts/PDF/U3000.pdf    CJK Symbols and Punctuation
    3100    312F    https://unicode.org/charts/PDF/U3100.pdf    Bopomofo
    31A0    31BF    https://unicode.org/charts/PDF/U31A0.pdf    Bopomofo Extended
    31C0    31EF    http://www.unicode.org/charts/PDF/U31C0.pdf CJK Strokes
    3200    32FF    https://unicode.org/charts/PDF/U3200.pdf    Enclosed CJK Letters and Months
    3300    33FF    https://unicode.org/charts/PDF/U3300.pdf    CJK Compatibility
    FE30    FE4F    https://www.unicode.org/charts/PDF/UFE30.pdf    CJK Compatibility Forms
    FF00    FFEF    https://www.unicode.org/charts/PDF/UFF00.pdf    Halfwidth and Fullwidth Forms
    1F200   1F2FF   https://www.unicode.org/charts/PDF/U1F200.pdf   Enclosed Ideographic Supplement
    
    • Some blocks, such as Hangul Compatibility Jamo, are omitted because they are unrelated to Chinese.
    • Kangxi Radicals (and the similar CJK Radicals Supplement) are not Chinese characters; they are graphical components of Chinese characters, used specifically to represent radicals, e.g. ⼻ (U+2F3B) vs. 彳 (U+5F73), and ⻜ (U+2EDC) vs. 飞 (U+98DE).
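    The difference between a radical and the ideograph it resembles shows up in the character properties. A minimal sketch using Python's `unicodedata`: Kangxi Radicals are symbols (category `So`) with compatibility decompositions, so NFKC (but not NFC) maps ⼻ U+2F3B to 彳 U+5F73. Most CJK Radicals Supplement code points, including U+2EDC, have no decomposition at all, so normalization alone does not map ⻜ to 飞; that would need an explicit mapping table, which is not shown here.

```python
import unicodedata

kangxi_step = "\u2F3B"  # ⼻ KANGXI RADICAL STEP
han_step = "\u5F73"     # 彳 unified ideograph
radical_fly = "\u2EDC"  # ⻜ in CJK Radicals Supplement

print(unicodedata.category(kangxi_step))                       # So: a symbol, not a letter
print(unicodedata.category(han_step))                          # Lo: a letter
print(unicodedata.normalize("NFC", kangxi_step) == han_step)   # False: no canonical mapping
print(unicodedata.normalize("NFKC", kangxi_step) == han_step)  # True: compatibility mapping
print(repr(unicodedata.decomposition(radical_fly)))            # '': no mapping to U+98DE
```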

    Other common punctuation that appears in Chinese

    This is a wide range: some of these punctuation marks may never be used, while others, such as …… and “”, are used heavily in Chinese.

    0000    007F    https://unicode.org/charts/PDF/U0000.pdf    C0 Controls and Basic Latin 
    2000    206F    https://unicode.org/charts/PDF/U2000.pdf    General Punctuation
    ……
    

    There are also many Chinese-related symbols, such as Yijing Hexagram Symbols or Kanbun, but they are off-topic here. I listed the non-Chinese characters in CJK to better explain what counts as a Chinese character. The ranges above already cover almost all characters that appear in Chinese writing, except mathematical and other specialized notation.
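    One way to test that coverage claim on real text is to list the characters that fall outside the listed ranges. A minimal sketch in Python; `LISTED_RANGES` is an assumption that copies only a subset of the blocks above (Basic Latin, General Punctuation, CJK Symbols and Punctuation, the unified ideograph ranges, and Halfwidth and Fullwidth Forms), and `uncovered` is a hypothetical helper.

```python
from __future__ import annotations

import unicodedata

# Subset of the ranges listed above (inclusive bounds).
LISTED_RANGES = [
    (0x0000, 0x007F),    # C0 Controls and Basic Latin
    (0x2000, 0x206F),    # General Punctuation (… and curly quotes)
    (0x3000, 0x303F),    # CJK Symbols and Punctuation (includes U+3007)
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FEF),    # CJK Unified Ideographs
    (0xFF00, 0xFFEF),    # Halfwidth and Fullwidth Forms
    (0x20000, 0x2EBEF),  # CJK Unified Ideographs Extensions B..F
]


def uncovered(text: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) for each distinct character outside LISTED_RANGES."""
    seen: dict[str, str] = {}  # dicts keep insertion order, so output follows the text
    for ch in text:
        cp = ord(ch)
        if ch not in seen and not any(lo <= cp <= hi for lo, hi in LISTED_RANGES):
            seen[ch] = unicodedata.name(ch, "<unnamed>")
    return list(seen.items())


sample = "他说：“面积约为 3\u00d710\u00b2 平方米……”"
print(uncovered(sample))  # [('×', 'MULTIPLICATION SIGN'), ('²', 'SUPERSCRIPT TWO')]
```

    In this sample only the multiplication sign and the superscript two are reported, consistent with the exception for math notation.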

    Supplementary

    CJK Symbols and Punctuation

     、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽 〾 〿
    

    Halfwidth and Fullwidth Forms

    !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~⦅⦆。「」、・ヲァィゥェォャュョッーアイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン゙゚ᄀᄁᆪᄂᆬᆭᄃᄄᄅᆰᆱᆲᆳᆴᆵᄚᄆᄇᄈᄡᄉᄊᄋᄌᄍᄎᄏᄐᄑ하ᅢᅣᅤᅥᅦᅧᅨᅩᅪᅫᅬᅭᅮᅯᅰᅱᅲᅳᅴᅵ¢£¬ ̄¦¥₩│←↑→↓■○
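    If these width variants need to be folded, NFKC normalization does it: fullwidth ASCII variants become plain ASCII and halfwidth katakana become standard katakana. A minimal sketch with Python's `unicodedata.normalize`; `fold_width` is a hypothetical name. NFKC also rewrites fullwidth punctuation such as ， and ！, which Chinese text normally uses, while leaving 。 and 、 (CJK Symbols and Punctuation) unchanged, and it changes other compatibility characters too, so apply it only when that trade-off is acceptable.

```python
import unicodedata


def fold_width(text: str) -> str:
    """NFKC-normalize text: fullwidth ASCII variants become ASCII and
    halfwidth katakana become standard katakana (other mappings apply too)."""
    return unicodedata.normalize("NFKC", text)


print(fold_width("ＡＢＣ１２３"))    # ABC123
print(fold_width("你好，世界！"))     # 你好,世界!
print(fold_width("中文。、"))         # 中文。、 (U+3002 and U+3001 unchanged)
print(fold_width("\uFF76\uFF85"))    # カナ
```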
    

    References

    1. https://zh.wikipedia.org/wiki/%E6%B1%89%E5%AD%97 (in Chinese; see the sidebar on the right)
    2. https://zh.wikipedia.org/wiki/%E4%B8%AD%E6%97%A5%E9%9F%93%E7%9B%B8%E5%AE%B9%E8%A1%A8%E6%84%8F%E6%96%87%E5%AD%97 (see the table at the bottom)
    3. http://www.unicode.org
