In Unicode, why are there two representations for the Arabic digits?

傲寒 2020-11-30 06:07

I was reading about Unicode on Wikipedia (Arabic Unicode) and I see that each of the Arabic digits has two Unicode code points. For example, 1 is defined as U+066

3 answers
  • 2020-11-30 06:24

    According to the code charts, U+0660 .. U+0669 are ARABIC-INDIC DIGIT values 0 through 9, while U+06F0 .. U+06F9 are EXTENDED ARABIC-INDIC DIGIT values 0 through 9.

    In the Unicode 3.0 book (5.2 is the current version, but these things don't change much once set), the U+066n series of glyphs are marked 'Arabic-Indic digits' and the U+06Fn series of glyphs are marked 'Eastern Arabic-Indic digits (Persian and Urdu)'. It also notes:

    • U+06F4 - 'different glyphs in Persian and Urdu'
    • U+06F5 - 'Persian and Urdu share glyph different from Arabic'
    • U+06F6 - 'Persian glyph different from Arabic'
    • U+06F7 - 'Urdu glyph different from Arabic'

    For comparison:

    • U+066n: ٠١٢٣٤٥٦٧٨٩
    • U+06Fn: ۰۱۲۳۴۵۶۷۸۹

    Or:

         U+066n    U+06Fn
    0      ٠         ۰
    1      ١         ۱
    2      ٢         ۲
    3      ٣         ۳
    4      ٤         ۴
    5      ٥         ۵
    6      ٦         ۶
    7      ٧         ۷
    8      ٨         ۸
    9      ٩         ۹
    

    (Whether you can see any of those, and how clearly they are differentiated may depend on your browser and the fonts installed on your machine as much as anything else. I can see the difference on 4 and 6 clearly; 5 looks much the same in both.)

    Based on this information, if you are working with Arabic from the Middle East, use the U+066n series of digits; if you are working with Persian or Urdu, use the U+06Fn series of digits. A Unicode-aware application should accept either set of code points as valid digits (though you might look askance at a sequence that mixes the two sets, or you might just leave well alone).
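
    As a rough illustration of "accept either set, but notice mixing", here is a minimal Python sketch. The helper names `digit_sets` and `parse_digits` are made up for this example; `unicodedata.decimal` is the standard-library call that returns the decimal value of a digit character from any script.

```python
import unicodedata

ARABIC_INDIC = range(0x0660, 0x066A)           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)  # U+06F0..U+06F9


def digit_sets(text):
    """Return the names of the digit ranges that occur in text (hypothetical helper)."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if cp in ARABIC_INDIC:
            found.add("arabic-indic")
        elif cp in EXTENDED_ARABIC_INDIC:
            found.add("extended-arabic-indic")
        elif "0" <= ch <= "9":
            found.add("ascii")
    return found


def parse_digits(text):
    """Parse a non-empty run of decimal digits, from either set, into an int."""
    if not text:
        raise ValueError("empty digit string")
    value = 0
    for ch in text:
        # unicodedata.decimal raises ValueError for non-decimal characters
        value = value * 10 + unicodedata.decimal(ch)
    return value


mixed = "\u0664\u06f2"  # U+0664 followed by U+06F2
print(parse_digits(mixed))          # both sets are accepted as digits
print(len(digit_sets(mixed)) > 1)   # True: the string mixes two digit sets
```

    Whether a mixed sequence should be rejected, warned about, or silently accepted is a policy decision for the application; the sketch only reports it.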

  • 2020-11-30 06:39

    In general, you should not hard-code this information in your application.

    • On Windows, you can use GetLocaleInfo with LOCALE_SNATIVEDIGITS.
    • On Mac, use CFNumberFormatterCopyProperty with kCFNumberFormatterZeroSymbol.
    • Or use a library such as ICU.

    There are Arabic countries that don't use the Arabic-Indic digits by default. So there is no direct mapping saying Arabic -> Arabic-Indic digits.

    And the user might have changed the defaults in the Control Panel anyway.

  • 2020-11-30 06:48

    Which code do you prefer for representing the number 4, U+0664 or U+06F4?

    (٤ or ۴ )?

    To be consistent, let this choice guide which codes you use for 1, 2, and the other duplicate codes.
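
    If you do settle on one set, a small translation table can rewrite the other set to match. This is only a sketch: `to_digit_set` is a hypothetical helper, and it assumes you want to leave ASCII digits and all other characters untouched.

```python
ARABIC_INDIC_ZERO = 0x0660           # U+0660 ARABIC-INDIC DIGIT ZERO
EXTENDED_ARABIC_INDIC_ZERO = 0x06F0  # U+06F0 EXTENDED ARABIC-INDIC DIGIT ZERO


def to_digit_set(text, target_zero):
    """Rewrite U+066n and U+06Fn digits so they all come from one set.

    target_zero is the code point of the zero digit of the chosen set.
    """
    table = {}
    for zero in (ARABIC_INDIC_ZERO, EXTENDED_ARABIC_INDIC_ZERO):
        for n in range(10):
            table[zero + n] = chr(target_zero + n)
    return text.translate(table)


mixed = "\u0664\u06f2"  # U+0664 followed by U+06F2
persian = to_digit_set(mixed, EXTENDED_ARABIC_INDIC_ZERO)
print([f"U+{ord(c):04X}" for c in persian])  # ['U+06F4', 'U+06F2']
```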
