Question
In our language we write with Arabic characters, with some differences. ICU's ushape.c (the Arabic shaper) only works with the main Arabic characters and doesn't shape my language-specific characters (e.g. 0x6D5). I've changed ushape.c to work with my language, and it worked well except for one character, 0x649: in Arabic it has only 2 shapes, but in my language it has 4 shapes.
I've changed line 183
1 + 256 * 0x7F,/*0x0649*/
to
1+2+8 + 256 * 0x98 /*0x649*/
and changed line 121
static const UChar yehHamzaToYeh[] =
{
/* isolated*/ 0xFEEF,
/* final */ 0xFEF0
};
to
static const UChar yehHamzaToYeh[] =
{
/* isolated */0xFEEF,
0xFBE8, // my language specific
0xFBE9,// my language specific
/* final */ 0xFEF0
};
(both in ushape.c).
Now it can produce 3 shapes with no problem (initial, isolated and final), but the medial (middle) shape is displayed as a square (missing character).
I tried replacing "* 0x98" with other numbers, but this is the best I can get.
What should I do?
Answer 1:
Uighur? I have discussed Uighur rendering with a couple of people, not this particular issue but in general.
When you say you get a square, which Unicode character do you get?
What you really should do is to file a bug with ICU and discuss it there. This is a feature request, not a usage question.
My rusty recollection is that Uighur makes different use of shaping, and you will basically want a different mode on the shaper.
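One way to check which code unit a table entry would produce is to decode it by hand. The sketch below is only that, a sketch: it assumes the low byte of each entry holds link flags and the high byte is an offset added to U+FE70, with the shape index (0 isolated, 1 final, 2 initial, 3 medial) added on top. Neither the packing nor the helper names come from ushape.c; they are hypothetical and chosen because the original entry then decodes to the U+FEEF/U+FEF0 pair quoted in the question.

```c
#include <stdint.h>

/* ASSUMPTION (not confirmed by the source): each table entry packs
 * link flags in the low byte and, in the high byte, an offset that
 * is added to U+FE70 to reach the isolated presentation form. */
#define PRES_FORMS_B_BASE 0xFE70u

/* Hypothetical helper: low byte of a packed entry (link flags). */
static unsigned entry_flags(uint16_t entry) { return entry & 0xFFu; }

/* Hypothetical helper: high byte of a packed entry (form offset). */
static unsigned entry_offset(uint16_t entry) { return entry >> 8; }

/* Hypothetical helper: code unit for a shape index, assuming
 * 0 = isolated, 1 = final, 2 = initial, 3 = medial. */
static unsigned shaped_code_unit(uint16_t entry, unsigned shape)
{
    return PRES_FORMS_B_BASE + entry_offset(entry) + shape;
}

/* The two entries for 0x0649 quoted in the question. */
static const uint16_t kOriginal0649 = 1 + 256 * 0x7F;
static const uint16_t kModified0649 = 1 + 2 + 8 + 256 * 0x98;
```

Under these assumptions the original entry decodes to U+FEEF (isolated) and U+FEF0 (final), while the modified entry with 0x98 starts at U+FF08, which is outside the Arabic Presentation Forms-B block (U+FE70 to U+FEFF). U+FBE8 and U+FBE9 lie below U+FE70, so no non-negative offset of this kind could reach them. The question reports three shapes working, so the real code presumably takes another path for some of them; the sketch only shows what a given entry would decode to if the assumed packing holds.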
Answer 2:
ICU indeed seems to have problems shaping some languages, e.g. Urdu.
Your specific character U+0649, however, is probably not the character you are looking for.
U+0649 is alef maksura, which looks identical to Farsi yeh U+06CC, which ICU shapes properly.
They do have different presentation forms: alef maksura has only isolated and final forms (U+FEEF, U+FEF0), whereas Farsi yeh has all four forms (U+FBFC, U+FBFD, U+FBFE, U+FBFF).
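The difference between the two letters can be written down as a small lookup table. This is a minimal sketch, not ICU code: the `Form` enum, `FormRow` struct and `lookup_form` function are hypothetical names, and a value of 0 is used here to mean "no presentation form listed in this answer".

```c
#include <stddef.h>
#include <stdint.h>

/* Shape positions, in the order isolated, final, initial, medial. */
typedef enum { FORM_ISOLATED, FORM_FINAL, FORM_INITIAL, FORM_MEDIAL } Form;

/* One base letter and its presentation forms (0 = none listed). */
typedef struct {
    uint16_t base;
    uint16_t forms[4];
} FormRow;

static const FormRow kFormRows[] = {
    /* U+0649 alef maksura: isolated and final only. */
    { 0x0649, { 0xFEEF, 0xFEF0, 0, 0 } },
    /* U+06CC Farsi yeh: all four forms. */
    { 0x06CC, { 0xFBFC, 0xFBFD, 0xFBFE, 0xFBFF } },
};

/* Hypothetical helper: returns the presentation form for a base
 * letter, or 0 when the table has no entry for that position. */
static uint16_t lookup_form(uint16_t base, Form form)
{
    for (size_t i = 0; i < sizeof kFormRows / sizeof kFormRows[0]; ++i) {
        if (kFormRows[i].base == base) {
            return kFormRows[i].forms[form];
        }
    }
    return 0;
}
```

For example, `lookup_form(0x06CC, FORM_MEDIAL)` yields 0xFBFF, while `lookup_form(0x0649, FORM_MEDIAL)` yields 0, which mirrors the point that a medial form is only available for the Farsi yeh in this list.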
Source: https://stackoverflow.com/questions/3855940/icu4c-ushape-c-missing-character-in-shaping