How to get a single Arabic letter from a string with its Unicode transformation value in Delphi?

感情败类 2020-12-19 04:32

Consider this Arabic word (جبل), made of 3 letters.

- The first letter is جـ
- Its name is ǧīm
- Its Unicode value is FE9F when it is at the beginning of a word
- Its basic va

3 Answers
  • 2020-12-19 04:47

    I don't think you can do it using string/char-related methods. But using PChar, maybe you can access the memory and read the PWord values directly.

    EDIT: After discussing with David, I think you will always get the basic/isolated value of the letter. Whether the initial or final glyph is used is probably handled by the display framework of the OS.
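    The memory-level approach from this answer can be sketched as follows, assuming a Unicode version of Delphi (2009 or later) where `string` is UTF-16 and `Char` is two bytes. `DumpCodeUnits` is a hypothetical helper name. Reading each `Word` through a `PWord` gives exactly the values stored in the string, which for جبل are the basic letters, consistent with the EDIT above. The loop stops at the first #0, so it assumes the string has no embedded nulls.

```delphi
program DumpCodeUnitsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: walks the raw UTF-16 code units of S in memory via a PWord
// and returns them as 'U+XXXX' values separated by spaces.
function DumpCodeUnits(const S: string): string;
var
  P: PWord;
begin
  Result := '';
  P := PWord(PChar(S));  // PChar of a string is null-terminated, even for ''
  while P^ <> 0 do
  begin
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(P^, 4);
    Inc(P);  // advance by one Word (two bytes)
  end;
end;

begin
  // جبل spelled with explicit code units, so the source file encoding does not matter
  Writeln(DumpCodeUnits(#$062C#$0628#$0644));  // U+062C U+0628 U+0644
end.
```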

  • 2020-12-19 04:48

    On Windows, shaping of Arabic characters for presentation is handled by the Uniscribe services (USP10.dll). See: Uniscribe.

    You may find the following blog post useful: Roozbeh's Programming Blog

  • 2020-12-19 05:00

    I'm not sure I understand the question. If you want to know how to write U+FE9F in Delphi source code, in a modern Unicode version of Delphi, you can simply do this:

    Char($FE9F)
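    A minimal console sketch, assuming Delphi 2009 or later, showing that `Char($FE9F)` and the character literal `#$FE9F` denote the same single UTF-16 element. The constant name `JeemInitial` is only illustrative.

```delphi
program WriteFE9F;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  JeemInitial = #$FE9F;  // character literal with the same code as Char($FE9F)

var
  C: Char;
begin
  C := Char($FE9F);                    // typecast an ordinal value to a Char
  Writeln(C = JeemInitial);            // TRUE
  Writeln('U+', IntToHex(Ord(C), 4));  // U+FE9F
end.
```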
    

    If you want to read individual characters from جبل, do it like this:

    const
      MyWord = 'جبل';
    var
      c: Char;
    begin
      c := MyWord[1]; // this is U+062C (ARABIC LETTER JEEM)
    end;
    

    Note that the code above works for your particular word because each of its code points is encoded as a single UTF-16 WideChar element. If a code point required multiple elements (a surrogate pair), it would be best to convert to UTF-32 for code-point-level processing.
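    For the multi-element case, here is a minimal sketch using `UnicodeStringToUCS4String` from the `System` unit, which turns each surrogate pair into one UTF-32 value. `CodePointsOf` is a hypothetical helper name, and U+1EE00 (ARABIC MATHEMATICAL ALEF) is used only because it lies outside the BMP and therefore needs two WideChar elements. The returned `UCS4String` carries a trailing null element, which the loop skips.

```delphi
program CodePointsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: lists the code points of S as 'U+XXXX' values,
// with surrogate pairs already combined by UnicodeStringToUCS4String.
function CodePointsOf(const S: string): string;
var
  U: UCS4String;
  I: Integer;
begin
  Result := '';
  U := UnicodeStringToUCS4String(S);
  // The last element of a UCS4String is a null terminator, so stop one early.
  for I := 0 to Length(U) - 2 do
  begin
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(U[I]), 4);
  end;
end;

begin
  // jeem followed by U+1EE00, written as the surrogate pair D83B DE00
  Writeln(Length(#$062C#$D83B#$DE00));        // 3 UTF-16 elements
  Writeln(CodePointsOf(#$062C#$D83B#$DE00));  // U+062C U+1EE00
end.
```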


    Now, let's look at the string that you included in the question. I downloaded this question using wget, and the file that came down the wire was UTF-8 encoded. I used Notepad++ to convert it to UTF-16LE and then picked out the three UTF-16 characters of your string. They are:

    U+062C
    U+0628
    U+0644
    

    You stated:

    The first letter is جـ, name is (ǧīm), its Unicode value is U+FE9F.

    But that is simply incorrect. As can be seen from the above, the actual character you posted was U+062C. So the reason why your attempts to read the first character yield U+062C is that U+062C really is the first character of your string.


    The bottom line is that nothing in your Delphi code is transforming your character. When you do:

    S[1] := Char($FE9F);
    

    the compiler performs a simple two-byte copy. No context-aware transformation occurs, and the same is true when reading S[1].
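    A small console sketch of that round trip, assuming a Unicode Delphi where `Char` is a two-byte `WideChar`: whatever value is stored in `S[1]` is exactly what is read back, in both directions.

```delphi
program NoTransformDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S: string;
begin
  S := #$062C#$0628#$0644;           // جبل, basic (nominal) letters
  Writeln(IntToHex(Ord(S[1]), 4));  // 062C: exactly what was stored
  S[1] := Char($FE9F);              // plain two-byte store, no shaping
  Writeln(IntToHex(Ord(S[1]), 4));  // FE9F: read back unchanged
  Writeln(SizeOf(Char));            // 2
end.
```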


    Let's look at how these characters are displayed, using this simple code in a VCL Forms application that contains a memo control:

    Memo1.Clear;
    Memo1.Lines.Add(StringOfChar(Char($FE9F), 2));
    Memo1.Lines.Add(StringOfChar(Char($062C), 2));
    

    The output looks like this:

    [Screenshot of the memo output]

    As you can see, the rendering layer knows what to do with a U+062C character that appears at the beginning of the string.
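    If you really need the presentation-form code points (for example U+FE9F for an initial jeem), you have to compute them yourself, because neither the string nor the compiler does it. The sketch below is a hypothetical, deliberately tiny lookup: `FormsOf` and `PresentationForm` are illustrative names, it covers only the three letters of جبل, and it assumes every letter is dual-joining so the form depends only on position in the word. It ignores non-joining letters, lam-alef ligatures and diacritics; for real text, a shaping engine such as Uniscribe is the right tool.

```delphi
program ContextualFormsDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // Presentation forms in the order: isolated, final, initial, medial
  TForms = array[0..3] of Word;

// Hypothetical lookup covering only jeem, beh and lam (Arabic Presentation Forms-B).
function FormsOf(C: Char; out Forms: TForms): Boolean;
const
  Jeem: TForms = ($FE9D, $FE9E, $FE9F, $FEA0);
  Beh:  TForms = ($FE8F, $FE90, $FE91, $FE92);
  Lam:  TForms = ($FEDD, $FEDE, $FEDF, $FEE0);
begin
  Result := True;
  case Ord(C) of
    $062C: Forms := Jeem;
    $0628: Forms := Beh;
    $0644: Forms := Lam;
  else
    Result := False;
  end;
end;

// Assumes all letters in S are dual-joining, so only the position matters.
function PresentationForm(const S: string; Index: Integer): Char;
var
  Forms: TForms;
  Slot: Integer;
begin
  if not FormsOf(S[Index], Forms) then
    Exit(S[Index]);         // letter not in the table: return it unchanged
  if Length(S) = 1 then
    Slot := 0               // isolated
  else if Index = Length(S) then
    Slot := 1               // final
  else if Index = 1 then
    Slot := 2               // initial
  else
    Slot := 3;              // medial
  Result := Char(Forms[Slot]);
end;

var
  W: string;
  I: Integer;
begin
  W := #$062C#$0628#$0644;  // جبل
  for I := 1 to Length(W) do
    Writeln('U+', IntToHex(Ord(W[I]), 4), ' -> U+',
      IntToHex(Ord(PresentationForm(W, I)), 4));
end.
```

    For جبل this maps U+062C, U+0628 and U+0644 to U+FE9F, U+FE92 and U+FEDE (initial jeem, medial beh, final lam).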
