How to get a single Arabic letter in a string with its Unicode transformation value in Delphi?


感情败类

Consider the Arabic word (جبل), which is made of 3 letters.

The first letter is جـ, its name is (ǧīm), its Unicode value is FE9F when it is at the beginning, its basic va

3 answers

既然无缘

2020-12-19 04:47

I don't think you can do it using string/char related methods. But using a PChar, you may be able to access the memory and read the PWord values directly.

EDIT: After discussing with David, I think that you will always get the basic/isolated value of the letter. Whether the beginning or end glyph is used is probably handled by the display framework of the OS.

生来不讨喜

2020-12-19 04:48

Shaping of Arabic characters for presentation in Windows is handled by the Uniscribe services (USP10.dll).

You may find the following blog post useful: Roozbeh's Programming Blog

别跟我提以往

2020-12-19 05:00
I'm not sure I understand the question. If you want to know how to write U+FE9F in Delphi source code, in a modern Unicode version of Delphi, simply do it like so:
```
Char($FE9F)
```
If you want to read individual characters from جبل then do it like this:
```
const
  MyWord = 'جبل';
var
  c: Char;
....
c := MyWord[1]; // this is U+062C
```
Note that the code above is fine for your particular word because each code point can be encoded with a single UTF-16 WideChar character element. If the code point required multiple elements, then it would be best to transform to UTF-32 for code point level processing.
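As a minimal sketch of the UTF-32 approach mentioned above, the following console program converts the word to a `UCS4String` and prints each code point. It assumes a Unicode-enabled Delphi (2009 or later); the program name is hypothetical, and the word is written with `#$` escapes so the example does not depend on the encoding of the source file.

```delphi
program CodePointsDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // The word from the question, written as explicit UTF-16 code units.
  MyWord = #$062C#$0628#$0644;

var
  U: UCS4String;
  I: Integer;
begin
  // Convert UTF-16 to UTF-32 so that each array element is one code point.
  U := UnicodeStringToUCS4String(MyWord);
  // A UCS4String carries a trailing zero terminator, so skip the last element.
  for I := 0 to Length(U) - 2 do
    Writeln('U+' + IntToHex(Integer(U[I]), 4));
end.
```

For this word the loop should print U+062C, U+0628 and U+0644, the same values as indexing the UTF-16 string directly, because none of the letters needs a surrogate pair.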

Now, let's look at the string that you included in the question. I downloaded this question using wget, and the file that came down the wire was UTF-8 encoded. I used Notepad++ to convert it to UTF-16LE and then picked out the three UTF-16 characters of your string. They are:
```
U+062C
U+0628
U+0644
```
You stated:

The first letter is جـ, name is (ǧīm), its Unicode value is U+FE9F.

But that is simply incorrect. As can be seen from the above, the actual character you posted was U+062C. So the reason why your attempts to read the first character yield U+062C is that U+062C really is the first character of your string.

The bottom line is that nothing in your Delphi code is transforming your character. When you do:
```
S[1] := Char($FE9F);
```
the compiler performs a simple two-byte copy. No context-aware transformation occurs, and the same is true when reading S[1].
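The point can be checked with a small console sketch that prints the code unit stored in `S[1]` before and after the assignment. It assumes a Unicode-enabled Delphi; the program name is hypothetical.

```delphi
program RawCharDemo; // hypothetical name, for illustration only
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  S: string;
begin
  // The word from the question as explicit UTF-16 code units.
  S := #$062C#$0628#$0644;
  // Reading returns exactly what is stored: the basic letter, not a positional form.
  Writeln(IntToHex(Ord(S[1]), 4)); // 062C

  // Writing stores exactly the value given, with no shaping or normalization.
  S[1] := Char($FE9F);
  Writeln(IntToHex(Ord(S[1]), 4)); // FE9F
end.
```

Whichever value is stored is the value read back; choosing the initial, medial or final glyph happens later, when the text is rendered.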

Let's look at how these characters are displayed, using this simple code on a VCL forms application that contains a memo control:
```
Memo1.Clear;
Memo1.Lines.Add(StringOfChar(Char($FE9F), 2));
Memo1.Lines.Add(StringOfChar(Char($062C), 2));
```
The output looks like this:

As you can see, the rendering layer knows what to do with a U+062C character that appears at the beginning of the string.