Why does Delphi IBX TWideMemoField convert the byte order of a UTF-8 string, and how can I avoid it?

Question

I am using Delphi 2009 with IBX on a Firebird 3 database (I cannot choose other technologies; I have to adapt to the situation). I have the following definitions:

The Firebird BLOB field is defined as:

BLOB SUB_TYPE 0 SEGMENT SIZE 80

The TWideMemoField is defined as:

object MainQryNOTES: TWideMemoField
  FieldName = 'NOTES'
  Origin = 'INVOICES.NOTES'
  ProviderFlags = [pfInUpdate]
  BlobType = ftWideMemo
end

The test string is "Цель по инфляции, %", and when read from the BLOB field in IBExpert it appears as the following bytes:

26 04 35 04 3B 04 4C 04 20 00 3F 04 3E 04 20 00
38 04 3D 04 44 04 3B 04 4F 04 46 04 38 04 38 04
2C 00 20 00 25 00

The strange thing is that Delphi appears to invert the byte order. For example, the Cyrillic character Ц has the hex UTF-8 representation 04 26, but it is stored in the database as 26 04, and the same holds for the other characters (one can check this against the tables at https://www.w3schools.com/charsets/ref_utf_basic_latin.asp and https://www.w3schools.com/charsets/ref_utf_cyrillic.asp). In my case I have only 2-byte characters, but I guess the same would happen with 3- and 4-byte UTF-8 characters as well.

So how can I configure TWideMemoField so that it does not convert the byte order of UTF-8 strings?

Answer 1:

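Your text is not encoded as UTF-8; it is encoded as UTF-16. The character Ц is the code point U+0426, and 04 26 is simply that code point value written in hex, not its UTF-8 encoding (which would be D0 A6). By convention, each 16-bit code unit is stored in little-endian byte order, so U+0426 becomes $26 $04.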
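To see the difference for yourself, here is a minimal console sketch that prints the bytes of Ц in three encodings. It assumes only the `TEncoding` class from `SysUtils` (available since Delphi 2009); `BytesToHex` is a hypothetical helper written for this illustration, not part of the RTL or IBX.

```delphi
program EncodingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: formats a byte array as space-separated hex pairs.
function BytesToHex(const B: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(B) do
    Result := Result + IntToHex(B[I], 2) + ' ';
  Result := TrimRight(Result);
end;

var
  S: string;
begin
  // U+0426 is the Cyrillic capital letter Ц; the numeric character literal
  // avoids any dependency on the encoding of the source file.
  S := #$0426;

  // UTF-16 little endian, which is how a Delphi 2009 string is laid out in memory
  Writeln('UTF-16LE: ', BytesToHex(TEncoding.Unicode.GetBytes(S)));          // 26 04

  // UTF-16 big endian, the byte order the question expected to see
  Writeln('UTF-16BE: ', BytesToHex(TEncoding.BigEndianUnicode.GetBytes(S))); // 04 26

  // UTF-8, which looks nothing like either of the above
  Writeln('UTF-8:    ', BytesToHex(TEncoding.UTF8.GetBytes(S)));             // D0 A6
end.
```

The first line matches the 26 04 at the start of the IBExpert dump, which is what you would expect if the field writes the string's UTF-16 code units into the BLOB unchanged.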

In other words, everything is behaving as expected and as designed. I see no need for you to fix anything, because nothing is broken.

Source: https://stackoverflow.com/questions/52144358/why-delphi-ibx-twidememofield-converts-byte-order-in-utf8-string-and-how-to-avoi

Tags

Delphi

unicode

utf-8

firebird

firebird-3.0