utf-16

Is the [0xff, 0xfe] prefix required on utf-16 encoded strings?

陌路散爱 submitted on 2019-12-02 10:32:43
Rewritten question! I am working with a vendor's device that requires "unicode encoding" of strings, where each character is represented in two bytes. My strings will always be ASCII based, so I thought this would be the way to translate my string into the vendor's string:

>>> b1 = 'abc'.encode('utf-16')

But examining the result, I see that there's a leading [0xff, 0xfe] on the bytearray:

>>> [hex(b) for b in b1]
['0xff', '0xfe', '0x61', '0x0', '0x62', '0x0', '0x63', '0x0']

Since the vendor's device is not expecting the [0xff, 0xfe], I can strip it off:

>>> b2 = 'abc'.encode('utf-16')[2:]
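A minimal sketch of the usual way around this, assuming the device expects little-endian UTF-16 with no byte order mark: ask for the 'utf-16-le' codec directly instead of slicing the BOM off.

# Sketch: encode ASCII text as UTF-16LE without a BOM (assumes the vendor
# device wants little-endian order and no byte order mark).
text = 'abc'

with_bom = text.encode('utf-16')     # native order plus a BOM: ff fe 61 00 ... on a little-endian machine
no_bom = text.encode('utf-16-le')    # explicit little-endian, no BOM

print([hex(b) for b in with_bom])    # ['0xff', '0xfe', '0x61', '0x0', ...]
print([hex(b) for b in no_bom])      # ['0x61', '0x0', '0x62', '0x0', '0x63', '0x0']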

Create UTF-16 string from char*

纵饮孤独 submitted on 2019-12-02 06:32:29
So I have a standard C string:

char* name = "Jakub";

And I want to convert it to UTF-16. I figured out that UTF-16 will be twice as long: one character takes two chars. So I create another string:

char name_utf_16[10]; //"Jakub" is 5 characters

Now, I believe that with ASCII characters I will only use the lower bytes, so for all of them it will be like 74 00 for J and so on. With that belief, I can write code like this:

void charToUtf16(char* input, char* output, int length)
{
    /*Todo: how to check if output is long enough?*/
    for(int i=0; i<length; i+=2) //Step over 2 bytes
    {
        //Lets use little-endian -
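For comparison, here is the same interleaving idea written as a Python sketch rather than C (ascii_to_utf16le is a made-up helper name; it assumes the input really is ASCII-only, which is what makes the "low byte + zero byte" shortcut valid):

# Sketch: the byte layout the question describes -- each ASCII byte
# followed by a zero byte, i.e. little-endian UTF-16 for ASCII input.
def ascii_to_utf16le(s):
    out = bytearray()
    for ch in s.encode('ascii'):   # raises UnicodeEncodeError on non-ASCII
        out.append(ch)             # low byte: the ASCII code
        out.append(0x00)           # high byte: always zero for ASCII
    return bytes(out)

name = "Jakub"
assert ascii_to_utf16le(name) == name.encode('utf-16-le')
print(ascii_to_utf16le(name).hex(' '))   # 4a 00 61 00 6b 00 75 00 62 00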

Output UTF-16? A little stuck

廉价感情. submitted on 2019-12-02 02:36:32
Question: I have some UTF-16 encoded characters in their surrogate pair form. I want to output those surrogate pairs as characters on the screen. Does anyone know how this is possible?

Answer 1:

iconv('UTF-16', 'UTF-8', yourString)

Answer 2: Your question is a little unclear. If you have ASCII text with embedded UTF-16 escape sequences, you can convert everything to UTF-8 in this way:

function unescape_utf16($string) {
    /* go for possible surrogate pairs first */
    $string = preg_replace_callback(
        '/\\\\u(D[89ab][0
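To illustrate what is actually happening with a surrogate pair (in Python rather than the PHP of the answers): the two UTF-16 code units D83D and DE00 combine into the single code point U+1F600, which can then be re-encoded as UTF-8 for output.

# Sketch: decode one UTF-16 surrogate pair (U+1F600, stored big-endian
# as d8 3d de 00) and re-encode the resulting character as UTF-8.
utf16_bytes = b'\xd8\x3d\xde\x00'        # high surrogate D83D, low surrogate DE00

text = utf16_bytes.decode('utf-16-be')   # one code point, U+1F600
print(hex(ord(text)))                    # 0x1f600
print(text.encode('utf-8').hex(' '))     # f0 9f 98 80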

Ubuntu: fixing garbled text in TXT files

*爱你&永不变心* submitted on 2019-12-02 00:02:59
Just enter these two commands in a terminal, one after the other:

gsettings set org.gnome.gedit.preferences.encodings auto-detected "['GB18030', 'GB2312', 'GBK', 'UTF-8', 'BIG5', 'CURRENT', 'UTF-16']"
gsettings set org.gnome.gedit.preferences.encodings shown-in-menu "['GB18030', 'GB2312', 'GBK', 'UTF-8', 'BIG5', 'CURRENT', 'UTF-16']"

Source: https://www.cnblogs.com/zsbzsb/p/11722266.html

Linux: cat and file encodings

妖精的绣舞 submitted on 2019-12-01 21:48:34
test.log is UTF-16 encoded, so cat test.log does not display correctly. We can specify the encoding when viewing it instead:

iconv -f <file encoding> -t <terminal encoding> input.log
iconv -f utf-16 -t utf-8 test.log

Source: https://www.cnblogs.com/ruiy/p/11717780.html
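If iconv is not at hand, the same one-off conversion can be done from Python; a sketch, assuming test.log starts with a BOM so the 'utf-16' codec can detect the byte order (use 'utf-16-le' or 'utf-16-be' otherwise):

# Sketch: print a UTF-16 file on a UTF-8 terminal without iconv.
with open('test.log', encoding='utf-16') as f:
    for line in f:
        print(line, end='')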

How to Convert UTF-16 to UTF-32 and Print the Resulting wchar_t in C?

南笙酒味 submitted on 2019-12-01 19:27:57
I'm trying to print out a string of UTF-16 characters. I posted this question a while back and the advice given was to convert to UTF-32 using iconv and print it as a string of wchar_t. I've done some research and managed to code the following:

// *c is the pointer to the characters (UTF-16) I'm trying to print
// sz is the size in bytes of the input I'm trying to print
iconv_t icv;
char in_buf[sz];
char* in;
size_t in_sz;
char out_buf[sz * 2];
char* out;
size_t out_sz;

icv = iconv_open("UTF-32", "UTF-16");
memcpy(in_buf, c, sz);
in = in_buf;
in_sz = sz;
out = out_buf;
out_sz = sz * 2;
size_t
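For comparison, the same UTF-16 to UTF-32 conversion expressed as a Python sketch (not the C/iconv code the question is about); it makes the size arithmetic easy to check, since UTF-32 spends exactly four bytes per code point:

# Sketch: convert UTF-16 bytes to UTF-32 and inspect the sizes.
# The plain 'utf-16'/'utf-32' codecs emit a BOM; the -le/-be variants do not.
utf16 = 'a\U0001F600'.encode('utf-16-le')   # 61 00 3d d8 00 de  (6 bytes, one surrogate pair)
decoded = utf16.decode('utf-16-le')         # back to a str of 2 code points
utf32 = decoded.encode('utf-32-le')         # 4 bytes per code point

print(len(utf16), len(utf32))               # 6 8
print(utf32.hex(' '))                       # 61 00 00 00 00 f6 01 00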

Java implicit conversion of int to byte

好久不见. submitted on 2019-12-01 15:50:28
I am about to start working on something that requires reading bytes and creating strings. The bytes being read represent UTF-16 strings. So just to test things out I wanted to convert a simple byte array in UTF-16 encoding to a string. The first 2 bytes in the array must represent the endianness and so must be either 0xff 0xfe or 0xfe 0xff. So I tried creating my byte array as follows:

byte[] bytes = new byte[] {0xff, 0xfe, 0x52, 0x00, 0x6F, 0x00};

But I got an error because 0xFF and 0xFE are too big to fit into a byte (because bytes are signed in Java). More precisely the error was that the
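In Java the usual way out is an explicit cast, e.g. (byte) 0xFF, since byte is signed and the literal 0xFF does not fit without one. As a cross-check of what that array should decode to, here is the same byte sequence decoded in Python (a sketch: ff fe is the little-endian BOM, and 52 00 / 6F 00 are 'R' and 'o'):

# Sketch: what the byte array {0xff, 0xfe, 0x52, 0x00, 0x6f, 0x00}
# represents -- a BOM (ff fe) followed by "Ro" in little-endian UTF-16.
data = bytes([0xff, 0xfe, 0x52, 0x00, 0x6f, 0x00])

print(data.decode('utf-16'))          # Ro  (the 'utf-16' codec consumes the BOM)
print(data[2:].decode('utf-16-le'))   # Ro  (explicit little-endian, BOM stripped)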

Is UTF-16 compatible with UTF-8?

蹲街弑〆低调 submitted on 2019-12-01 14:43:42
I asked Google the question above and was sent to "Difference between UTF-8 and UTF-16?", which unfortunately doesn't answer the question. From my understanding, UTF-8 should be a subset of UTF-16, meaning: if my code uses UTF-16 and I hand in a UTF-8 encoded string, everything should always be fine. The other way around (expecting UTF-8 and getting UTF-16) may cause problems. Is that correct?

EDIT: To clarify why the linked SO question doesn't answer my question: my problem arose when trying to process a JSON string using WebClient.DownloadString, because the WebClient used the wrong encoding. The
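For reference, the two encodings produce different bytes even for plain ASCII text, so code expecting one cannot silently be handed the other; a small Python sketch of the mismatch:

# Sketch: the same ASCII string has different byte layouts in UTF-8 and
# UTF-16, so neither encoding is a subset of the other on the wire.
s = 'Hi'

utf8 = s.encode('utf-8')        # 48 69        (1 byte per character)
utf16 = s.encode('utf-16-le')   # 48 00 69 00  (2 bytes per character)

print(utf8.hex(' '), '|', utf16.hex(' '))
print(repr(utf16.decode('utf-8')))   # 'H\x00i\x00' -- embedded NULs, not 'Hi'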