UTF-16 string terminator

痴心易碎 posted on 2019-12-23 07:04:52

Question


What is the string terminator sequence for a UTF-16 string?

EDIT:

Let me rephrase the question in an attempt to clarify: how does the call to wcslen() work?


Answer 1:


Unicode does not define string terminators. Your environment or language does. For instance, C strings use 0x0 as a string terminator, whereas .NET strings use a separate value in the String class to store the length of the string.

To answer your second question, wcslen looks for a terminating L'\0' character. As I read it, that is a wchar_t with the value zero, so its size in bytes depends on the compiler; if you're using UTF-16 with a 16-bit wchar_t, it will be the two-byte sequence 0x00 0x00 (encoding U+0000, 'NUL').
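To make the scan concrete, here is a minimal sketch of what a wcslen-style loop does. The names my_wcslen and my_wcslen_selfcheck are hypothetical, and real library implementations are usually more optimized; the point is only that the comparison is made on whole wchar_t units, not on single bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical re-implementation for illustration only; not how any
   particular C library actually implements wcslen. */
size_t my_wcslen(const wchar_t *s)
{
    const wchar_t *p = s;
    while (*p != L'\0')   /* compares a whole wchar_t unit, not one byte */
        ++p;
    return (size_t)(p - s);
}

/* Usage check: the sketch agrees with the library function. */
void my_wcslen_selfcheck(void)
{
    assert(my_wcslen(L"") == 0);
    assert(my_wcslen(L"ab") == wcslen(L"ab"));
}
```

Because the loop steps one wchar_t at a time, a zero byte inside a non-zero unit (such as the high byte of 'a') does not end the string.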




Answer 2:


7.24.4.6.1 The wcslen function (from the Standard)

...

   [#3] The wcslen function returns the number of wide
   characters that precede the terminating null wide character.

And the null wide character is L'\0', that is, a wchar_t whose value is zero.




Answer 3:


There isn't any. String terminators are not part of an encoding.

For example, the string ab is encoded in UTF-16 (little-endian) as the byte sequence 61 00 62 00, and 大家 as 27 59 B6 5B. As you can see, there is no predetermined terminator sequence.
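The byte sequences above can be reproduced with a short sketch. It assumes the text is held as an array of 16-bit code units and that the caller, not UTF-16, chooses to end the array with a 0x0000 unit; utf16_units, utf16_to_le_bytes, and utf16_demo_check are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count 16-bit code units before the first 0x0000 unit. The terminator is
   a convention of this sketch, not something UTF-16 defines. */
size_t utf16_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        ++n;
    return n;
}

/* Serialize n code units as little-endian bytes; out must hold 2 * n bytes. */
void utf16_to_le_bytes(const uint16_t *s, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; ++i) {
        out[2 * i]     = (unsigned char)(s[i] & 0xFF);  /* low byte first */
        out[2 * i + 1] = (unsigned char)(s[i] >> 8);    /* then high byte */
    }
}

/* Usage check against the byte sequences quoted in the answer. */
void utf16_demo_check(void)
{
    const uint16_t ab[]    = {0x0061, 0x0062, 0x0000};  /* "ab" */
    const uint16_t dajia[] = {0x5927, 0x5BB6, 0x0000};  /* U+5927 U+5BB6 */
    unsigned char out[4];

    utf16_to_le_bytes(ab, utf16_units(ab), out);
    assert(out[0] == 0x61 && out[1] == 0x00 && out[2] == 0x62 && out[3] == 0x00);

    utf16_to_le_bytes(dajia, utf16_units(dajia), out);
    assert(out[0] == 0x27 && out[1] == 0x59 && out[2] == 0xB6 && out[3] == 0x5B);
}
```

Note that the encoded form of ab already contains 0x00 bytes, which is why a byte-oriented terminator check cannot be applied to UTF-16 data.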



Source: https://stackoverflow.com/questions/5923948/utf-16-string-terminator
