utf-16

Is there a drastic difference between UTF-8 and UTF-16

荒凉一梦 submitted on 2019-12-12 07:09:49
Question: I call a web service that gives me back a response XML with UTF-8 encoding; I checked that in Java using the getAllHeaders() method. Now, in my Java code, I take that response, do some processing on it, and later pass it on to a different service. I googled a bit and found that Java strings use UTF-16 encoding internally by default. One of the elements in my response XML contained the character É, and this got corrupted in the post-processing request that I make to the other service
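The É corruption described above is typically a decode-with-the-wrong-charset problem rather than anything to do with Java's internal UTF-16 strings. A small Python sketch of the likely failure mode (Python stands in for the Java code here; the byte values are standard UTF-8):

```python
# Illustration of the charset mismatch above. "É" is U+00C9.
text = u"\u00c9"                      # the character É from the response XML

utf8_bytes = text.encode("utf-8")     # what actually travels over the wire
assert utf8_bytes == b"\xc3\x89"

# Correct handling: decode with the charset declared in the HTTP headers.
assert utf8_bytes.decode("utf-8") == text

# The likely bug: decoding those UTF-8 bytes with a one-byte charset such
# as Windows-1252 turns the single É into the two-character mojibake "Ã‰".
assert utf8_bytes.decode("cp1252") == u"\u00c3\u2030"
```

The internal UTF-16 representation never leaves the JVM; only the decode and re-encode steps at the service boundaries matter.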

PHP UTF-16 to ASCII conversion

点点圈 submitted on 2019-12-12 05:27:08
Question: Consider the following string. It's encoded in UTF-16-LE and saved into a PHP variable. I failed to get either mbstring or iconv to replace the &#39; with a single quote. What would be a good way to sanitize it? String: Carl Sagan &#39; s Cosmic Connection Answer 1: Unless I'm misunderstanding the question, &#39; isn't a UTF-16 issue. That string has had htmlspecialchars() or htmlentities() run on it, and the single quote was converted to its HTML entity representation &#39;. To get it back to normal you need to do
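As the answer notes, the stray &#39; is an HTML entity, not an encoding artifact; in PHP the usual fix is html_entity_decode(). The same decoding step sketched in Python (html.unescape, Python 3.4+, is the Python counterpart and is used here only for illustration):

```python
import html

s = "Carl Sagan &#39; s Cosmic Connection"
# Undo the entity escaping applied by htmlspecialchars()/htmlentities():
assert html.unescape(s) == "Carl Sagan ' s Cosmic Connection"
```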

How can I identify different encodings without the use of a BOM?

丶灬走出姿态 submitted on 2019-12-12 04:47:50
Question: I have a file watcher that is grabbing content from a growing file encoded in UTF-16LE. The first chunk of data written to it has the BOM available -- I was using this to distinguish the encoding from UTF-8 (which MOST of my incoming files are encoded in). I catch the BOM and re-encode to UTF-8 so my parser doesn't freak out. The problem is that, since it's a growing file, not every chunk of data has the BOM in it. Here's my question -- without prepending the BOM bytes to each set of data I have
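One common approach, sketched below in Python: sniff the BOM on the first chunk only and remember the detected encoding for all later chunks. The class name and the UTF-8 fallback are assumptions for illustration, and the sketch ignores the possibility of a chunk boundary splitting a 16-bit code unit:

```python
import codecs

class ChunkDecoder:  # hypothetical name, for illustration only
    """Sniff the BOM on the first chunk, then reuse the detected encoding
    for every later chunk (which, in a growing file, carries no BOM)."""

    def __init__(self, default="utf-8"):
        self.encoding = None
        self.default = default  # assumption: most incoming files are UTF-8

    def feed(self, chunk):
        if self.encoding is None:
            if chunk.startswith(codecs.BOM_UTF16_LE):
                self.encoding = "utf-16-le"
                chunk = chunk[len(codecs.BOM_UTF16_LE):]
            elif chunk.startswith(codecs.BOM_UTF8):
                self.encoding = "utf-8"
                chunk = chunk[len(codecs.BOM_UTF8):]
            else:
                self.encoding = self.default
        # Real code should also handle a chunk boundary that splits a
        # 16-bit code unit; codecs.getincrementaldecoder helps there.
        return chunk.decode(self.encoding)
```

The key point is that encoding detection is per-stream state, not per-chunk state, so it only needs the BOM once.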

Why is the hash different when I insert encoding="UTF-16BE" in the XML?

微笑、不失礼 submitted on 2019-12-12 04:46:10
Question: The following method generates an XmlDocument from a string and then calls another method to generate a hash of the XmlDocument created. private void geraXML() { XmlDocument xmlDoc = new XmlDocument(); xmlDoc.PreserveWhitespace = true; string xml = @"<?xml version=""1.0""?>..."; xmlDoc.LoadXml(xml); string caminho = path + "xmldoc.xml"; string nomeArquivo = "xmldoc.xml"; xmlDoc.Save(caminho); //call method to generate hash geraHASH(caminho, nomeArquivo); } This other method converts the same
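The hash differs because it is computed over the serialized bytes, and both the declaration text and (if the document is actually re-saved in UTF-16BE) every byte of the file change. A minimal Python illustration; the question does not say which algorithm geraHASH uses, so SHA-1 here is an assumption:

```python
import hashlib

xml_plain = b'<?xml version="1.0"?><doc/>'
xml_decl  = b'<?xml version="1.0" encoding="UTF-16BE"?><doc/>'

# Even before any re-encoding, the changed declaration alone changes the
# hashed byte stream.
assert hashlib.sha1(xml_plain).hexdigest() != hashlib.sha1(xml_decl).hexdigest()

# Actually re-encoding multiplies the difference: every ASCII character
# becomes two bytes in UTF-16BE.
assert "<doc/>".encode("utf-16-be") == b"\x00<\x00d\x00o\x00c\x00/\x00>"
```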

Why do code points between U+D800 and U+DBFF generate a one-length string in ECMAScript 6?

南笙酒味 submitted on 2019-12-12 03:31:18
Question: I'm getting confused. Why do code points from U+D800 to U+DBFF encode as a single (2-byte) String element when using the ECMAScript 6 native Unicode helpers? I'm not asking how JavaScript/ECMAScript encodes strings natively; I'm asking about the extra functionality to encode UTF-16 that makes use of UCS-2. var str1 = '\u{D800}'; var str2 = String.fromCodePoint(0xD800); console.log( str1.length, str1.charCodeAt(0), str1.charCodeAt(1) ); console.log( str2.length, str2.charCodeAt(0), str2
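ES6 treats a lone surrogate as an ordinary single code unit, which is why str1.length is 1; it only becomes a problem when serialized as real UTF-16. Python 3 shows the same split between "code points allowed in a string" and "valid UTF-16 bytes", sketched here as a cross-language illustration:

```python
# Python 3 assumed. A lone high surrogate is a single code point in the
# string, just as '\u{D800}' is a single code unit in an ES6 string.
s = "\ud800"
assert len(s) == 1

# A real supplementary character needs two 16-bit units in UTF-16:
assert len("\U0001F600".encode("utf-16-le")) == 4

# But a lone surrogate is not valid UTF-16, so strict encoding rejects it.
try:
    s.encode("utf-16-le")
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised
```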

VB 6.0 -> Delphi XE2 Conversion

做~自己de王妃 submitted on 2019-12-12 03:08:30
Question: Public Function UTF8FromUTF16(ByRef abytUTF16() As Byte) As Byte() Dim lngByteNum As Long Dim abytUTF8() As Byte Dim lngCharCount As Long On Error GoTo ConversionErr lngCharCount = (UBound(abytUTF16) + 1) \ 2 lngByteNum = WideCharToMultiByteArray(CP_UTF8, 0, abytUTF16(0), _ lngCharCount, 0, 0, 0, 0) If lngByteNum > 0 Then ReDim abytUTF8(lngByteNum - 1) lngByteNum = WideCharToMultiByteArray(CP_UTF8, 0, abytUTF16(0), _ lngCharCount, abytUTF8(0), lngByteNum, 0, 0) UTF8FromUTF16 = abytUTF8 End If
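Functionally, the VB6 routine above is just "UTF-16LE bytes in, UTF-8 bytes out" via WideCharToMultiByte with CP_UTF8; in Delphi XE2 the TEncoding class can usually do this directly. A Python sketch of the same transformation (the function name mirrors the VB6 one for illustration):

```python
def utf8_from_utf16(utf16_bytes):
    # Decode the UTF-16LE byte array to text, then re-encode as UTF-8;
    # this is the job the two WideCharToMultiByte calls perform.
    return utf16_bytes.decode("utf-16-le").encode("utf-8")

data = u"caf\u00e9".encode("utf-16-le")
assert utf8_from_utf16(data) == b"caf\xc3\xa9"
```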

What is the largest code point for 16-bit wchar_t type?

泄露秘密 submitted on 2019-12-12 03:06:06
Question: It is said here that UTF-16's largest code point is 10FFFF. It is also written on that page that BMP characters require one 16-bit code unit to process or store. But in binary, 10FFFF is 0001 0000 1111 1111 1111 1111. We can see that it occupies more than 15 bits of a 16-bit wchar_t (an implementation is allowed to support only wide-character values >= 0, independently of the signedness of wchar_t). What is the real largest code point for a 16-bit wchar_t? Answer 1: It is said here that UTF-16's
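The answer hinges on the distinction between code points and code units: a single 16-bit wchar_t unit tops out at U+FFFF (the BMP), while everything up to U+10FFFF is represented as a surrogate pair of two units. Checked in Python as a quick cross-language illustration:

```python
# U+FFFF is the largest code point that fits in one 16-bit code unit:
assert len("\uffff".encode("utf-16-le")) == 2

# U+10FFFF, the largest Unicode code point, needs a surrogate pair
# (high surrogate 0xDBFF, low surrogate 0xDFFF, little-endian here):
assert "\U0010FFFF".encode("utf-16-le") == b"\xff\xdb\xff\xdf"
```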

How to define a string literal containing non-ASCII characters?

与世无争的帅哥 submitted on 2019-12-12 01:33:35
Question: I'm programming in VB.NET using Visual Studio 2008. I need to define a string literal containing the character "÷", equivalent to Chr(247). I understand that internally VS uses UTF-16 encoding, but when the source file is written to disk it contains the single byte value F7 for this character. This source file is processed by another program that uses UTF-8 encoding by default, so it fails to interpret the character correctly, attempting to combine it with the following single-byte character.
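The byte value F7 is the Latin-1/Windows-1252 encoding of ÷; in UTF-8 the same character is two bytes, which is why the downstream UTF-8 reader misfires on the lone F7. A Python illustration of the two encodings:

```python
div = u"\xf7"  # the division sign, Chr(247) in VB.NET

assert div.encode("latin-1") == b"\xf7"     # the single byte VS wrote to disk
assert div.encode("utf-8") == b"\xc3\xb7"   # what a UTF-8 reader expects
# A bare 0xF7 byte matches the UTF-8 lead-byte pattern 11110xxx, so the
# consumer tries to merge it with the following bytes as continuations.
```

Saving the source file with a UTF-8 signature (or escaping the character) sidesteps the mismatch.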

Creating UTF-16 newline characters in Python for Windows Notepad

可紊 submitted on 2019-12-11 11:40:15
Question: In Python 2.7 running on Ubuntu, this code: f = open("testfile.txt", "w") f.write("Line one".encode("utf-16")) f.write(u"\r\n".encode("utf-16")) f.write("Line two".encode("utf-16")) produces the desired newline between the two lines of text when read in Gedit: Line one Line two However, the same code executed on Windows 7 and read in Notepad produces unintelligible characters after "Line one", and no newline is recognized by Notepad. How can I write correct newline characters for UTF-16 in
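A likely culprit: str.encode("utf-16") prepends a fresh BOM on every call, so the file ends up with BOM bytes sprinkled between the fragments, and Notepad (unlike Gedit) does not paper over them. One fix, sketched here, is to let a single encoding-aware file object add the BOM once and write \r\n explicitly:

```python
import io

# Open with an encoding so the BOM is written exactly once, and disable
# newline translation so "\r\n" is written literally (works on 2.7 and 3).
with io.open("testfile.txt", "w", encoding="utf-16", newline="") as f:
    f.write(u"Line one\r\n")
    f.write(u"Line two")

with io.open("testfile.txt", "r", encoding="utf-16", newline="") as f:
    assert f.read() == u"Line one\r\nLine two"
```

The file must also be opened in binary-safe mode conceptually: mixing plain "w" mode with pre-encoded UTF-16 bytes is what produced the stray BOMs in the first place.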

Call iconv from Ruby 1.8.7 through system to convert a file from utf-16 to utf-8

不想你离开。 submitted on 2019-12-11 10:51:42
Question: Here's what I've got: path_js = 'path/to/a/js/file.js' path_new_js = 'path/where/the/converted/file/should/go.js' puts('iconv -f utf-16le -t utf-8 ' + path_js + ' > ' + path_new_js) system('iconv -f utf-16le -t utf-8 ' + path_js + ' > ' + path_new_js) The output of the puts statement is: iconv -f utf-16le -t utf-8 path/to/1-1-2_E1_MC105.js > compiled/path/to/1-1-2_E1_MC105.js If I copy-paste that exact same line into my terminal, the conversion takes place successfully, but when it runs inside my
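Rather than debugging how system() handles quoting and the > redirection, the shell can be bypassed and the conversion done in-process. A Python sketch of the same conversion (file names are placeholders; the question's code is Ruby, where the equivalent idea applies):

```python
import io

def convert_utf16le_to_utf8(src_path, dst_path):
    # Read the whole file as UTF-16LE text, then write it back as UTF-8,
    # replacing the `iconv -f utf-16le -t utf-8 src > dst` pipeline.
    with io.open(src_path, "r", encoding="utf-16-le") as src:
        text = src.read()
    with io.open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(text)
```

Doing the conversion in-process also surfaces decoding errors as exceptions instead of a silent non-zero exit status from the shell.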