character-encoding | 易学教程

Defining 4-byte UTF-16 character in a string

Question: I read a question about UTF-8, UTF-16 and UCS-2, and almost all answers state that UCS-2 is obsolete and that C# uses UTF-16. However, all my attempts to create the 4-byte character U+1D11E in C# failed, so I suspect C# uses only the UCS-2 subset of UTF-16. Here are my attempts:
    string s = "\u1D11E"; // gives the 2-character string "ᴑE", because \u1D11 is ᴑ
    string s = (char) 0x1D11E; // won't compile because of an overflow
    string s = Encoding.Unicode.GetString(new byte[]
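
The arithmetic behind the question can be sketched in Python (used here only for illustration; the question itself is about C#). U+1D11E lies above U+FFFF, so UTF-16 stores it as two 16-bit code units, a surrogate pair, which is why a four-digit `\u` escape or a single `char` cannot hold it. C# also accepts an eight-digit escape, `"\U0001D11E"`. The helper name `utf16_code_units` is hypothetical.

```python
def utf16_code_units(ch: str) -> list:
    """Return the UTF-16 code units that encode one code point."""
    cp = ord(ch)
    if cp < 0x10000:
        return [cp]                  # fits in a single 16-bit unit
    offset = cp - 0x10000            # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)   # high surrogate: top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # low surrogate: bottom 10 bits
    return [high, low]


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF
print([hex(u) for u in utf16_code_units(clef)])  # ['0xd834', '0xdd1e']
print(len(clef.encode("utf-16-le")))             # 4 bytes
```

The two units 0xD834 and 0xDD1E are what a UTF-16 string holds for this character, so its length is reported as 2 code units even though it is one code point.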

JS charCodeAt equivalent in PHP (with full unicode and emoji compatibility)

Question: I have a simple piece of JS code that I can't replicate in PHP when it comes to special characters. This is the JS code (see JSFiddle for output):
    var str = "t🙏🏿😘🎚↙️🕗🇨🇬芳"; // char "t" and special characters, emojis, etc.
    document.write("Length is: "+str.length); // Length is: 19
    for(var i=0; i<str.length; i++) { document.write("<br> charCodeAt(" + i + "): " + str.charCodeAt(i)); }
The first problem is that PHP strlen() and mb_strlen() already give different results from JS (strlen: 39, mb_strlen: 11
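
The three lengths differ because each function counts a different unit: JS `length` counts UTF-16 code units, `strlen()` counts bytes, and `mb_strlen()` counts code points. A minimal Python sketch (not PHP) of how `charCodeAt` values can be reproduced is to encode the text as UTF-16 and read it back two bytes at a time. The helper name `char_codes_utf16` and the short sample string are made up for this illustration.

```python
import struct


def char_codes_utf16(text: str) -> list:
    """List the UTF-16 code units of text, like calling charCodeAt(i) for each i."""
    data = text.encode("utf-16-le")
    # Every code unit is one unsigned 16-bit little-endian integer.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


sample = "t\U0001F618\u82b3"  # "t", an emoji above U+FFFF, and a CJK character
codes = char_codes_utf16(sample)
print(len(codes))                   # 4 code units (what JS .length reports)
print(codes)                        # [116, 55357, 56856, 33459]
print(len(sample))                  # 3 code points
print(len(sample.encode("utf-8")))  # 8 bytes
```

The emoji contributes two code units (a surrogate pair) but only one code point, which accounts for the gap between the code-unit count and the code-point count.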

PHP json_decode return error code 4

Question: I previously asked the same question. I would like to decode the JSON from http://pad.skyozora.com/data/pets.json. Below is the code I used previously:
    <?php
    $html=file_get_contents("http://pad.skyozora.com/data/pets.json");
    var_dump(json_decode($html,true)); // returns null
    var_dump(json_last_error()); // returns 4
    ?>
From the last answer I know there is a UTF-8 BOM in the returned JSON. I tried the answer from a similar question: json_decode returns NULL after webservice call, but all of the
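
Error code 4 is `JSON_ERROR_SYNTAX`, which fits a payload that starts with a byte order mark instead of `{` or `[`. The idea of removing the BOM before parsing can be sketched in Python (not PHP). The payload below is made up and only stands in for the real response; `parse_json_bytes` is a hypothetical helper.

```python
import json

# Made-up payload: the three bytes EF BB BF are a UTF-8 BOM in front of the JSON.
raw = b'\xef\xbb\xbf{"name": "pet", "id": 1}'


def parse_json_bytes(payload: bytes):
    """Decode with 'utf-8-sig', which drops a leading BOM if one is present."""
    return json.loads(payload.decode("utf-8-sig"))


# Decoding as plain UTF-8 keeps the BOM as U+FEFF, and the parser rejects it.
try:
    json.loads(raw.decode("utf-8"))
    bom_rejected = False
except ValueError:
    bom_rejected = True

print(bom_rejected)           # True
print(parse_json_bytes(raw))  # {'name': 'pet', 'id': 1}
```

The same approach carries over to other languages: strip the leading three bytes when they equal EF BB BF, then hand the rest to the JSON parser.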

Why did this str_ireplace() work on a non ASCII string?

Question: Note: What I think I know is probably wrong, so please kindly correct me :) I just answered a question about UTF-8 and PHP. I suggested using str_ireplace('Волгоград', '', $a). I didn't expect this to work, but it did. I always thought PHP treated one byte as one character, which is why you need the mb_* functions to get accurate results with characters outside the ASCII range. I assumed the Russian characters would take more than 1 byte each. I thought str_replace() would work because
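
One likely explanation can be sketched with Python byte strings, used here as a stand-in for PHP's byte-oriented strings (an assumption of this sketch, not a statement about PHP internals). In UTF-8, lead bytes and continuation bytes occupy disjoint ranges, so an exact byte-for-byte match of a valid needle can only begin on a character boundary. Case folding is a different matter: byte-level folding only touches ASCII letters.

```python
haystack = "город Волгоград, Россия".encode("utf-8")
needle = "Волгоград".encode("utf-8")

# Each Cyrillic letter takes two bytes in UTF-8.
print(len("Волгоград"), len(needle))  # 9 18

# Exact byte-wise replacement is safe and leaves valid UTF-8 behind.
result = haystack.replace(needle, b"")
print(result.decode("utf-8"))  # город , Россия

# Byte-level lowercasing ignores non-ASCII bytes, so the uppercase
# spelling is not folded and would not match case-insensitively.
upper = "ВОЛГОГРАД".encode("utf-8")
print(upper.lower() == upper)  # True
```

Under this reading, the replacement worked because the needle matched exactly; a differently cased Cyrillic spelling would need a Unicode-aware comparison.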

How to detect which character set encoding in Java?

Question: Does anybody know if there is a simple way to detect character set encoding in Java? It seems to me that some programs are able to detect which character set a given piece of data uses, or at least make an approximation. I suppose the underlying mechanism would have to decode the data in each character set and pick whichever one yields the fewest undefined characters, using how common each character set is to break a tie. Any ideas? Answer 1: For finding whether data is in any unicode
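
The trial-decoding idea from the question can be sketched as follows, in Python rather than Java. This is a simplified variant: instead of counting undefined characters, it takes the first candidate that decodes without any error, with candidates ordered from strictest to most permissive. The function name `guess_encoding` and the candidate list are assumptions of this sketch, and the result is only a guess, since many byte sequences are valid in several encodings.

```python
def guess_encoding(data: bytes, candidates=("ascii", "utf-8", "cp1252")):
    """Return the first candidate that decodes data without errors, else None."""
    for name in candidates:
        try:
            data.decode(name)  # strict decoding raises on invalid input
        except UnicodeDecodeError:
            continue
        return name
    return None


print(guess_encoding(b"hello"))                  # ascii
print(guess_encoding("héllo".encode("utf-8")))   # utf-8
print(guess_encoding("héllo".encode("cp1252")))  # cp1252
print(guess_encoding(b"\x81"))                   # None
```

Ordering matters: a single-byte encoding accepts almost any input, so it has to come last or it would shadow the stricter candidates.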

When sending XML to JMS should I use TextMessage or BytesMessage

Question: I have found quite conflicting information on the web, and I think the answer may also vary between JMS providers. I'm trying to understand whether, when sending XML into a JMS system (e.g. ActiveMQ), I should use a:
BytesMessage: I can guarantee that the XML is serialized correctly and that the preamble will match the actual encoding. Furthermore, I can be sure that the client will be able to get the raw representation correctly.
TextMessage: There are APIs in many of the queue

How to handle undecodable filenames in Python?

Question: I'd really like to have my Python application deal exclusively with Unicode strings internally. This has been going well for me lately, but I've run into an issue with handling paths. The POSIX API for filesystems isn't Unicode, so it's possible (and actually somewhat common) for files to have "undecodable" names: filenames that aren't encoded in the filesystem's stated encoding. In Python, this manifests as a mixture of unicode and str objects being returned from os.listdir(). >>> os
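
The question describes Python 2 behaviour. A minimal sketch of one way to keep such names as text, assuming Python 3, uses the `surrogateescape` error handler: each undecodable byte is mapped to a lone surrogate code point and mapped back to the same byte on encoding. On POSIX systems `os.fsdecode()` and `os.fsencode()` apply this handler with the filesystem encoding. The helper names and the sample filename below are made up, and UTF-8 is hard-coded so the example does not depend on the local configuration.

```python
# Made-up filename: 0xE9 is "é" in Latin-1 but is not valid UTF-8 here.
raw_name = b"caf\xe9.txt"


def decode_name(raw: bytes) -> str:
    """Decode a filename, smuggling undecodable bytes through as surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Reverse decode_name, restoring the original bytes exactly."""
    return name.encode("utf-8", errors="surrogateescape")


name = decode_name(raw_name)
print(ascii(name))                    # 'caf\udce9.txt'
print(encode_name(name) == raw_name)  # True
```

The resulting string round-trips losslessly, but it contains a lone surrogate, so it has to be encoded with the same handler before it is printed or written as ordinary UTF-8.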

Difference between Encoding.UTF8.GetBytes and UTF8Encoding.Default.GetBytes

Question: Can someone please explain to me the difference between Encoding.UTF8.GetBytes and UTF8Encoding.Default.GetBytes? I am trying to convert an XML string into a stream object, and whenever I use this line: MemoryStream stream = new MemoryStream(UTF8Encoding.Default.GetBytes(xml)); it gives me the error "System.Xml.XmlException: Invalid character in the given encoding", but when I use this line it works fine: MemoryStream stream = new MemoryStream(Encoding.UTF8
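
`Default` is a static property inherited from `Encoding`, so `UTF8Encoding.Default` is the same object as `Encoding.Default`, which on .NET Framework is the system's ANSI code page rather than UTF-8. The effect can be sketched in Python under the assumption that this code page is Windows-1252 (it varies by machine); the XML string is made up.

```python
# Made-up document whose declaration promises UTF-8.
xml = '<?xml version="1.0" encoding="utf-8"?><name>José</name>'

ansi_bytes = xml.encode("cp1252")  # assumed stand-in for the ANSI code page
utf8_bytes = xml.encode("utf-8")   # what the XML declaration promises


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data can be decoded strictly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(ansi_bytes == utf8_bytes)   # False: "é" is one byte vs. two
print(is_valid_utf8(utf8_bytes))  # True
print(is_valid_utf8(ansi_bytes))  # False: 0xE9 is not valid UTF-8 here
```

A parser that trusts the `encoding="utf-8"` declaration will hit the lone 0xE9 byte and report an invalid character, which matches the exception in the question.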
