utf-16

Convert � � to Emoji in HTML using PHP

Submitted by 泄露秘密 on 2019-11-26 21:52:09
Question: We have a bunch of surrogate-pair (or 2-byte UTF-8?) characters, such as ��, which is the prayer-hands emoji stored in UTF-8 as two characters. When rendered in a browser, the string shows up as two ?? characters. I need to convert those to the hands emoji using PHP, but I simply cannot find a combination of iconv, utf8_decode, html_entity_decode, etc. to pull it off. This site converts the �� properly: http://www.convertstring.com/EncodeDecode/HtmlDecode Paste in there the
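A minimal Python sketch of the decoding the question is after (Python rather than PHP, and the entity values are an assumption, since the characters above are garbled): the prayer-hands emoji is U+1F64F, so its two UTF-16 surrogate halves expressed as decimal HTML entities would be &#55357; and &#56911;, and they can be recombined arithmetically.

    import re

    entities = "&#55357;&#56911;"   # assumed input: the two surrogate halves of U+1F64F as entities
    hi, lo = (int(n) for n in re.findall(r"&#(\d+);", entities))   # 0xD83D, 0xDE4F
    code_point = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)   # -> 0x1F64F
    print(chr(code_point))          # prints the emoji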

UnicodeDecodeError when performing os.walk

Submitted by 你离开我真会死。 on 2019-11-26 20:24:09
Question: I am getting the error 'ascii' codec can't decode byte 0x8b in position 14: ordinal not in range(128) when trying to do os.walk. The error occurs because some of the files in a directory have the 0x8b (non-UTF-8) byte in their names. The files come from a Windows system (hence the UTF-16 filenames), but I have copied them over to a Linux system and am using Python 2.7 (running on Linux) to traverse the directories. I have tried passing a unicode start path to os.walk, and all the files
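A minimal Python 2.7 sketch of the usual workaround (the path here is hypothetical): give os.walk a byte-string start path so nothing gets decoded implicitly, then decode each name explicitly with a replacement fallback for bytes that are not valid UTF-8.

    # Python 2.7
    import os

    start = b'/data/from_windows'                 # hypothetical path; a byte string, so no implicit decode
    for root, dirs, files in os.walk(start):
        for name in files:
            full = os.path.join(root, name)       # stays a byte string
            display = full.decode('utf-8', 'replace')   # undecodable bytes (e.g. 0x8b) become U+FFFD
            print display.encode('utf-8')               # re-encode for safe printing on Python 2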

Using JNA to get/set application identifier

Submitted by 房东的猫 on 2019-11-26 19:03:54
Question: Following up on my previous question concerning the Windows 7 taskbar, I would like to diagnose why Windows isn't acknowledging that my application is independent of javaw.exe. I presently have the following JNA code to obtain the AppUserModelID: public class AppIdTest { public static void main(String[] args) { NativeLibrary lib; try { lib = NativeLibrary.getInstance("shell32"); } catch (Error e) { System.err.println("Could not load Shell32 library."); return; } Object[] functionArgs = new
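For reference, the two shell32 functions involved can be exercised outside JNA as well. A Python/ctypes sketch of the same calls (the AppUserModelID string is a made-up example), which can help confirm the API behaves as expected before debugging the JNA mapping:

    import ctypes

    shell32 = ctypes.WinDLL("shell32")
    ole32 = ctypes.WinDLL("ole32")

    shell32.SetCurrentProcessExplicitAppUserModelID(u"MyCompany.MyApp")   # hypothetical AppID

    appid = ctypes.c_wchar_p()
    shell32.GetCurrentProcessExplicitAppUserModelID(ctypes.byref(appid))  # the API allocates the string
    print(appid.value)
    ole32.CoTaskMemFree(appid)    # the returned buffer must be freed with CoTaskMemFree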

Writing utf16 to file in binary mode

Submitted by 随声附和 on 2019-11-26 18:47:38
I'm trying to write a wstring to a file with ofstream in binary mode, but I think I'm doing something wrong. This is what I've tried: ofstream outFile("test.txt", std::ios::out | std::ios::binary); wstring hello = L"hello"; outFile.write((char *) hello.c_str(), hello.length() * sizeof(wchar_t)); outFile.close(); Opening test.txt in, for example, Firefox with the encoding set to UTF-16, it shows as: h�e�l�l�o� Could anyone tell me why this happens? EDIT: Opening the file in a hex editor I get: FF FE 68 00 00 00 65 00 00 00 6C 00 00 00 6C 00 00 00 6F 00 00 00 It looks like I get two extra bytes in
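The dump shows each letter taking 4 bytes (68 00 00 00, 65 00 00 00, ...), which is what falls out when wchar_t is 32 bits wide (as it typically is with GCC on Linux) and the raw bytes of the wstring are written verbatim. For comparison, a small Python sketch of what the same string looks like as real UTF-16LE with a BOM, two bytes per code unit:

    data = b"\xff\xfe" + u"hello".encode("utf-16-le")    # BOM + UTF-16LE code units
    print(" ".join(format(b, "02x") for b in data))
    # ff fe 68 00 65 00 6c 00 6c 00 6f 00  -- two bytes per character, not four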

What's the best way to export UTF8 data into Excel?

Submitted by 会有一股神秘感。 on 2019-11-26 16:14:06
Question: So we have this web app where we support UTF-8 data. Hooray UTF-8. And we can export the user-supplied data to CSV no problem - it's still UTF-8 at that point. The problem is that when you open a typical UTF-8 CSV in Excel, it reads it as ANSI-encoded text and accordingly treats two-byte characters like ø and ü as two separate characters, and you end up with garbage. So I've done a bit of digging (the Intervals folks have an interesting post about it here), and there are some limited if
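One widely used workaround is to put a UTF-8 byte-order mark at the start of the CSV, which Excel takes as a signal to read the file as UTF-8 rather than as the ANSI code page. A minimal Python sketch with made-up rows (the app in the question is not Python, so this is only an illustration of the file format):

    import csv

    rows = [["name", "city"], ["Søren", "Århus"], ["Jürgen", "Köln"]]        # sample data
    with open("export.csv", "w", newline="", encoding="utf-8-sig") as f:     # utf-8-sig prepends the BOM
        csv.writer(f).writerows(rows)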

JavaScript strings outside of the BMP

Submitted by 耗尽温柔 on 2019-11-26 15:22:51
BMP being the Basic Multilingual Plane. According to JavaScript: The Good Parts: "JavaScript was built at a time when Unicode was a 16-bit character set, so all characters in JavaScript are 16 bits wide." This leads me to believe that JavaScript uses UCS-2 (not UTF-16!) and can only handle characters up to U+FFFF. Further investigation confirms this: > String.fromCharCode(0x20001); The fromCharCode method seems to use only the lowest 16 bits when returning the Unicode character. Trying to get U+20001 (CJK unified ideograph 20001) instead returns U+0001. Question: is it at all possible to handle post
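Astral code points have to be expressed as a UTF-16 surrogate pair, either by passing both halves to String.fromCharCode or, in later engines, by using String.fromCodePoint directly. A Python sketch of the arithmetic for U+20001 from the question:

    cp = 0x20001                              # CJK unified ideograph U+20001
    hi = 0xD800 + ((cp - 0x10000) >> 10)      # high surrogate -> 0xD840
    lo = 0xDC00 + ((cp - 0x10000) & 0x3FF)    # low surrogate  -> 0xDC01
    print(hex(hi), hex(lo))                   # the two 16-bit units a UCS-2-era engine needs
    print(chr(cp))                            # the character itself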

VBA Output to file using UTF-16

Submitted by 六月ゝ 毕业季﹏ on 2019-11-26 14:50:55
Question: I have a very complex problem that is difficult to explain properly. There is a lot of discussion about this across the internet, but nothing definitive. Any help, or a better explanation than mine, is greatly appreciated. Essentially, I'm just trying to write an XML file using UTF-16 with VBA. If I do this: sXML = "<?xml version='1.0' encoding='utf-8'?>" sXML = sXML & rest_of_xml_document Print #iFile, sXML then I get a file that is valid XML. However, if I change the "encoding=" to "utf-16", I
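The usual culprit is that Print # writes the string out in the ANSI code page, so once the declaration says utf-16 the bytes no longer match it. A small Python sketch (the XML content is a stand-in) of what a well-formed UTF-16 file actually contains: a BOM followed by text genuinely encoded as UTF-16.

    xml = u"<?xml version='1.0' encoding='utf-16'?><root>data</root>"   # stand-in document
    with open("out.xml", "wb") as f:
        f.write(u"\ufeff".encode("utf-16-le"))   # BOM: ff fe
        f.write(xml.encode("utf-16-le"))         # each character as a 16-bit little-endian unit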

How does Java store UTF-16 characters in its 16-bit char type?

Submitted by 假装没事ソ on 2019-11-26 13:06:52
Question: According to the Java SE 7 Specification, Java uses the Unicode UTF-16 standard to represent characters. When imagining a String as a simple array of 16-bit variables, each containing one character, life is simple. Unfortunately, there are code points for which 16 bits simply aren't enough (I believe it was 16/17ths of all Unicode characters). So in a String this poses no direct problem, because when wanting to store one of these ~1,048,576 characters using an additional two bytes, simply
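Such a character is stored as a surrogate pair, i.e. two char values, which is why Java's String.length() reports UTF-16 code units rather than characters. A quick Python illustration of the same bookkeeping (the count of UTF-16 code units is what length() corresponds to):

    s = "\U00020001"                                  # one supplementary character
    utf16_units = len(s.encode("utf-16-le")) // 2     # 2 -- what Java's String.length() would report
    code_points = len(s)                              # 1 -- the number of actual characters
    print(utf16_units, code_points)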

Byte and char conversion in Java

Submitted by 随声附和 on 2019-11-26 12:43:29
Question: If I convert a character to byte and then back to char, that character mysteriously disappears and becomes something else. How is this possible? This is the code: char a = 'È'; // line 1 byte b = (byte)a; // line 2 char c = (char)b; // line 3 System.out.println((char)c + " " + (int)c); Until line 2 everything is fine: in line 1 I could print "a" in the console and it would show "È". In line 2 I could print "b" in the console and it would show -56, that is 200, because byte is signed
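The arithmetic behind the disappearance: È is U+00C8 (200); the cast to byte keeps only the low 8 bits as a signed value (-56); the cast from byte back to char sign-extends, giving code unit 0xFFC8 (65480) rather than 0x00C8. A Python sketch that mimics Java's two casts:

    a = ord(u'È')                      # 0x00C8 = 200
    b = ((a & 0xFF) ^ 0x80) - 0x80     # Java's (byte) cast: keep low 8 bits, interpret as signed -> -56
    c = b & 0xFFFF                     # Java's (char) cast: sign-extend, then keep low 16 bits -> 0xFFC8
    print(b, hex(c))                   # -56 0xffc8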

grepping binary files and UTF16

Submitted by 一世执手 on 2019-11-26 12:26:27
Question: Standard grep / pcregrep etc. can conveniently be used with binary files for ASCII or UTF-8 data - is there a simple way to make them try UTF-16 too (preferably simultaneously, but instead will do)? The data I'm trying to find is all ASCII anyway (references in libraries etc.); it just doesn't get found, as sometimes there's a 00 between any two characters and sometimes there isn't. I don't see any way to get it done semantically, but these 00s should do the trick, except I cannot easily use them
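A Python sketch of the brute-force approach: search the raw bytes for the ASCII needle and for its UTF-16 forms, i.e. the same letters with the interleaved 00 bytes the question mentions (file name and needle are placeholders):

    needle = b"libfoo"                          # hypothetical ASCII string to look for
    data = open("app.bin", "rb").read()         # hypothetical binary file
    patterns = {
        "ascii/utf-8": needle,
        "utf-16-le": needle.decode("ascii").encode("utf-16-le"),   # 00 after each byte
        "utf-16-be": needle.decode("ascii").encode("utf-16-be"),   # 00 before each byte
    }
    for label, pat in patterns.items():
        if pat in data:
            print(label, "match at offset", data.find(pat))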