utf-16 | 易学教程

Having trouble with UTF-8 storing in NVarChar in SQL Server 2008

Question: I'm pulling data using System.Net.WebClient from a web site, and when the data comes back everything parses and looks good except letters with accents. For example, when it returns an é, SQL Server 2008 saves it as Ã©. I just need to figure out how to convert these UTF-8 characters into something SQL Server can read. I'm storing it in an NVARCHAR(MAX) column, and I'm using Linq-to-SQL to insert into the database, in case that matters. Any thoughts on what I could do to convert it to the proper
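The é to Ã© symptom is consistent with UTF-8 bytes being decoded as Windows-1252 somewhere before the insert. The following is a minimal Python sketch of that effect, not the asker's C# code; the helper names `mojibake` and `repair` are hypothetical, and the choice of cp1252 as the wrong encoding is an assumption.

```python
def mojibake(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo the wrong decode: recover the raw bytes, then decode as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


broken = mojibake("\u00e9")   # U+00E9 is the letter é
fixed = repair(broken)
print(broken, fixed)
```

If this is what is happening, the fix belongs at the point where the response bytes are decoded, not in SQL Server. The round trip in `repair` only works when every byte survived the wrong decode, so it is a diagnostic aid rather than a general cure.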

Open mails in outlook from java using the protocol “mapi://”

Question: I am developing a Java application using Windows Desktop Search, from which I can retrieve information about files on my computer such as URLs (System.ItemUrl). An example of such a URL is file://c:/users/ausername/documents/aninterestingfile.txt for "normal" files. This field also gives URLs of mail items indexed from Outlook or Thunderbird. Thunderbird's items (only available on Vista and Windows 7) are also files (.wdseml). But Outlook item URLs start with "mapi://", like: mapi://{S-1-5-21

javascript and string manipulation w/ utf-16 surrogate pairs

I'm working on a twitter app and just stumbled into the world of utf-8(16). It seems the majority of javascript string functions are as blind to surrogate pairs as I was. I've got to recode some stuff to make it wide character aware. I've got this function to parse strings into arrays while preserving the surrogate pairs. Then I'll recode several functions to deal with the arrays rather than strings. function sortSurrogates(str){ var cp = []; // array to hold code points while(str.length){ // loop till we've done the whole string if(/[\uD800-\uDFFF]/.test(str.substr(0,1))){ // test the first
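The idea described above, splitting a string into units while keeping each surrogate pair together, can be sketched in Python as follows. This is not the asker's JavaScript function (which is cut off in the excerpt); `utf16_units` and `group_surrogates` are hypothetical names used for illustration only.

```python
import struct


def utf16_units(text: str) -> list:
    """Return the UTF-16 code units of text as integers."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def group_surrogates(units: list) -> list:
    """Group code units so that each high/low surrogate pair stays together."""
    groups = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            groups.append([unit, units[i + 1]])  # one code point, two units
            i += 2
        else:
            groups.append([unit])                # BMP character or lone surrogate
            i += 1
    return groups


groups = group_surrogates(utf16_units("a\U0001F64Fb"))
print(groups)
```

A lone surrogate is kept as its own group rather than rejected, which is a design choice of this sketch.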

Why does .net uses the UTF16 encoding for string , but uses utf8 as default for saving files?

From here: "Essentially, string uses the UTF-16 character encoding form." But when saving with StreamWriter: "This constructor creates a StreamWriter with UTF-8 encoding without a Byte-Order Mark (BOM)." I've seen this sample (broken link removed), and it looks like UTF-8 is smaller for some strings while UTF-16 is smaller for others. So why does .NET use UTF-16 as the default encoding for string but UTF-8 for saving files? Thank you. P.S. I've already read the famous article. If you're happy ignoring surrogate pairs (or equivalently, the possibility of your app needing characters outside the Basic

utf-16 file seeking in python. how?

Question: For some reason I cannot seek in my UTF-16 file. It produces 'UnicodeException: UTF-16 stream does not start with BOM'. My code: f = codecs.open(ai_file, 'r', 'utf-16') seek = self.ai_map[self._cbClass.Text] #seek is valid int f.seek(seek) while True: ln = f.readline().strip() I tried random things like first reading something from the stream; that didn't help. I checked the offset being seeked to using a hex editor: the string starts at a character, not a null byte (I guess that's a good sign, right?). So how to seek utf
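One way around the BOM complaint is to do the seek on the raw bytes and decode with an explicit byte order, since the 'utf-16' codec looks for a BOM at the start of whatever it decodes. The sketch below assumes the file is little-endian and that the offset is a byte offset landing on a code-unit boundary; `read_lines_from` and the demo file are hypothetical.

```python
import os
import tempfile


def read_lines_from(path, byte_offset, encoding="utf-16-le"):
    """Seek to a byte offset in a UTF-16 file and decode the rest into lines."""
    with open(path, "rb") as f:       # binary mode: the seek is in bytes
        f.seek(byte_offset)
        data = f.read()
    # An explicit byte order means no BOM is required at the seek position.
    return data.decode(encoding).splitlines()


# Demo file: a BOM followed by three little-endian UTF-16 lines.
text = "first\nsecond\nthird\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xff\xfe" + text.encode("utf-16-le"))
    demo_path = tmp.name

# Skip the 2-byte BOM and the first line (6 characters, 2 bytes each).
offset = 2 + 2 * len("first\n")
lines = read_lines_from(demo_path, offset)
os.remove(demo_path)
print(lines)
```

For a big-endian file the same sketch would take `encoding="utf-16-be"`.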

Why Java char uses UTF-16?

Recently I read a lot about Unicode code points and how they evolved over time, and of course I also read http://www.joelonsoftware.com/articles/Unicode.html. But I couldn't find the real reason why Java uses UTF-16 for a char. For example, if I had a string containing 1024 ASCII characters, it means 1024 * 2 bytes, which equals 2KB of string memory that it will consume either way. So if the Java base char were UTF-8, it would be just 1KB of data. Even if the string has characters which need 2 bytes, for example 10 characters of "字", naturally it will
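The arithmetic in the question can be checked by counting encoded bytes. This is a small Python sketch, not a statement about Java's internal string layout; `encoded_sizes` is a hypothetical helper, and UTF-16 is measured without a BOM.

```python
def encoded_sizes(text: str) -> tuple:
    """Return (UTF-8 byte count, UTF-16 byte count without BOM) for text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


ascii_text = "a" * 1024        # 1024 ASCII letters
cjk_text = "\u5b57" * 10       # ten copies of the character 字 (U+5B57)

print(encoded_sizes(ascii_text))
print(encoded_sizes(cjk_text))
```

The ASCII string takes half the space in UTF-8, while 字 takes three bytes in UTF-8 against two in UTF-16, so neither encoding is smaller for every string.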

Convert &#55357; &#56911; to Emoji in HTML using PHP

We have a bunch of surrogate pair (or 2-byte UTF-8?) characters such as &#55357;&#56911;, which is the prayer hands emoji, stored in UTF-8 as 2 characters. When rendered in a browser, this string renders as two ?? characters. Example: I need to convert those to the hands emoji using PHP, but I simply cannot find a combination of iconv, utf8_decode, html_entity_decode etc. to pull it off. This site converts the &#55357;&#56911; properly: http://www.convertstring.com/EncodeDecode/HtmlDecode Paste in there the following string: Please join me in this prayer. &#55357;&#56911;❤️ You will notice the surrogate pair ( &
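The two entities are the UTF-16 high and low surrogates 0xD83D and 0xDE4F, which together encode U+1F64F. Below is a minimal sketch of the conversion in Python rather than PHP; `decode_surrogate_entities` is a hypothetical name, and the sketch assumes every surrogate entity in the input is part of an adjacent, well-formed pair (a lone surrogate makes the final decode raise an error).

```python
import re

ENTITY = re.compile(r"&#(\d+);")


def decode_surrogate_entities(text: str) -> str:
    """Turn decimal entities into characters, merging surrogate pairs."""
    # Step 1: each entity becomes one code unit, possibly a lone surrogate.
    raw = ENTITY.sub(lambda m: chr(int(m.group(1))), text)
    # Step 2: a UTF-16 round trip joins adjacent high/low surrogates
    # into the single code point they represent.
    return raw.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


result = decode_surrogate_entities("Please join me in this prayer. &#55357;&#56911;")
print(result)
```

The same value follows from the arithmetic 0x10000 + ((0xD83D - 0xD800) << 10) + (0xDE4F - 0xDC00) = 0x1F64F.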

Inno Setup Pascal Script - Reading UTF-16 file

I have an .inf file exported from Resource Hacker. The file is in UTF-16 LE encoding. EXTRALARGELEGENDSII_INI TEXTFILE "Data.bin" LARGEFONTSLEGENDSII_INI TEXTFILE "Data_2.bin" NORMALLEGENDSII_INI TEXTFILE "Data_3.bin" THEMES_INI TEXTFILE "Data_4.bin" When I load it using the LoadStringFromFile function : procedure LoadResources; var RESOURCE_INFO: AnsiString; begin LoadStringFromFile(ExpandConstant('{tmp}\SKINRESOURCE - INFO.inf'), RESOURCE_INFO); Log(String(RESOURCE_INFO)); end; I am getting this in the Debug Output : E Please tell me how to fix this issue. Thanks in advance. It seems the

Java charAt used with characters that have two code units

From Core Java , vol. 1, 9th ed., p. 69: The character ℤ requires two code units in the UTF-16 encoding. Calling String sentence = "ℤ is the set of integers"; // for clarity; not in book char ch = sentence.charAt(1) doesn't return a space but the second code unit of ℤ. But it seems that sentence.charAt(1) does return a space. For example, the if statement in the following code evaluates to true . String sentence = "ℤ is the set of integers"; if (sentence.charAt(1) == ' ') System.out.println("sentence.charAt(1) returns a space"); Why? I'm using JDK SE 1.7.0_09 on Ubuntu 12.10, if it's relevant.
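A likely explanation is that ℤ (U+2124) lies inside the Basic Multilingual Plane, so it needs only one UTF-16 code unit, and index 1 is the space. The Python sketch below counts code units; `utf16_code_units` is a hypothetical helper, and U+1D546 is used only as an example of a character outside the BMP, not as a claim about which character the book intended.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units text occupies in UTF-16."""
    return len(text.encode("utf-16-le")) // 2


bmp_char = "\u2124"            # ℤ, DOUBLE-STRUCK CAPITAL Z
astral_char = "\U0001D546"     # a double-struck letter outside the BMP

print(utf16_code_units(bmp_char))
print(utf16_code_units(astral_char))

# Java's charAt indexes code units, so the second unit of the sentence is
# the space when the first character fits in a single unit.
sentence = bmp_char + " is the set of integers"
second_unit = sentence.encode("utf-16-le")[2:4].decode("utf-16-le")
print(repr(second_unit))
```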

git gui - can it be made to display UTF16?

Question: Is there any way to make git gui display and show diffs for UTF-16 files somehow? I found some information, but it mostly refers to the command line rather than the GUI. Answer 1: I have been working on a much better solution with help from the msysGit people, and have come up with this clean/smudge filter. The filter uses the GNU file and iconv commands to determine the type of the file, and convert it to and from msysGit's internal UTF-8 format. This type of Clean/Smudge Filter gives you