unicode | 易学教程

A List of Characters That Will Break OLAP Cubes

Question: Today I received a curious error in one of the OLAP cubes I was working on. When trying to access it from SSAS, or from an external connection in Excel, I received an error similar to the one below:

    '', hexadecimal value 0x1A, is an invalid character. Line 1, position 325042770. (System.Xml)

I'm not sure why this special character was displayed as a "->" symbol, but after exporting the error message to text I determined that it was the "SUB" character. Apparently it was an "invalid…
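XML 1.0 only permits tab, line feed, carriage return and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF, so a control character such as 0x1A (SUB) anywhere in the data makes the whole XML response unparseable. As a minimal sketch, assuming the SUB character comes from the source data feeding the cube and that cleaning it upstream is acceptable, the Python below finds and strips such characters. `find_invalid_xml_chars`, `strip_invalid_xml_chars` and the sample member name are hypothetical, and this is not an SSAS setting.

```python
import re

# Complement of the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000a\u000d\u0020-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def find_invalid_xml_chars(text: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every character that XML 1.0 rejects."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in _INVALID_XML_CHARS.finditer(text)]


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters XML 1.0 forbids, such as 0x1A (SUB)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


member_name = "Widget\x1a Deluxe"  # hypothetical dimension member containing SUB
print(find_invalid_xml_chars(member_name))   # [(6, 'U+001A')]
print(strip_invalid_xml_chars(member_name))  # Widget Deluxe
```

Running the finder first is useful for reporting which rows are affected before anything is silently removed.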

MS Office hyperlinks change code page?

Question: When you paste the following URL into IE: http://technet.microsoft.com/en-us/sysinternals/bb897434.aspx, the link on the right of the page cleanly says "Download Zoomit (77 KB)". If you paste the link into an Office document (Word, Excel or PowerPoint; tested with Office 2003) and activate the link from the document, that same text has picked up a couple of A-circumflex (Â) symbols. This is because the source HTML contains "&nbsp;" entities (non-breaking spaces), which get translated to Unicode U+00A0.
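The mechanism can be reproduced in a few lines. U+00A0 is encoded in UTF-8 as the two bytes C2 A0; if those bytes are decoded with the Windows-1252 code page instead of UTF-8, C2 becomes "Â" and A0 becomes a non-breaking space again, leaving a stray Â in front of it. This Python sketch assumes the Office client falls back to Windows-1252 (the question does not say which code page it uses), and the placement of the non-breaking spaces in `link_text` is hypothetical.

```python
NBSP = "\u00a0"  # the character that the HTML entity &nbsp; stands for


def show_utf8_bytes(text: str) -> str:
    """Hex dump of the UTF-8 encoding, e.g. U+00A0 -> "c2 a0"."""
    return text.encode("utf-8").hex(" ")


def misread_as_cp1252(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode the bytes with Windows-1252.

    This raises for the few bytes cp1252 leaves undefined (e.g. 0x81),
    which does not happen for the input used here.
    """
    return text.encode("utf-8").decode("cp1252")


link_text = f"Download{NBSP}Zoomit (77{NBSP}KB)"  # hypothetical placement of the &nbsp;s
print(show_utf8_bytes(NBSP))          # c2 a0
garbled = misread_as_cp1252(link_text)
print(garbled.replace(NBSP, " "))     # DownloadÂ Zoomit (77Â KB)
```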

Paste Chinese symbols into Oracle DB

Question: I am trying to paste Chinese symbols into an Oracle DB with these settings:

    NLS_NCHAR_CHARACTERSET = 'UTF8'
    NLS_LANGUAGE = 'RUSSIAN'
    NLS_TERRITORY = 'RUSSIA'
    NLS_CURRENCY = '?.'
    NLS_ISO_CURRENCY = 'RUSSIA'
    NLS_NUMERIC_CHARACTERS = '',''
    NLS_CHARACTERSET = 'CL8MSWIN1251'
    NLS_CALENDAR = 'GREGORIAN'
    NLS_DATE_FORMAT = 'DD.MM.RR'
    NLS_DATE_LANGUAGE = 'RUSSIAN'
    NLS_SORT = 'RUSSIAN'
    NLS_TIME_FORMAT = 'HH24:MI:SSXFF'
    NLS_TIMESTAMP_FORMAT = 'DD.MM.RR HH24:MI:SSXFF'
    NLS_TIME_TZ_FORMAT = 'HH24:MI:SSXFF TZR'
    NLS_TIMESTAMP_TZ…
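The settings above point at the likely obstacle: NLS_CHARACTERSET is CL8MSWIN1251, a single-byte Cyrillic character set with no Chinese characters, so CHAR/VARCHAR2 columns cannot hold them as-is, while the national character set is UTF8, so NCHAR/NVARCHAR2 columns can. A minimal Python sketch of that representability check follows, using Python's `cp1251` and `utf-8` codecs as stand-ins for the two Oracle character sets. It is an approximation that never connects to Oracle and does not check how the client transmits the text; `fits_charset` is a hypothetical helper.

```python
def fits_charset(text: str, python_codec: str) -> bool:
    """True if every character of `text` can be represented in the given codec."""
    try:
        text.encode(python_codec)
    except UnicodeEncodeError:
        return False
    return True


sample = "中文"  # sample Chinese text
# "cp1251" stands in for Oracle's CL8MSWIN1251 (database character set),
# "utf-8" for the UTF8 national character set used by NCHAR/NVARCHAR2.
print(fits_charset(sample, "cp1251"))    # False: cannot be stored as-is in CHAR/VARCHAR2
print(fits_charset(sample, "utf-8"))     # True: storable in NCHAR/NVARCHAR2
print(fits_charset("Привет", "cp1251"))  # True: Cyrillic fits the database character set
```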

Char returns the wrong value for 29 unicode characters - Need .NET cast / convert of nchar to char

Question: I need a .NET cast/conversion from SQL nchar to char; more specifically, from Unicode nchar to single-byte char. The complication is that SQL char uses the full byte, not just the 128 values of pure ASCII: the T-SQL ASCII function returns 0-255. Ideally there would be a NormalizationForm called FormByte. It would not give an exact textual value, but rather a close logical value or "?", and it would match how SQL casts from nchar to char. NormalizationForm, Encode and Decode did not work for me, and I tried…
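One way to get the "close logical value or ?" behaviour described above, sketched in Python rather than .NET: encode each character into a single-byte code page, try a Unicode compatibility (NFKC) fold for characters the code page lacks, and fall back to "?". The sketch assumes Windows-1252 as the target code page and is not SQL Server's own best-fit mapping, so its results can differ from a T-SQL cast; `to_single_byte` is a hypothetical helper. It also shows why byte values and Unicode code points disagree in this code page: "€" is byte 128 but code point U+20AC (8364).

```python
import unicodedata


def to_single_byte(text: str, code_page: str = "cp1252") -> bytes:
    """Map Unicode text to a single-byte code page, approximating where possible.

    Assumptions: the target is Windows-1252; unmappable characters become b"?".
    A folded character may expand to more than one byte (e.g. a ligature).
    """
    text = unicodedata.normalize("NFC", text)  # join e + combining accent into é first
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(code_page)
        except UnicodeEncodeError:
            # Not in the code page: try a compatibility fold (e.g. "ﬁ" -> "fi",
            # fullwidth "Ａ" -> "A"), then use "?" for whatever is still unmappable.
            folded = unicodedata.normalize("NFKC", ch)
            out += folded.encode(code_page, errors="replace")
    return bytes(out)


print(to_single_byte("café \ufb01 中"))     # b'caf\xe9 fi ?'
print(list(to_single_byte("€")), ord("€"))  # [128] 8364
```

Trying the direct encoding first matters: folding everything with NFKC up front would turn "½", which Windows-1252 has, into a sequence it does not.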

PostgreSQL and Unicode table names: Why can I not select the table name from the information schema when it contains Unicode characters?

Question: I created a table with a Unicode character in its name (specifically to test table names with Unicode). The table was created fine, but my method for detecting whether the table exists broke! Here is the interaction in question:

    caribou_test=# select table_name from information_schema.tables where table_schema = 'public';
     table_name
    -------------
     ...
     pinkpink1
    (16 rows)

    caribou_test=# select table_name from information_schema.tables where table_schema = 'public' and table_name = 'pinkƒpink1';
     table
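A quick way to see why two names that look alike do not compare equal is to list their code points. The Python sketch below compares "pinkpink1" with the queried literal, assuming the listing above shows the stored name exactly (the page excerpt itself may have lost characters); `describe` and `first_difference` are hypothetical helpers, and in practice you would feed them the name fetched from information_schema.tables.

```python
import unicodedata
from typing import List, Optional


def describe(name: str) -> List[str]:
    """Spell out each character as "U+XXXX NAME" so look-alike differences become visible."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in name]


def first_difference(a: str, b: str) -> Optional[int]:
    """Index of the first differing character, or None if the strings are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))


stored = "pinkpink1"    # name as listed by information_schema.tables
queried = "pinkƒpink1"  # literal used in the WHERE clause
i = first_difference(stored, queried)
print(i, describe(queried)[i])  # 4 U+0192 LATIN SMALL LETTER F WITH HOOK
```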

Regular expression error: disallowed Unicode code point

Question: I use this regular expression to remove all possible emojis from a string:

    /(\x{00a9}|\x{00ae}|[\x{2000}-\x{3300}]|\x{d83c}[\x{d000}-\x{dfff}]|\x{d83d}[\x{d000}-\x{dfff}]|\x{d83e}[\x{d000}-\x{dfff}])/u

but it throws this exception:

    preg_replace(): Compilation failed: disallowed Unicode code point (>= 0xd800 && <= 0xdfff) at offset 46

I googled this problem, but I couldn't find an accurate answer. I would appreciate it if someone could tell me exactly what this error means…
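The error comes from the `u` modifier: in UTF-8 mode PCRE matches Unicode code points, and U+D800 to U+DFFF are UTF-16 surrogate halves rather than valid code points, so `\x{d83c}` is rejected. An emoji such as U+1F600 should be written as the code point itself (in PCRE, `\x{1F600}`), not as the surrogate pair a UTF-16 language would use. The Python sketch below shows where values like D83D come from and strips emoji with code-point ranges; Python's `re` stands in for PCRE here, and the U+1F000 to U+1FAFF range is an assumption that covers many emoji blocks but is not a complete emoji definition.

```python
import re
from typing import Tuple


def utf16_surrogates(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point (above U+FFFF) into its UTF-16 surrogate pair."""
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


# Code-point ranges instead of surrogate halves. U+00A9, U+00AE and U+2000-U+3300
# are carried over from the original pattern (note that the last range also removes
# ordinary punctuation such as U+2026 "…"); U+1F000-U+1FAFF is an assumed emoji range.
EMOJI_PATTERN = re.compile("[\u00a9\u00ae\u2000-\u3300\U0001F000-\U0001FAFF]")


def strip_emoji(text: str) -> str:
    return EMOJI_PATTERN.sub("", text)


print([hex(u) for u in utf16_surrogates(0x1F600)])  # ['0xd83d', '0xde00']
print(strip_emoji("ok \U0001F600!"))                # ok !
```

In the PHP pattern the same idea means replacing the `\x{d83c}[...]` style alternatives with code-point ranges such as `[\x{1F300}-\x{1FAFF}]`, keeping the `u` modifier.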

Using the eSpeak TTS engine in an application

Question: I have this code for text-to-speech in my application:

    public void onInit(int status) {
        // TODO Auto-generated method stub
        if (status == TextToSpeech.SUCCESS) {
            //Setting speech language
            int result = tts.setLanguage(Locale.ENGLISH);
            //If your device doesn't support language you set above
            if (result == TextToSpeech.LANG_MISSING_DATA || result == TextToSpeech.LANG_NOT_SUPPORTED) {
                //Cook simple toast message with message
                Toast.makeText(this, "Language not supported", Toast.LENGTH_LONG).show();

Unicode within Maya

Question: In my script (written in Sublime Text) I have a comment that reads:

    # -*- coding: utf-8 -*-
    import unicodedata
    # Bööm! Bööm! Shake shake the room!
    print u"Bööm! Bööm! Shake shake the room!"

This works fine in a command prompt window. However, when I drag and drop the script into Maya's Script Editor, the same lines read:

    # BÃ¶Ã¶m! BÃ¶Ã¶m! Shake shake the room!
    print u"BÃ¶Ã¶m! BÃ¶Ã¶m! Shake shake the room!"

How do I make the comment read as intended?

Answer 1: It's definitely Windows' problem.
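The garbling is consistent with UTF-8 bytes being decoded as Windows-1252: "ö" is C3 B6 in UTF-8, and Windows-1252 shows those two bytes as "Ã¶". Assuming that is what happens when the file is dropped into Maya (the answer only says it is a Windows problem), this Python 3 sketch reproduces the effect with a temporary file and shows that reading with an explicit `encoding="utf-8"` avoids it. `read_with` and `repair_cp1252_mojibake` are hypothetical helpers; in a Python 2.x host, `io.open(path, encoding="utf-8")` plays the same role as `open` here.

```python
import os
import tempfile

LINE = "# Bööm! Bööm! Shake shake the room!"


def read_with(path: str, encoding: str) -> str:
    with open(path, encoding=encoding) as f:
        return f.read()


def repair_cp1252_mojibake(text: str) -> str:
    """Undo UTF-8-read-as-cp1252 damage: recover the bytes, then decode them as UTF-8."""
    return text.encode("cp1252").decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "boom.py")  # hypothetical script file
    with open(path, "w", encoding="utf-8") as f:
        f.write(LINE)
    wrong = read_with(path, "cp1252")  # what a Windows-1252 reader sees
    right = read_with(path, "utf-8")   # explicit, correct decoding

print(wrong)                                   # # BÃ¶Ã¶m! BÃ¶Ã¶m! Shake shake the room!
print(right)                                   # # Bööm! Bööm! Shake shake the room!
print(repair_cp1252_mojibake(wrong) == right)  # True
```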

A brief look at the difference between utf8 and utf8mb4 in MySQL

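Reposted from: http://ourmysql.com/archives/1402

In practice I have found that MySQL's character set settings sometimes cause failures, so this topic is worth understanding.

1. Introduction

MySQL added the utf8mb4 encoding in version 5.5.3. "mb4" means "most bytes 4": the encoding exists specifically to support four-byte Unicode characters. Fortunately, utf8mb4 is a superset of utf8, so apart from changing the encoding to utf8mb4 no other conversion is needed. Of course, to save space, utf8 is usually sufficient in ordinary cases.

2. Description

As noted above, utf8 can store most Chinese characters, so why use utf8mb4 at all? It turns out that the utf8 encoding MySQL supports has a maximum character length of 3 bytes, so inserting a 4-byte wide character fails. The largest Unicode code point that three-byte UTF-8 can encode is 0xFFFF, which is the Basic Multilingual Plane (BMP). In other words, no Unicode character outside the BMP can be stored in MySQL's utf8 character set. That includes emoji (Unicode characters common on iOS and Android phones), many rarely used Chinese characters, and many characters added in newer Unicode versions.

3. Root cause

The original UTF-8 format used one to six bytes and could encode characters of up to 31 bits. The current UTF-8 specification uses only one to four bytes and can encode at most 21 bits…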
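The byte-length rule described above can be checked in application code before inserting into a utf8 column. This Python sketch computes how many bytes each character needs in modern UTF-8 and flags text containing characters beyond U+FFFF, which MySQL's 3-byte utf8 cannot store. The helper names are hypothetical, and the remedy the article describes is still to switch the column or table to utf8mb4.

```python
def utf8_length(ch: str) -> int:
    """Bytes one code point needs in modern UTF-8 (RFC 3629 allows 1 to 4)."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4  # up to U+10FFFF, whose value needs 21 bits


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the BMP (above U+FFFF) and so needs 4 bytes,
    which MySQL's 3-byte utf8 character set cannot store."""
    return any(ord(ch) > 0xFFFF for ch in text)


# ASCII, common Chinese, an emoji, and a rare CJK Extension B character
for sample in ["abc", "中文", "\U0001F600", "\U00020000"]:
    print(repr(sample), [utf8_length(c) for c in sample], needs_utf8mb4(sample))
```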