unicode | 易学教程

Cannot print non-English text to the Console window

Question:

    int main()
    {
        wcout << L"Русский текст" << endl;
        wprintf(L"Русский текст\n");
        return 0;
    }

This simple program doesn't print anything to the Console window (not even new lines). It is a VC++ 2010 Console Application project. What is wrong?

Answer 1: According to the links pointed to in this blog, you need to change the font of the console and also add this line:

    _setmode(_fileno(stdout), _O_U16TEXT);

Source: https://stackoverflow.com/questions/20729870/cannot-print-non-english-text-to-the-console-window

Asian language PDF display issue in Crystal Reports for VS2008

Question: Here is the context: we use Crystal Reports for Visual Studio 2008 in an ASP.NET application to generate reports which may contain East Asian characters (Chinese, Japanese) in the text entered by the users. The reports are generated correctly on Windows Server 2003 and incorrectly on Windows Server 2008. When we first had this issue, we found that we needed to: install "East Asian language support" on the server; use a Unicode font in CR (Arial Unicode MS); install this font on the server. With

How to print the utf-16 characters in c

Question:

    int main()
    {
        char c = 0x41;
        printf("char is : %c\n",c);
        c = 0xe9;
        printf("char is : %c\n",c);
        unsigned int d = 0x164e;
        printf("char is : %c\n",d);
        return 0;
    }

What I want to print out are: I use Ubuntu 64-bit in VMware Workstation on Windows and use octal dump: the hexadecimal values of the three chars from a UTF-16 LE txt file. The output: How can I print UTF-16 characters correctly?

Source: https://stackoverflow.com/questions/39576310/how-to-print-the-utf-16-characters-in-c
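The question is about C, but the underlying issue can be sketched briefly in Python: a UTF-16 code unit is a number that has to be converted to the terminal's encoding (UTF-8 is assumed here) before it is written out. The byte string below is an assumption, built from the three values in the question (0x41, 0xe9, 0x164e) laid out as UTF-16 LE; it is not the asker's actual file.

```python
# Three UTF-16 LE code units based on the values in the question:
# 0x0041, 0x00e9, 0x164e. The byte layout (low byte first) is an assumption.
raw = bytes([0x41, 0x00, 0xE9, 0x00, 0x4E, 0x16])

# Decode the bytes into a string of code points.
text = raw.decode("utf-16-le")
code_points = [hex(ord(ch)) for ch in text]

# Re-encode as UTF-8, which is what a typical Linux terminal expects.
utf8_bytes = text.encode("utf-8")

print(code_points)  # ['0x41', '0xe9', '0x164e']
print(utf8_bytes)   # b'A\xc3\xa9\xe1\x99\x8e'
```

Printing a raw code unit with %c, as in the question, writes a single byte, which cannot represent values above 0x7f in a UTF-8 terminal.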

Unicode strings

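Strings also raise the question of encoding. Because computers can only process numbers, text has to be converted to numbers before it can be processed. The earliest computers were designed with 8 bits to a byte, so the largest integer a single byte can represent is 255 (binary 11111111 = decimal 255). Values in this range were used to represent uppercase and lowercase English letters, digits, and some symbols; this code table is known as ASCII (strictly speaking, ASCII defines only the values 0 to 127). For example, the code for the uppercase letter A is 65 and the code for the lowercase letter z is 122.

One byte is clearly not enough to represent Chinese; at least two bytes are needed, and they must not conflict with ASCII. China therefore defined the GB2312 encoding to cover Chinese characters. Japanese, Korean, and other languages have the same problem. Unicode was created to unify the encoding of all writing systems: it puts every language into a single code set, so that garbled text caused by conflicting encodings no longer occurs.

Unicode characters are commonly represented with two bytes each. Existing English text goes from one byte per character to two, with the high byte simply set to 0. Python was created before the Unicode standard was published, so the earliest versions of Python supported only ASCII, and an ordinary string such as 'ABC' was ASCII-encoded internally. Python later added Unicode support. A Unicode string is written as u'...', and escape sequences still work inside it. For example, print(u'中文') prints 中文. Escapes: u'中文\n日文\n韩文' Multi-line: u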
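The byte counts described above can be checked with a short sketch. It assumes Python 3, where every str is already a Unicode string and the u prefix is accepted only for compatibility; the variable names are illustrative.

```python
# ASCII values mentioned in the text.
assert ord("A") == 65 and ord("z") == 122

s = u"中文"  # the u prefix is still accepted in Python 3

gb = s.encode("gb2312")    # GB2312 uses 2 bytes per Chinese character
utf8 = s.encode("utf-8")   # UTF-8 uses 3 bytes for each of these characters

# Two-byte form of an ASCII letter: the high byte is filled with 0.
utf16 = "A".encode("utf-16-be")

print(len(s), len(gb), len(utf8))  # 2 4 6
print(utf16)                       # b'\x00A'
```

The same two characters take a different number of bytes in each encoding, which is why the encoding has to be known before bytes can be turned back into text.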

05 Unicode

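Code points. The idea behind the Unicode standard is simple: assign a unique integer to every character of every writing system in the world. These integers are called code points.

Code space. All the code points together form the code space. As defined by Unicode, there are 1,114,112 code points in total, numbered from 0x0 to 0x10FFFF. In other words, if every code point could represent a valid character, the Unicode standard could encode at most 1,114,112 characters, or roughly 1.1 million. Unicode 7.0, the latest version at the time of writing, has assigned code points to more than 110,000 characters.

Code planes. The Unicode standard divides the code points into 17 code planes, numbered #0 to #16. Each plane contains 65,536 (2^16) code points (17 * 65,536 = 1,114,112). Plane #0 is called the Basic Multilingual Plane (BMP); the remaining planes are called supplementary planes. Unicode 7.0 uses only 6 of the 17 planes and gives each of these 6 planes a name, as shown in the figure below: The names and uses of these planes are as follows: BMP (Basic Multilingual Plane): most commonly used characters are located in this plane, such as ASCII characters and Chinese characters. SMP (Supplementary Multilingual Plane)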
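The plane arithmetic above can be reproduced with a minimal Python sketch. The helper name plane_of is hypothetical; it simply divides a code point by the plane size.

```python
MAX_CODE_POINT = 0x10FFFF
PLANE_SIZE = 2 ** 16  # 65,536 code points per plane


def plane_of(ch: str) -> int:
    """Return the plane number (0 to 16) of a single character."""
    return ord(ch) // PLANE_SIZE


total = MAX_CODE_POINT + 1     # code points are numbered from 0
planes = total // PLANE_SIZE

print(total, planes)           # 1114112 17
print(plane_of("A"), plane_of("中"), plane_of("\U0001F600"))  # 0 0 1
```

The ASCII letter and the Chinese character both fall in plane #0 (the BMP), while U+1F600 falls in plane #1.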

Search for unicode text inside Windows XP

Question: Is there a way of searching for Unicode characters inside a text file under Windows XP? For example, suppose I wish to find text documents containing the euro symbol. Although the standard XP search allows me to search for the euro symbol, it does not produce any matches when I know there should be at least a few. Wingrep has the same issue. Is there any simple software or setting that I have missed?

Answer 1: The input encoding of the search field (in Windows XP, UTF-16) may not match the encoding of the text
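The encoding mismatch described in the answer can be illustrated with a small Python sketch. The function name contains_text and the list of candidate encodings are assumptions chosen for illustration; the idea is to encode the search term once per candidate encoding and look for each byte pattern in the raw file contents.

```python
def contains_text(data: bytes, needle: str,
                  encodings=("utf-8", "utf-16-le", "cp1252")) -> list:
    """Return the candidate encodings under which `needle` occurs in `data`.

    This is a plain byte-level match, so it can report false positives
    (for example a misaligned match in UTF-16 data).
    """
    hits = []
    for enc in encodings:
        try:
            pattern = needle.encode(enc)
        except UnicodeEncodeError:
            continue  # the needle cannot be represented in this encoding
        if pattern in data:
            hits.append(enc)
    return hits


# The same visible text stored in two different encodings.
sample_utf8 = "price: 5 €".encode("utf-8")
sample_cp1252 = "price: 5 €".encode("cp1252")

print(contains_text(sample_utf8, "€"))    # ['utf-8']
print(contains_text(sample_cp1252, "€"))  # ['cp1252']
```

A search tool that encodes the euro symbol in only one way would find only one of these two samples, even though both display the same text.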

GetSystemDirectory: get the system directory

Windows declaration:

    WINBASEAPI UINT WINAPI GetSystemDirectoryA(
        LPSTR lpBuffer, // buffer that receives the system directory
        UINT uSize      // buffer length
    );
    WINBASEAPI UINT WINAPI GetSystemDirectoryW(
        LPWSTR lpBuffer,
        UINT uSize
    );
    #ifdef UNICODE
    #define GetSystemDirectory GetSystemDirectoryW
    #else
    #define GetSystemDirectory GetSystemDirectoryA
    #endif // !UNICODE

GetWindowsDirectory: get the Windows installation directory

Windows definition:

    WINBASEAPI UINT WINAPI GetWindowsDirectoryA(
        LPSTR lpBuffer, // buffer
        UINT uSize      // buffer length
    );
    WINBASEAPI UINT WINAPI GetWindowsDirectoryW(
        LPWSTR lpBuffer,
        UINT uSize
    );
    #ifdef UNICODE
    #define GetWindowsDirectory

Unicode re.sub() doesn't work with \g<0> (group 0)

Question: Why doesn't \g<0> work with a unicode regex? When I use \g<0> to insert a space before and after the group with a normal string regex, it works:

    >>> punct = """,.:;!@#$%^&*(){}{}|\/?><"'"""
    >>> rx = re.compile('[%s]' % re.escape(punct))
    >>> text = '''"anständig"'''
    >>> rx.sub(r" \g<0> ",text)
    ' " anst\xc3\xa4ndig " '
    >>> print rx.sub(r" \g<0> ",text)
     " anständig " 

but with a unicode regex, the space isn't added:

    >>> punct = u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|

Insert rows with Unicode characters using BCP

Question: I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows:

    bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net

But data.csv is UTF-8 encoded and contains non-ASCII strings. These get corrupted. I've tried changing the -c option to -w:

    bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database