unicode

Cannot print non-English text to the Console window

拜拜、爱过 submitted on 2020-02-05 11:14:50
Question: int main() { wcout << L"Русский текст" << endl; wprintf(L"Русский текст\n"); return 0; } This simple program doesn't print anything to the console window (not even newlines). It is a VC++ 2010 Console Application project. What is wrong? Answer 1: According to the links pointed to in this blog, you need to change the console font and also add this line: _setmode(_fileno(stdout), _O_U16TEXT); Source: https://stackoverflow.com/questions/20729870/cannot-print-non-english-text-to-the-console-window

Asian language PDF display issue in Crystal Reports for VS2008

与世无争的帅哥 submitted on 2020-02-05 07:38:04
Question: Here is the context: we use Crystal Reports for Visual Studio 2008 in an ASP.NET application to generate reports that may contain East Asian characters (Chinese, Japanese) in text entered by users. The reports are generated correctly on Windows Server 2003 and incorrectly on Windows Server 2008. When we first had this issue, we found that we needed to: (1) install "East Asian language support" on the server; (2) use a Unicode font in CR: Arial Unicode MS; (3) install this font on the server. With

How to print the utf-16 characters in c

五迷三道 submitted on 2020-02-05 03:29:08
Question: int main() { char c = 0x41; printf("char is : %c\n",c); c = 0xe9; printf("char is : %c\n",c); unsigned int d = 0x164e; printf("char is : %c\n",d); return 0; } What I want to print out are: I use 64-bit Ubuntu in VMware Workstation on Windows, and used an octal dump to get the hexadecimal values of the three characters from a UTF-16 LE text file. The output: How can I print UTF-16 characters correctly? Source: https://stackoverflow.com/questions/39576310/how-to-print-the-utf-16-characters-in-c
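The three values in the question are UTF-16 code units, while a Linux terminal usually expects UTF-8, so the text generally has to be re-encoded before it is written out. The question is about C; purely as a language-neutral illustration, here is a minimal Python 3 sketch of that conversion. The byte string is a hypothetical stand-in for the file contents, and the variable names are made up for this example.

```python
# Hypothetical stand-in for the UTF-16 LE file: the code units
# 0x0041, 0x00E9 and 0x164E from the question, low byte first.
utf16_le_bytes = b"\x41\x00\xe9\x00\x4e\x16"

# Decode the UTF-16 LE bytes into a string of Unicode code points.
text = utf16_le_bytes.decode("utf-16-le")

# Re-encode for a UTF-8 terminal (an assumption about the environment).
utf8_bytes = text.encode("utf-8")

# Show, in plain ASCII, which UTF-8 bytes each code point turns into.
for ch in text:
    print("U+%04X -> UTF-8 bytes %s" % (ord(ch), ch.encode("utf-8").hex()))
```

In C the same two steps apply: read the 16-bit code units, then convert them to the terminal's encoding instead of passing them to %c.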

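Unicode strings

戏子无情 submitted on 2020-02-04 23:45:49
Strings also raise an encoding issue. Because computers can only process numbers, text has to be converted to numbers before it can be processed. Early computers were designed with 8 bits to a byte, so the largest integer one byte can represent is 255 (binary 11111111 = decimal 255). The ASCII encoding uses the values 0 to 127 for upper- and lower-case English letters, digits, and some symbols; for example, the uppercase letter A is encoded as 65 and the lowercase letter z as 122. One byte is clearly not enough to represent Chinese: at least two bytes are needed, and they must not conflict with ASCII, so China defined the GB2312 encoding to cover Chinese characters. Japanese, Korean, and other languages have the same problem. Unicode was created to unify the encoding of all scripts. It places every language in a single character set, which avoids the garbled text caused by conflicting encodings. Unicode characters are commonly stored as two bytes each (as UTF-16 does for characters in the Basic Multilingual Plane); existing English text goes from one byte to two bytes per character simply by filling the high byte with 0. Because Python was born before the Unicode standard was published, the earliest Python supported only ASCII, and an ordinary string such as 'ABC' is ASCII-encoded inside Python. Python later added Unicode support. A Unicode string is written u'...', and escapes still work inside it, for example: print(u'中文') Output: 中文 Escapes: u'中文\n日文\n韩文' Multiple lines: u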
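The byte counts described above can be checked directly. This is a minimal sketch, assuming Python 3 (where every str is already Unicode and the u'...' prefix is still accepted but optional); the variable names are chosen for illustration only.

```python
# ASCII code values mentioned in the text.
code_A = ord("A")   # 65
code_z = ord("z")   # 122

# U+4E2D is the first character of the u'中文' example above.
zhong = "\u4e2d"

# In GB2312 a Chinese character takes two bytes.
gb2312_bytes = zhong.encode("gb2312")

# In UTF-16 (big-endian, no byte order mark) an ASCII letter becomes
# two bytes, with the high byte set to 0.
utf16_A = "A".encode("utf-16-be")

print(code_A, code_z, len(gb2312_bytes), utf16_A)
```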

05 Unicode

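帅比萌擦擦* submitted on 2020-02-04 14:30:28
Code points: The idea behind the Unicode standard is simple: assign a unique integer to every character of every writing system in the world. These integers are called code points. Code space: All code points together form the code space. As defined by Unicode, there are 1,114,112 code points in total, numbered 0x0 through 0x10FFFF. In other words, if every code point represented a valid character, the Unicode standard could encode at most 1,114,112 characters, roughly 1.1 million. Unicode 7.0, the latest version of the standard at the time of writing, has assigned code points to more than 110,000 characters. Code planes: The Unicode standard divides the code points into 17 code planes, numbered #0 through #16. Each plane contains 65,536 (2^16) code points (17 × 65,536 = 1,114,112). Plane #0 is called the Basic Multilingual Plane (BMP); the remaining planes are called supplementary planes. Unicode 7.0 uses only 6 of the 17 planes and gives each of those 6 a name. The names and uses of these planes are: BMP (Basic Multilingual Plane): most commonly used characters are located in this plane, such as ASCII characters and Chinese characters. SMP (Supplementary Multilingual Plane)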
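Because each plane holds exactly 2^16 code points, the plane number of a code point is simply its value divided by 65,536. A minimal Python sketch of that arithmetic follows; the function name plane_of and the constants are made up for this example.

```python
PLANE_SIZE = 0x10000        # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF   # last code point in the code space
PLANE_COUNT = (MAX_CODE_POINT + 1) // PLANE_SIZE  # 17 planes

def plane_of(code_point):
    """Return the plane number (0 to 16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError("outside the Unicode code space: %r" % code_point)
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16

print(plane_of(ord("A")))    # an ASCII letter, in the BMP
print(plane_of(0x4E2D))      # a Chinese character, also in the BMP
print(plane_of(0x1F600))     # a code point in a supplementary plane
```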

Search for unicode text inside Windows XP

假如想象 submitted on 2020-02-04 08:08:33
Question: Is there a way to search for Unicode characters inside a text file under Windows XP? For example, suppose I wish to find text documents containing the euro symbol. Although the standard XP search lets me search for the euro symbol, it does not produce any matches when I know there should be at least a few. Wingrep has the same issue. Is there any simple software or setting that I have missed? Answer 1: The input encoding of the search field (in Windows XP, UTF-16) may not match the encoding of the text
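The mismatch the answer describes is easy to see at the byte level: the euro sign is stored as different bytes in different encodings, so a search that looks for one byte pattern misses files saved in another. Below is a minimal Python sketch, not a replacement for a real search tool. The list of candidate encodings and the function name are assumptions made for this example.

```python
EURO = "\u20ac"  # the euro sign

# Encodings to try; this particular list is an assumption.
CANDIDATE_ENCODINGS = ("utf-8", "utf-16-le", "cp1252")

def euro_encodings_found(raw_bytes):
    """Return the encodings whose euro byte pattern occurs in raw_bytes."""
    return [enc for enc in CANDIDATE_ENCODINGS
            if EURO.encode(enc) in raw_bytes]

# The same text saved three ways yields three different byte patterns.
sample = "5 " + EURO
for enc in CANDIDATE_ENCODINGS:
    print(enc, sample.encode(enc).hex(), euro_encodings_found(sample.encode(enc)))
```

A raw byte match like this can give false positives (for instance, the single cp1252 byte may occur inside unrelated binary data), so treat any hit as a hint rather than proof.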

GetSystemDirectory: get the system directory

余生颓废 submitted on 2020-02-04 00:55:03
GetSystemDirectory: get the system directory. Windows declaration: WINBASEAPI UINT WINAPI GetSystemDirectoryA( LPSTR lpBuffer, /* buffer that receives the system directory path */ UINT uSize /* buffer length */ ); WINBASEAPI UINT WINAPI GetSystemDirectoryW( LPWSTR lpBuffer, UINT uSize ); #ifdef UNICODE #define GetSystemDirectory GetSystemDirectoryW #else #define GetSystemDirectory GetSystemDirectoryA #endif // !UNICODE GetWindowsDirectory: get the Windows installation directory. Windows declaration: WINBASEAPI UINT WINAPI GetWindowsDirectoryA( LPSTR lpBuffer, /* buffer */ UINT uSize /* buffer length */ ); WINBASEAPI UINT WINAPI GetWindowsDirectoryW( LPWSTR lpBuffer, UINT uSize ); #ifdef UNICODE #define GetWindowsDirectory

Unicode re.sub() doesn't work with \g<0> (group 0)

人盡茶涼 submitted on 2020-02-03 13:25:47
Question: Why doesn't \g<0> work with a unicode regex? When I use \g<0> to insert a space before and after the group with a normal string regex, it works: >>> punct = """,.:;!@#$%^&*(){}{}|\/?><"'""" >>> rx = re.compile('[%s]' % re.escape(punct)) >>> text = '''"anständig"''' >>> rx.sub(r" \g<0> ",text) ' " anst\xc3\xa4ndig " ' >>> print rx.sub(r" \g<0> ",text) " anständig " but with a unicode regex, the space isn't added: >>> punct = u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|
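The excerpt is cut off before the failing case is complete, so the following does not diagnose it. It is only a sketch of the same technique in Python 3, where every str is Unicode and the pattern and the text therefore cannot be of mismatched types. The two punctuation sets are shortened, hypothetical stand-ins for the ones in the question.

```python
import re

# Shortened stand-ins for the punctuation sets in the question.
ascii_punct = ",.:;!?\"'"
unicode_punct = "\u201e\u201c\u201d\u2026\u20ac"  # low/curly quotes, ellipsis, euro

def pad_punctuation(punct, text):
    """Put a space before and after every character of punct found in text."""
    rx = re.compile("[%s]" % re.escape(punct))
    # \g<0> refers to the whole match, i.e. the punctuation character itself.
    return rx.sub(r" \g<0> ", text)

plain = pad_punctuation(ascii_punct, '"anst\u00e4ndig"')
curly = pad_punctuation(unicode_punct, "\u201eanst\u00e4ndig\u201c")
print(ascii(plain))
print(ascii(curly))
```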

Unicode re.sub() doesn't work with \g<0> (group 0)

萝らか妹 submitted on 2020-02-03 13:25:06
Question: Why doesn't \g<0> work with a unicode regex? When I use \g<0> to insert a space before and after the group with a normal string regex, it works: >>> punct = """,.:;!@#$%^&*(){}{}|\/?><"'""" >>> rx = re.compile('[%s]' % re.escape(punct)) >>> text = '''"anständig"''' >>> rx.sub(r" \g<0> ",text) ' " anst\xc3\xa4ndig " ' >>> print rx.sub(r" \g<0> ",text) " anständig " but with a unicode regex, the space isn't added: >>> punct = u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|

Insert rows with Unicode characters using BCP

一个人想着一个人 submitted on 2020-02-03 05:35:25
Question: I'm using BCP to bulk upload data from a CSV file to SQL Azure (because BULK INSERT is not supported). This command runs and uploads the rows: bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -c -U bcpuser@resource -S tcp:resource.database.windows.net But data.csv is UTF-8 encoded and contains non-ASCII strings, and these get corrupted. I've tried changing the -c option to -w: bcp [resource].dbo.TableName in C:\data.csv -t "," -r "0x0a" -w -U bcpuser@resource -S tcp:resource.database