utf-8 | 易学教程

How to best deal with Windows' 16-bit wchar_t ugliness?

Question: I'm writing a wrapper layer to be used with mingw which provides the application with a virtual UTF-8 environment. Functions which deal with filenames are wrappers which convert from UTF-8 and call the corresponding "_w" functions, and so on. The big problem I've run into is that Windows' wchar_t is 16-bit. For filesystem operations, it's not a big deal. I can just convert back and forth between UTF-8 and UTF-16, and everything will work. But the standard C multibyte/wide character conversion
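To make the 16-bit issue concrete, here is a minimal Python sketch (not the asker's C wrapper) of what the UTF-8 to UTF-16 conversion produces. The helper name `utf8_to_utf16_units` is hypothetical and exists only for this illustration. It shows that a character outside the Basic Multilingual Plane becomes two 16-bit code units (a surrogate pair), which is why a single 16-bit wchar_t cannot hold every character.

```python
def utf8_to_utf16_units(utf8_bytes: bytes) -> list[int]:
    """Decode UTF-8 and return the 16-bit code units of its UTF-16 form.

    Hypothetical helper for illustration only.
    """
    text = utf8_bytes.decode("utf-8")
    raw = text.encode("utf-16-le")  # little-endian, no byte order mark
    # Group the raw bytes into 16-bit units.
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


# U+00E9 is inside the BMP: one 16-bit unit.
bmp_units = utf8_to_utf16_units("\u00e9".encode("utf-8"))

# U+1F600 is outside the BMP: two 16-bit units (a surrogate pair).
astral_units = utf8_to_utf16_units("\U0001F600".encode("utf-8"))

print([hex(u) for u in bmp_units])     # ['0xe9']
print([hex(u) for u in astral_units])  # ['0xd83d', '0xde00']
```

For filenames the pair simply travels through the conversion in both directions, which matches the asker's remark that filesystem operations are not the hard part.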

Encoding binary data within XML: Are there better alternatives than base64?

Question: I want to encode and decode binary data within an XML file (with Python, but whatever). I have to face the fact that XML tag content cannot contain certain characters. The only allowed ones are described in the XML spec: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] Which means that the disallowed ones are: 29 Unicode control characters are illegal (0x00 - 0x1F), i.e. ( 000xxxxx ), except 0x09, 0x0A, 0x0D Any Unicode character representation above 2 bytes (UTF-16+) is illegal (U
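The Char production quoted above can be written directly as a predicate. The following is a minimal sketch, assuming the XML 1.0 production exactly as quoted in the question; `is_xml_char` is a hypothetical name. It also checks that base64 output, the baseline the question compares against, uses only characters the production allows.

```python
import base64


def is_xml_char(code_point: int) -> bool:
    """Return True if code_point matches the Char production quoted above."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


# Every possible byte value, including the control characters XML rejects.
payload = bytes(range(256))

# Count the byte values that could not appear literally as characters.
illegal_low = [b for b in range(0x20) if not is_xml_char(b)]
print(len(illegal_low))  # 29

# base64 maps arbitrary bytes onto a small ASCII alphabet.
encoded = base64.b64encode(payload).decode("ascii")
print(all(is_xml_char(ord(ch)) for ch in encoded))  # True

# The round trip restores the original bytes.
print(base64.b64decode(encoded) == payload)  # True
```

The cost of this safety is size: base64 emits four characters for every three input bytes.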

glob() can't find file names with multibyte characters on Windows?

Question: I'm writing a file manager and need to scan directories and deal with renaming files that may have multibyte characters. I'm working on it locally on Windows/Apache PHP 5.3.8, with the following file names in a directory: filename.jpg имяфайла.jpg file件name.jpg פילענאַמע.jpg 文件名.jpg Testing on a live UNIX server worked fine. Testing locally on Windows using glob('./path/*') returns only the first one, filename.jpg. Using scandir(), the correct number of files is returned at least, but I get

Which file encodings are supported for Python 3 source files?

Question: Before you go telling me to read PEP 0263, keep reading... I can't find any documentation that details which file encodings are supported for Python 3 source files. I've found hundreds (thousands?) of questions, answers, posts, emails, etc. about how to declare, at the top of your source file, the encoding of that source file, but none of them answer my question. Bear with me and imagine doing (or actually try) the following: Open Notepad (I'm using regular old Notepad on Windows 7, but I
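One way to probe this experimentally, rather than from documentation, is to ask the standard library which encoding it detects for a given source file. This is a minimal sketch using `tokenize.detect_encoding`; the helper name `declared_encoding` is hypothetical, and the sketch does not claim to list every encoding the interpreter accepts.

```python
import io
import tokenize


def declared_encoding(source: bytes) -> str:
    """Return the encoding tokenize detects for the given source bytes."""
    encoding, _first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# No declaration and no byte order mark: the default applies.
print(declared_encoding(b"x = 1\n"))  # utf-8

# A PEP 263 style declaration on the first line.
print(declared_encoding(b"# -*- coding: latin-1 -*-\nx = 1\n"))  # iso-8859-1

# A UTF-8 byte order mark, as some Windows editors write.
print(declared_encoding(b"\xef\xbb\xbfx = 1\n"))  # utf-8-sig

# A declaration naming a codec that does not exist is rejected.
try:
    declared_encoding(b"# coding: no-such-codec\nx = 1\n")
except SyntaxError as exc:
    print("rejected:", exc)
```

Detection only reads the byte order mark and the first two lines, so it reports what the file declares, not whether the remaining bytes actually decode under that encoding.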

Has anyone been able to write out UTF-8 characters using python's xlwt?

Question: I'm trying to write data to an Excel file that includes Japanese characters. I'm using codecs.open() to get the data, and that seems to work fine, but I run into this error when I try to write the data: UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-17: ordinal not in range(128) I don't understand why the program would be insisting on using ascii here. When I created a new workbook object, I did so using wb = xlwt.Workbook(encoding='utf-8') and both the program file

Can not send special characters (UTF-8) from JSP to Servlet: question marks displayed

Question: I have a problem sending special characters like Cyrillic or umlauts from a JSP to a servlet. I would greatly appreciate your help here. Here is what I have done: Defined the utf-8 charset in the JSP: <%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> ... <div class=

c++ how to write/read ofstream in unicode / utf8

Question: I have a UTF-8 text file that I'm reading using a simple: ifstream in("test.txt"); Now I'd like to create a new file that will be encoded as UTF-8 or Unicode. How can I do this with ofstream or something else? This creates ANSI encoding: ofstream out(fileName.c_str(), ios::out | ios::app | ios::binary); Answer 1: OK, about the portable variant. It is easy if you use the C++11 standard (because there are a lot of additional includes like "utf8", which solves this problem forever). But if you want to use

Node Buffers, from utf8 to binary

Question: I'm receiving data as utf8 from a source, and this data was originally in binary form (it was a Buffer). I have to convert this data back to a Buffer. I'm having a hard time figuring out how to do this. Here's a small sample that shows my problem: var hexString = 'e61b08020304e61c09020304e61d0a020304e61e65'; var buffer1 = new Buffer(hexString, 'hex'); var str = buffer1.toString('utf8'); var buffer2 = new Buffer(str, 'utf8'); console.log('original content:', hexString); console.log('buffer1
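The underlying issue can be reproduced outside Node. The following Python sketch is an analogue, not the asker's code: it decodes the same bytes as UTF-8 with replacement of invalid sequences and shows that the round trip is lossy, while a hex round trip is not. It assumes Node's utf8 decoding substitutes U+FFFD for invalid sequences in a comparable way; the sketch itself only demonstrates Python's behaviour.

```python
# The hex string from the question.
hex_string = "e61b08020304e61c09020304e61d0a020304e61e65"
original = bytes.fromhex(hex_string)

# 0xE6 announces a three-byte UTF-8 sequence, but the bytes that follow it
# here are not continuation bytes, so each 0xE6 is invalid on its own.
text = original.decode("utf-8", errors="replace")
round_tripped = text.encode("utf-8")

print(text.count("\ufffd"))                # 4 replacement characters
print(round_tripped == original)           # False: information was lost
print(len(original), len(round_tripped))   # 21 29

# A binary-safe text encoding such as hex survives the round trip.
lossless = bytes.fromhex(original.hex())
print(lossless == original)                # True
```

Once an invalid byte has been replaced, the original value cannot be recovered from the string, so the conversion has to be avoided at the source by transporting the data as hex or base64 instead.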