utf-8

How to best deal with Windows' 16-bit wchar_t ugliness?

拜拜、爱过 submitted on 2020-01-10 04:28:06
Question: I'm writing a wrapper layer to be used with mingw which provides the application with a virtual UTF-8 environment. Functions which deal with filenames are wrappers which convert from UTF-8 and call the corresponding "_w" functions, and so on. The big problem I've run into is that Windows' wchar_t is 16-bit. For filesystem operations, it's not a big deal. I can just convert back and forth between UTF-8 and UTF-16, and everything will work. But the standard C multibyte/wide character conversion
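To illustrate why a 16-bit wide character type is awkward, here is a minimal Python sketch (not the asker's C wrapper) showing that a code point outside the Basic Multilingual Plane needs two 16-bit code units in UTF-16. The helper name `utf16_code_units` is hypothetical and exists only for this illustration.

```python
def utf16_code_units(text: str) -> list:
    """Return the 16-bit code units of text when encoded as UTF-16 (little-endian)."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


bmp_char = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # U+1F600, outside the BMP

# One code unit for the BMP character, a surrogate pair for the other.
print([hex(u) for u in utf16_code_units(bmp_char)])
print([hex(u) for u in utf16_code_units(astral_char)])
```

A single 16-bit wchar_t therefore cannot hold every Unicode code point, which is the mismatch the question describes for the character-at-a-time conversion functions.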

Encoding binary data within XML: Are there better alternatives than base64?

笑着哭i submitted on 2020-01-10 02:53:33
Question: I want to encode and decode binary data within an XML file (with Python, but whatever). I have to face the fact that XML tag content cannot contain certain characters. The only allowed ones are described in the XML spec: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] Which means that the disallowed ones are: 29 Unicode control characters are illegal (0x00 - 0x1F) ie ( 000xxxxx ) except 0x09, 0x0A, 0x0D Any Unicode character representation above 2 bytes (UTF-16+) is illegal (U
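The Char production quoted above can be expressed directly as a range check. The following is a small sketch, with a hypothetical helper name `is_xml_char`, that also confirms the base64 alphabet stays inside the allowed set. It is the baseline approach the question title mentions, not a proposed alternative to it.

```python
import base64


def is_xml_char(code_point: int) -> bool:
    """True if code_point matches the XML 1.0 Char production quoted in the question."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


payload = bytes(range(256))  # every possible byte value, as sample binary data
encoded = base64.b64encode(payload).decode("ascii")
decoded = base64.b64decode(encoded)

# Every character base64 emits is a legal XML character.
print(all(is_xml_char(ord(ch)) for ch in encoded))
print(decoded == payload)
```

Note that characters such as `<` and `&` are legal per this production but still need escaping in element content; base64 avoids them entirely.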

glob() can't find file names with multibyte characters on Windows?

前提是你 submitted on 2020-01-09 13:01:41
Question: I'm writing a file manager and need to scan directories and deal with renaming files that may have multibyte characters. I'm working on it locally on Windows/Apache PHP 5.3.8, with the following file names in a directory: filename.jpg имяфайла.jpg file件name.jpg פילענאַמע.jpg 文件名.jpg Testing on a live UNIX server worked fine. Testing locally on Windows using glob('./path/*') returns only the first one, filename.jpg . Using scandir() , the correct number of files is returned at least, but I get

Which file encodings are supported for Python 3 source files?

强颜欢笑 submitted on 2020-01-09 11:49:31
Question: Before you go telling me to read PEP 0263, keep reading... I can't find any documentation that details which file encodings are supported for Python 3 source files . I've found hundreds (thousands?) of questions, answers, posts, emails, etc. about how to declare - at the top of your source file - the encoding of that source file, but none of them answer my question. Bear with me and imagine doing (or actually try) the following: Open Notepad (I'm using regular old Notepad on Windows 7, but I
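As a rough way to experiment with this, the standard library's `tokenize.detect_encoding` reports which encoding a source file declares, and `codecs.lookup` reports whether the running interpreter has a codec registered under that name. The sketch below uses two hypothetical helpers, `declared_encoding` and `is_known_codec`. It only inspects declarations and codec availability, and does not claim that every registered codec is usable for source files.

```python
import codecs
import io
import tokenize


def declared_encoding(source_bytes: bytes) -> str:
    """Return the encoding detected from a BOM or a PEP 263 coding declaration."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def is_known_codec(name: str) -> bool:
    """True if this interpreter has a codec registered under the given name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


plain = b"x = 1\n"
with_cookie = b"# -*- coding: latin-1 -*-\nx = 1\n"

print(declared_encoding(plain))        # no declaration: the default applies
print(declared_encoding(with_cookie))  # normalized name of the declared encoding
print(is_known_codec("utf-8"))
```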

Has anyone been able to write out UTF-8 characters using python's xlwt?

北战南征 submitted on 2020-01-09 11:12:06
Question: I'm trying to write data to an Excel file that includes Japanese characters. I'm using codecs.open() to get the data, and that seems to work fine, but I run into this error when I try to write the data: UnicodeEncodeError: 'ascii' codec can't encode characters in position 16-17: ordinal not in range(128) I don't understand why the program would be insisting on using ASCII here. When I created a new workbook object, I did so using wb = xlwt.Workbook(encoding='utf-8') and both the program file
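The error message itself can be reproduced without xlwt. This is a minimal sketch, assuming Python 3 and a made-up sample string, of what happens when text containing Japanese characters is pushed through the ASCII codec, compared with an explicit UTF-8 round trip. It does not diagnose where the asker's program triggers the implicit ASCII conversion.

```python
text = "日本語のデータ"  # sample text, not the asker's data

try:
    text.encode("ascii")  # the same codec named in the error message
    error_codec = None
except UnicodeEncodeError as exc:
    error_codec = exc.encoding
    print("failed with codec:", error_codec)

# Encoding explicitly as UTF-8 works and round-trips without loss.
utf8_bytes = text.encode("utf-8")
round_tripped = utf8_bytes.decode("utf-8")
print(round_tripped == text)
```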

Cannot send special characters (UTF-8) from JSP to Servlet: question marks displayed

痴心易碎 submitted on 2020-01-09 10:51:42
Question: I have a problem sending special characters like Cyrillic or umlauts from a JSP to a servlet. I would greatly appreciate your help here. Here is what I have done: Defined the utf-8 charset in the JSP: <%@ page language="java" contentType="text/html; charset=utf-8" pageEncoding="utf-8"%> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> <html> <head> <meta http-equiv="content-type" content="text/html; charset=utf-8" /> ... <div class=

C++: how to write/read ofstream in Unicode / UTF-8

十年热恋 submitted on 2020-01-09 10:19:04
Question: I have a UTF-8 text file that I'm reading with a simple: ifstream in("test.txt"); Now I'd like to create a new file that is encoded as UTF-8 or Unicode. How can I do this with ofstream or something else? This creates ANSI encoding: ofstream out(fileName.c_str(), ios::out | ios::app | ios::binary); Answer 1: OK, about the portable variant. It is easy if you use the C++11 standard (because there are a lot of additional includes like "utf8" , which solves this problem forever). But if you want to use

Node Buffers, from utf8 to binary

断了今生、忘了曾经 submitted on 2020-01-09 10:10:14
Question: I'm receiving data as utf8 from a source, and this data was originally in binary form (it was a Buffer ). I have to convert this data back to a Buffer . I'm having a hard time figuring out how to do this. Here's a small sample that shows my problem: var hexString = 'e61b08020304e61c09020304e61d0a020304e61e65'; var buffer1 = new Buffer(hexString, 'hex'); var str = buffer1.toString('utf8'); var buffer2 = new Buffer(str, 'utf8'); console.log('original content:', hexString); console.log('buffer1
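The same round trip can be reproduced in Python to show why it loses information. This is a sketch using the hex string from the question; it illustrates the general behaviour of lenient UTF-8 decoding and is not a Node.js fix. Bytes that do not form valid UTF-8 are replaced with U+FFFD during decoding, so re-encoding cannot restore them, whereas a byte-preserving text form such as Latin-1 or hex can.

```python
hex_string = "e61b08020304e61c09020304e61d0a020304e61e65"
original = bytes.fromhex(hex_string)

# Decoding arbitrary binary as UTF-8 substitutes U+FFFD for invalid sequences.
as_text = original.decode("utf-8", errors="replace")
utf8_round_trip = as_text.encode("utf-8")

# Latin-1 maps every byte to one code point, and hex is plain ASCII,
# so both survive a round trip through text unchanged.
latin1_round_trip = original.decode("latin-1").encode("latin-1")
hex_round_trip = bytes.fromhex(original.hex())

print(utf8_round_trip == original)
print(latin1_round_trip == original)
print(hex_round_trip == original)
```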