encoding | 易学教程

std::wstring length

Question: What does std::wstring::length() return: the length in wchar_t units or the length in symbols? And why?

    TCHAR r2[3];
    r2[0] = 0xD834; // D834, DD1E - musical G clef
    r2[1] = 0xDD1E; //
    r2[2] = 0x0000; // '\0'
    std::wstring r = r2;
    std::cout << "capacity: " << r.capacity() << std::endl;
    std::cout << "length: " << r.length() << std::endl;
    std::cout << "size: " << r.size() << std::endl;
    std::cout << "max_size: " << r.max_size() << std::endl;

Output>

    capacity: 351
    length: 2
    size: 2
    max
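
The length of 2 in the output above is consistent with std::wstring counting wchar_t code units rather than symbols: with a 16-bit wchar_t (as on Windows), U+1D11E is stored as the surrogate pair D834 DD1E. The same count can be sketched in Python, used here only for illustration; the helper name utf16_code_units is hypothetical.

```python
def utf16_code_units(text):
    """Return the number of 16-bit code units needed to store text as UTF-16."""
    # utf-16-le writes no byte order mark, so every 2 bytes is one code unit.
    return len(text.encode("utf-16-le")) // 2


clef = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the Basic Multilingual Plane

code_points = len(clef)                      # Python 3 counts code points
code_units = utf16_code_units(clef)          # what a 16-bit wchar_t string counts
surrogates = clef.encode("utf-16-be").hex()  # the two code units as hex

print(code_points, code_units, surrogates)
```

One symbol, two code units: length() and size() report the second number.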

How to encode and decode from spanish in python

Question: I have the following code written in Python 2.7:

    # -*- coding: utf-8 -*-
    import sys
    _string = "años luz detrás"
    print _string.encode("utf-8")

This throws the following error:

    print _string.encode("utf-8")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

Any help appreciated, thanks in advance.

Answer 1: Add u before the ":

    >>> _string = u"años luz detrás"
    >>> print _string.encode("utf-8")
    años luz detrás

This would do.

Answer 2: In Python 2 a string literal
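
The error in the question comes from calling encode on a Python 2 byte string: Python 2 first decodes it implicitly with the ASCII codec, and that step fails on byte 0xc3. A Python 3 sketch that makes the implicit step explicit (Python 3 is an assumption here; the original code is Python 2.7):

```python
# Bytes as a Python 2 str literal would hold them in a UTF-8 source file.
raw = "años luz detrás".encode("utf-8")

try:
    raw.decode("ascii")  # the hidden step Python 2 ran before .encode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

text = raw.decode("utf-8")  # decoding with the matching codec succeeds

print(error)
print(text)
```

The exception reports the same byte and position as the traceback in the question.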

How can I get XSLT to return UTF-8 in Java

Question: I'm trying to get my XSL script to work with UTF-8 encoding. Characters like åäö and Greek characters just turn up as garbage. The only way to get it to work is if I write the result to a file. If I write it to an output stream it only returns garbage (System.out works, but that might be because it's being redirected to a file). The result needs to be returned from a servlet, and please note that it's not a servlet configuration issue. I can return a hard-coded string with Greek characters

Unicode Encode Error in Sublime Text 3 console

Question: I'm always getting a 'UnicodeEncodeError: 'ascii' codec can't encode character '\xf6' in position 0: ordinal not in range(128)' in the Sublime Text 3 console when trying to print a non-ASCII character. I'm using Anaconda Python Builder to build the system. Building the system with the built-in "python" runs perfectly (i.e. prints out non-ASCII characters fine), and running the script from the terminal also works fine (I'm running the script on macOS Sierra). I assume the problem must have
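
The question is cut off before any cause is given, so the following is only a sketch of how this error message arises: writing '\xf6' to a text stream whose encoding is ASCII. The in-memory stream and the helper write_with_encoding are hypothetical stand-ins for a console.

```python
import io


def write_with_encoding(text, encoding, errors="strict"):
    """Write text to an in-memory stream that encodes the way a console would."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue()


# A UTF-8 stream accepts the character.
utf8_bytes = write_with_encoding("\xf6", "utf-8")

# An ASCII stream with a lenient error handler escapes it instead of failing.
escaped = write_with_encoding("\xf6", "ascii", "backslashreplace")

# An ASCII stream with the default strict handler raises UnicodeEncodeError.
try:
    write_with_encoding("\xf6", "ascii")
    failed = False
except UnicodeEncodeError:
    failed = True

print(utf8_bytes, escaped, failed)
```

Printing sys.stdout.encoding from inside the failing build is one way to check whether the console stream really is ASCII.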

UTF-8 encoding problems with R

Question: I'm trying to parse statements from the Mexican Senate, but I'm having trouble with the UTF-8 encoding of the web page. This HTML comes through clearly:

    library(rvest)
    Senate<-html("http://comunicacion.senado.gob.mx/index.php/informacion/versiones/19675-version-estenografica-de-la-reunion-ordinaria-de-las-comisiones-unidas-de-puntos-constitucionales-de-anticorrupcion-y-participacion-ciudadana-y-de-estudios-legislativos-segunda.html")

Here is an example of a bit of the webpage: "CONTINÚA EL

Encoding problem (UTF-8) in PHP

Question: I want to output the following string in PHP: ä ö ü ß € Therefore, I've encoded it to UTF-8 manually: Ã¤ Ã¶ Ã¼ ÃŸ Â€ So my script is:

    <?php
    header('content-type: text/html; charset=utf-8');
    echo 'Ã¤ Ã¶ Ã¼ ÃŸ Â€';
    ?>

The first 4 characters are correct (ä ö ü ß) but unfortunately the € sign isn't correct: ä ö ü ß Here you can see it. Can you tell me what I've done wrong? My editor (Notepad++) has settings for Encoding (ANSI/UTF-8) and Format (Windows/Unix). Do I have to change them? I hope you
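
The hand-encoded strings in the question look like UTF-8 bytes read as Windows-1252; that reading is an assumption, but it matches the first four characters exactly. Under it, the euro sign cannot come out as two characters, because its UTF-8 form is three bytes (E2 82 AC). A Python sketch of the mapping, with the hypothetical helper as_cp1252_mojibake:

```python
def as_cp1252_mojibake(text):
    """Show how the UTF-8 bytes of text look when misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


for ch in "äöüß€":
    # character, its UTF-8 bytes in hex, and the misread form
    print(ch, ch.encode("utf-8").hex(), as_cp1252_mojibake(ch))
```

The two-byte characters reproduce the forms used in the script, while the euro sign yields a three-character sequence instead.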

UnicodeDecodeError: 'utf-8' codec can't decode byte error

Question: I'm trying to get a response from urllib and decode it to a readable format. The text is in Hebrew and also contains characters like { and /. The coding declaration at the top of the page is: # -*- coding: utf-8 -*- The raw string is: b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3
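
The raw bytes start with \xff\xfe, which is the UTF-16 little-endian byte order mark, and every ASCII character is followed by \x00. That pattern suggests the response is UTF-16 rather than UTF-8. A minimal sketch using the first few bytes from the question plus one of its Hebrew code units (the shortened sample is an assumption made for brevity):

```python
import codecs

# Opening bytes of the response, followed by the first Hebrew letter from it.
raw = b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00\xe4\x05'

starts_with_utf16_le_bom = raw.startswith(codecs.BOM_UTF16_LE)

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False  # 0xff can never start a UTF-8 sequence

text = raw.decode("utf-16")  # the "utf-16" codec reads and strips the BOM

print(starts_with_utf16_le_bom, utf8_ok, repr(text))
```

The source file's own coding declaration only describes the script, not the bytes a server sends back.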

How to convert utf-8 fancy quotes to neutral quotes

Question: I'm writing a little Python script that parses Word docs and writes to a CSV file. However, some of the docs have some UTF-8 characters that my script can't process correctly. Fancy quotes show up quite often (u'\u201c'). Is there a quick and easy (and smart) way of replacing those with the neutral ASCII-supported quotes, so I can just write line.encode('ascii') to the CSV file? I have tried to find the left quote and replace it:

    val = line.find(u'\u201c')
    if val >= 0:
        line[val] = '"'

But to
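
Strings in Python are immutable, so the item assignment line[val] = '"' raises a TypeError. One alternative is str.translate with a table keyed by code point, sketched below for Python 3; the names QUOTE_MAP and neutral_quotes are hypothetical, and the table covers only the four common curly quotes.

```python
QUOTE_MAP = {
    0x2018: "'",  # left single quotation mark
    0x2019: "'",  # right single quotation mark
    0x201C: '"',  # left double quotation mark
    0x201D: '"',  # right double quotation mark
}


def neutral_quotes(line):
    """Return a copy of line with curly quotes replaced by ASCII quotes."""
    return line.translate(QUOTE_MAP)


sample = "\u201cfancy\u201d and \u2018single\u2019"
cleaned = neutral_quotes(sample)

# After the replacement the text encodes to ASCII without errors.
print(cleaned.encode("ascii"))
```

Any other non-ASCII character left in the line would still make encode('ascii') fail, so the table may need more entries for real documents.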

Encoding for Multilingual .py Files

Question: I am writing a .py file that contains strings from multiple character sets, including English, Spanish, and Russian. For example, I have something like:

    string_en = "The quick brown fox jumped over the lazy dog."
    string_es = "El veloz murciélago hindú comía feliz cardillo y kiwi."
    string_ru = "В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!"

I am having trouble figuring out how to encode my file to avoid generating syntax errors like the one below when my file is run: SyntaxError: Non
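
The error message is cut off, so this is a sketch under the assumption that the interpreter is rejecting non-ASCII bytes in a source file with no declared encoding. PEP 263 lets a file declare its encoding in a comment on the first or second line; the file must also actually be saved in that encoding. The u prefixes are an addition so the literals are text under both Python 2 and Python 3.3+.

```python
# -*- coding: utf-8 -*-
# The declaration above must be on line 1 or 2, and the editor must save
# the file as UTF-8 for the declaration to be true.
string_en = u"The quick brown fox jumped over the lazy dog."
string_es = u"El veloz murciélago hindú comía feliz cardillo y kiwi."
string_ru = u"В чащах юга жил бы цитрус? Да, но фальшивый экземпляр!"

# Each string survives a round trip through UTF-8 unchanged.
round_trips = [
    s.encode("utf-8").decode("utf-8") == s
    for s in (string_en, string_es, string_ru)
]
print(round_trips)
```

Python 3 already assumes UTF-8 for source files, so there the declaration is optional.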

Decoding if it's not unicode

Question: I want my function to take an argument that could be a unicode object or a UTF-8 encoded string. Inside my function, I want to convert the argument to unicode. I have something like this:

    def myfunction(text):
        if not isinstance(text, unicode):
            text = unicode(text, 'utf-8')
        ...

Is it possible to avoid the use of isinstance? I was looking for something more duck-typing friendly. During my experiments with decoding, I have run into several weird behaviours of Python. For instance: >>> u'hello'
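
A duck-typed variant can be sketched for Python 3, where only bytes objects have a decode method. This is an assumption that departs from the Python 2 code in the question: in Python 2, unicode objects also have decode (with an implicit ASCII encode first), so the same trick does not carry over. The name to_text is hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text, decoding it only if it offers a decode() method."""
    try:
        return value.decode(encoding)
    except AttributeError:
        # Python 3 str has no decode(), so it is already text.
        return value


from_bytes = to_text(b"a\xc3\xb1os")
from_text = to_text("años")
print(from_bytes, from_text)
```

Both calls return the same text, and no type check appears in the function body.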