mojibake | 易学教程

Russian symbols in Python output corrupted (ENCODING)

Question: I parsed an HTML document that contains Russian text. When I try to print it in Python, I get this: ÐÐ»ÑÐ±Ð½Ð¸ÑÐ½ÑÐ¹ Ð½Ð¾Ð²Ð¾Ð³Ð¾Ð´Ð½Ð¸Ð¹ Ð¿ÑÐ½Ñ I checked the encoding and got ISO-8859-1, so I tried to decode it like this: print drink_name.decode('iso8859-1') But I get an error. How can I print this text, or convert it to Unicode?

Answer 1: You have mojibake: UTF-8 bytes that were decoded as Latin-1 (or the closely related CP1252) in this case. You can repair it by reversing the process: >>> print u'ÐÐ»ÑÐ±Ð½Ð
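
The "reverse the process" idea in the answer can be sketched as follows. This is a minimal Python 3 illustration (the question itself uses Python 2); the helper name `repair_mojibake` and the sample phrase are assumptions made for the example, not part of the original answer.

```python
def repair_mojibake(garbled: str, wrong_codec: str = "latin-1") -> str:
    """Undo a wrong decode: recover the raw bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


# Illustrative sample text, not the data from the question.
original = "Привет, мир"

# Simulate the fault: UTF-8 bytes mistakenly decoded as Latin-1.
garbled = original.encode("utf-8").decode("latin-1")

print(repr(garbled))             # unreadable run of accented Latin characters
print(repair_mojibake(garbled))  # the Russian text again
```

The repair only works if the garbled string still contains every original byte; if some characters were dropped or replaced along the way, the final decode raises `UnicodeDecodeError`.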

output utf8 in console with Visual Studio (wide stream)

Question: This code works when I compile it with mingw32 on Windows 10 and produces the right result, as you can see below: C:\prj\cd>bin\main.exe 1°à€3§4ç5@の,は,でした,象形字 ; However, when I compile it with Visual Studio 17, the same code emits the wrong characters: /out:prova.exe prova.obj C:\prj\cd>prova.exe 1Â°Ã â‚¬3Â§4Ã§5@ã®,ã¯,ã§ã—ãŸ,è±¡å½¢å— ; C:\prj\cd> Here is the source code: #include <windows.h> #include <io.h> #include <fcntl.h> #include <stdio.h> #include <string> #include <iostream> int main ( void )
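
One plausible reading of the wrong output is that the UTF-8 bytes of the string were interpreted as Windows-1252 somewhere between the source file and the console. The short Python sketch below only reproduces that byte-level effect on the first part of the sample; it is not a diagnosis of the Visual Studio build, and the variable names are made up for the illustration.

```python
# First part of the expected output from the question.
sample = "1°à€3§4ç5@"

# Encode as UTF-8, then misread those bytes as Windows-1252.
utf8_bytes = sample.encode("utf-8")
misread = utf8_bytes.decode("cp1252")

print(misread)  # matches the start of the garbled Visual Studio output
```

The Japanese part of the sample is left out because some of its UTF-8 bytes (for example 0x81) have no mapping in Python's `cp1252` codec, so a strict decode would fail there.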

Encoding error with Polish charset during transfer of database / server setting up

Question: I am trying to transfer one of my databases from one host (home.pl) to another (my newly set up server). The application I am transferring is WordPress. Unfortunately, whichever method I use, I run into encoding problems. New host configuration: on my new server I am using the following directives in my.cnf: [mysql] default-character-set=utf8 [mysqld] collation-server = utf8_general_ci character-set-server = utf8 init_connect='SET collation_connection = utf8_general_ci' init

How to convert Unicode text to readable UTF-8 text?

Question: I have a serious problem with Unicode and UTF-8. I saved a paragraph of Arabic/Persian text in Notepad, and now my text looks like this: Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå How can I get my data back? It is important to me. Thanks in advance.

Answer 1: The paragraph was scrambled by saving as code page 1256 (Arabic/Persian), then interpreted as code page 1252 (Western Europe), and finally saved
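
The first two steps the answer describes (saved as code page 1256, then read as code page 1252) can be undone by running them backwards. The Python sketch below covers only those two steps, since the excerpt is cut off before the last one; the function name `repair_cp1256` is an assumption for the example.

```python
def repair_cp1256(garbled: str) -> str:
    """Recover the original bytes via code page 1252, then read them as 1256."""
    return garbled.encode("cp1252").decode("cp1256")


# One word taken from the garbled paragraph in the question.
print(repair_cp1256("ÓæÑÓ"))
```

If the garbled text contains a character that code page 1252 cannot represent, the `encode` step raises `UnicodeEncodeError`, which is a sign that some further conversion happened after the misreading.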

PHP Strange character before £ sign?

Question: For some reason I get a weird character (Â £76756687) when I type a £ into a text field on my form.

Answer 1: As you suspect, it's a character encoding issue. Is the page set to use a charset of UTF-8? (You can't really go wrong with this encoding.) Also, you'll probably want to entity-encode the pound symbol on the way out (&pound;). As an example character set (for both the form page and HTML email) you could use: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> That said, is
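
To see where the stray Â comes from, it helps to look at the bytes. The question is about PHP; the sketch below uses Python purely to show the byte-level picture, and the variable names are illustrative.

```python
pound = "£"

# In UTF-8 the pound sign takes two bytes: 0xC2 0xA3.
utf8_bytes = pound.encode("utf-8")

# A page read as Latin-1 shows each byte as its own character,
# so 0xC2 appears as an extra "Â" in front of the pound sign.
misread = utf8_bytes.decode("latin-1")

# Encoding-independent alternative: a numeric character reference.
entity = pound.encode("ascii", "xmlcharrefreplace").decode("ascii")

print(utf8_bytes, misread, entity)
```

Serving the page and the form with one consistent charset removes the extra character; the character reference is a fallback that survives even when the charset is misdeclared.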

Fixing encodings

Question: I have ended up with messed-up character encodings in one of our MySQL columns. Typically I have √© instead of é, √∂ instead of ö, √≠ instead of í, and so on. I'm fairly certain someone here will know what happened and how to fix it. UPDATE: Based on bobince's answer, and since I had this data in a file, I did the following: #!/usr/bin/env python import codecs f = codecs.open('./file.csv', 'r', 'utf-8') f2 = codecs.open('./file-fixed.csv', 'w', 'utf-8') for line in f: f2.write(line.encode(
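
The three pairs listed in the question are consistent with UTF-8 bytes having been read as Mac OS Roman. That codec is inferred here from the symptoms, because the excerpt is cut off before the script names it. A minimal Python 3 sketch under that assumption, with a hypothetical helper name:

```python
def fix_mac_roman_mojibake(garbled: str) -> str:
    """Re-encode as Mac OS Roman to get the bytes back, then decode as UTF-8."""
    return garbled.encode("mac_roman").decode("utf-8")


# The three garbled sequences quoted in the question.
for garbled in ("√©", "√∂", "√≠"):
    print(garbled, "->", fix_mac_roman_mojibake(garbled))
```

Lines that were never corrupted may fail the final decode, so a file-wide fix is safer if each line is repaired inside a `try`/`except UnicodeError` and left unchanged on failure.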

Certain Arabic text gets incorrectly shown while other Arabic text gets shown normally?

Question: I'm developing an app with Arabic text in it. My phone supports Arabic, so the text is displayed correctly. The weird problem is this: if I copy the Arabic text I want from a .txt file and put it into an EditText, the EditText displays weird characters, but if I type the SAME text manually (not copy-paste), the text is displayed normally! Here is a picture showing what I mean: the first EditText is the text I typed manually, and the second is the text I copy-pasted from the .txt

Encoding issue of a character in utf-8

Question: I get a link from a web page with the Beautiful Soup library, through a.get('href'). The link contains the character ®, but when I retrieve it, it becomes Â®. How can I encode it properly? I have already added # -*- coding: utf-8 -*- at the beginning of the file. r = requests.get(url) soup = BeautifulSoup(r.text)

Answer 1: Do not use r.text; leave decoding to BeautifulSoup: soup = BeautifulSoup(r.content) r.content gives you the response in bytes, without decoding. r.text on the other hand, is the
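
The difference between decoding with a guessed charset and decoding with the charset the document declares can be shown without any network access. The sketch below is a standard-library stand-in for what an HTML parser does when it is handed raw bytes; `decode_html`, the sample markup, and the ISO-8859-1 guess are all assumptions for the illustration, not the actual behavior of requests or BeautifulSoup.

```python
import re


def decode_html(body: bytes, fallback: str = "utf-8") -> str:
    """Decode raw HTML bytes using the charset declared in a <meta> tag, if any."""
    match = re.search(rb'<meta[^>]+charset=["\']?([\w-]+)', body, re.IGNORECASE)
    encoding = match.group(1).decode("ascii") if match else fallback
    return body.decode(encoding)


# Hypothetical response body, encoded as UTF-8 like the page in the question.
body = '<meta charset="utf-8"><a href="/brand®">link</a>'.encode("utf-8")

wrong = body.decode("iso-8859-1")  # a guessed charset: the link shows "Â®"
right = decode_html(body)          # the declared charset: the link shows "®"

print(wrong)
print(right)
```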

Convert unicode with utf-8 string as content to str

Question: I'm using pyquery to parse a page: dom = PyQuery('http://zh.wikipedia.org/w/index.php', {'title': 'CSS', 'printable': 'yes', 'variant': 'zh-cn'}) content = dom('#mw-content-text > p').eq(0).text() But what I get in content is a unicode string whose content is UTF-8 encoded: u'\xe5\xb1\x82\xe5\x8f\xa0\xe6\xa0\xb7\xe5\xbc\x8f\xe8\xa1\xa8...' How can I convert it to str without losing the content? To make it clear: I want content == '\xe5\xb1\x82\xe5\x8f\xa0\xe6\xa0\xb7\xe5\xbc\x8f\xe8\xa1\xa8' not
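
Each character in the string from the question has a code point below 256, so encoding it as Latin-1 maps every character back to the byte with the same value. The question targets Python 2, where `str` is a byte string; the sketch below is the Python 3 equivalent, with variable names chosen for the example.

```python
# The string from the question: UTF-8 bytes stored as individual code points.
mis_decoded = "\xe5\xb1\x82\xe5\x8f\xa0\xe6\xa0\xb7\xe5\xbc\x8f\xe8\xa1\xa8"

# Latin-1 maps code points 0-255 to the same byte values one-to-one.
raw = mis_decoded.encode("latin-1")

# The recovered bytes are valid UTF-8 and decode to the Chinese text.
text = raw.decode("utf-8")

print(raw)
print(text)
```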