mojibake

Russian symbols in Python output corrupted (ENCODING)

我的梦境 submitted on 2021-02-07 19:40:19
Question: I parsed an HTML document that has Russian text in it. When I try to print it in Python, I get this: ÐлÑбниÑнÑй новогодний пÑÐ½Ñ I tried to detect its encoding and got ISO-8859-1. I'm trying to decode it like this: print drink_name.decode('iso8859-1') But I get an error. How can I print this text, or encode it in Unicode?
Answer 1: You have a Mojibake; UTF-8 bytes decoded as Latin-1 or CP1251 in this case. You can repair it by reversing the process: >>> print u'ÐлÑбнÐ
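The reversal described in the answer can be sketched in Python 3 roughly as follows. This is not the answer's own code (which is cut off above); the helper name `repair_mojibake`, the list of candidate codecs, and the sample word are assumptions made for illustration.

```python
def repair_mojibake(garbled, wrong_codecs=("latin-1", "cp1251")):
    """Try to undo 'UTF-8 bytes decoded with the wrong codec'.

    repair_mojibake and wrong_codecs are hypothetical names for this sketch.
    """
    for codec in wrong_codecs:
        try:
            # Re-encode with the wrong codec to recover the original bytes,
            # then decode those bytes as the UTF-8 they really were.
            return garbled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this codec did not cause the damage; try the next one
    return garbled  # nothing matched; leave the text untouched


original = "новогодний"
garbled = original.encode("utf-8").decode("latin-1")  # simulate the damage
print(repair_mojibake(garbled) == original)
```

The sketch only helps when no bytes were lost along the way; if the garbled text has already dropped characters, the final UTF-8 decode will fail and the input is returned unchanged.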

output utf8 in console with Visual Studio (wide stream)

戏子无情 submitted on 2021-01-29 04:11:45
Question: This piece of code works when I compile it with mingw32 on Windows 10 and emits the right result, as you can see below: C:\prj\cd>bin\main.exe 1°à€3§4ç5@の,は,でした,象形字 ; But when I compile it with Visual Studio 17, the same code emits wrong characters: /out:prova.exe prova.obj C:\prj\cd>prova.exe 1°à €3§4ç5@ã®,ã¯,ã§ã—ãŸ,象形字 ; C:\prj\cd> Here is the source code: #include <windows.h> #include <io.h> #include <fcntl.h> #include <stdio.h> #include <string> #include <iostream> int main ( void )

Encoding error with Polish charset during transfer of database / server setting up

坚强是说给别人听的谎言 submitted on 2020-01-22 02:18:07
Question: I am trying to transfer one of my databases from one host (home.pl) to another (my newly set up server). The script that I am trying to transfer is WordPress. Unfortunately, irrespective of the method used, I am struggling with encoding problems. New host configuration: On my new server I am using the following directives in my.cnf: [mysql] default-character-set=utf8 [mysqld] collation-server = utf8_general_ci character-set-server = utf8 init_connect='SET collation_connection = utf8_general_ci' init

how to convert unicode text to utf8 text readable?

白昼怎懂夜的黑 submitted on 2020-01-15 04:45:39
Question: I have a serious problem regarding Unicode and UTF-8. I saved a paragraph of Arabic/Persian text in Notepad, and now my text looks like this: Êæ Çíä ÓæÑÓ ÈÑäÇãå ÚÏÏ ÏáÎæÇåí Ñæ ÇÒ æÑæÏí ãííÑå æ Èå Øæá åãæä ÚÏÏ ãËáËí Ñæ ÑÓã ãí ˜äå My question is how to get my data back. It is important for me to recover this data. Thanks in advance.
Answer 1: The paragraph was scrambled by saving as code page 1256 (Arabic/Persian), then interpreted as code page 1252 (Western Europe), and finally saved
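The first two steps named in the answer (saved as code page 1256, then read as code page 1252) can be undone in Python 3 along these lines. This is a minimal sketch under that assumption only: whatever the answer goes on to say after "finally saved" is cut off above and is not reflected here, and the function name and sample word are made up for illustration.

```python
def undo_cp1252_misread(garbled):
    """Hypothetical helper: reverse 'cp1256 bytes shown as cp1252 text'."""
    # Turn the characters back into the bytes they stood for under cp1252,
    # then read those bytes with cp1256, the code page the text was saved in.
    return garbled.encode("cp1252").decode("cp1256")


original = "سلام"
garbled = original.encode("cp1256").decode("cp1252")  # simulate the damage
print(undo_cp1252_misread(garbled) == original)
```

One caveat: Python's cp1252 codec leaves a few byte values undefined, so text whose cp1256 bytes fall on those values may not survive the round trip with this exact approach.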

PHP Strange character before £ sign?

蓝咒 submitted on 2019-12-31 04:36:07
Question: For some reason I get a weird character ( £76756687) when I type a £ into a text field on my form.
Answer 1: As you suspect, it's a character encoding issue. Is the page set to use a charset of UTF-8? (You can't go wrong with this encoding really.) Also, you'll probably want to entity encode the pound symbol on the way out (&pound;). As an example character set (for both the form page and HTML email) you could use: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> That said, is

Fixing encodings

醉酒当歌 submitted on 2019-12-28 19:04:06
Question: I have ended up with messed-up character encodings in one of our MySQL columns. Typically I have √© instead of é, √∂ instead of ö, √≠ instead of í, and so on. I'm fairly certain that someone here knows what happened and how to fix it. UPDATE: Based on bobince's answer, and since I had this data in a file, I did the following: #!/usr/bin/env python import codecs f = codecs.open('./file.csv', 'r', 'utf-8') f2 = codecs.open('./file-fixed.csv', 'w', 'utf-8') for line in f: f2.write(line.encode(
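The three pairs quoted in the question are what you get when UTF-8 bytes are read as Mac OS Roman, so the damage can be reproduced and reversed in Python 3 as sketched below. The asker's own script is cut off above, so this is an illustration rather than their fix; the function name is hypothetical.

```python
def fix_mac_roman_mojibake(garbled):
    """Hypothetical helper: reverse 'UTF-8 bytes read as Mac OS Roman'."""
    # Recover the original bytes via mac_roman, then decode them as UTF-8.
    return garbled.encode("mac_roman").decode("utf-8")


for intended in ("é", "ö", "í"):
    garbled = intended.encode("utf-8").decode("mac_roman")  # simulate the damage
    print(garbled, "->", fix_mac_roman_mojibake(garbled))
```

For a whole file, the same call could be applied line by line after reading the file as UTF-8, assuming every line was damaged in the same way.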

Certain Arabic text gets incorrectly shown while other Arabic text gets showed normally?

别说谁变了你拦得住时间么 submitted on 2019-12-24 00:54:33
Question: I'm developing an app with Arabic text in it. My phone supports Arabic, so the text is displayed correctly. The weird problem is this: if I copy Arabic text from a .txt file and paste it into an EditText, the EditText displays weird characters, but if I type the SAME text manually (not copy-paste), the text is displayed normally! Here is a picture showing what I mean: the first EditText is the text I wrote manually, and the second is the text I copy-pasted from the .txt

Encoding issue of a character in utf-8

喜你入骨 submitted on 2019-12-22 10:28:07
Question: I get a link from a web page using the Beautiful Soup library through a.get('href'). In the link there is a strange character ® but when I retrieve it, it becomes ® . How can I encode it properly? I have already added this at the beginning of the page: # -*- coding: utf-8 -*- r = requests.get(url) soup = BeautifulSoup(r.text)
Answer 1: Do not use r.text; leave decoding to BeautifulSoup: soup = BeautifulSoup(r.content) r.content gives you the response in bytes, without decoding. r.text on the other hand, is the
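To see why decoded text can go wrong while raw bytes stay safe, here is a small Python 3 simulation that needs no network access. It assumes the page was served as UTF-8 and that the text was decoded as ISO-8859-1 (which requests falls back to for text responses whose headers declare no charset); the variable names are made up for the sketch.

```python
# Bytes as they would arrive over the wire (what r.content holds).
raw = "®".encode("utf-8")

# What a Latin-1 decode makes of those bytes: one extra character per symbol.
wrong = raw.decode("iso-8859-1")

# What the same bytes give when decoded as the UTF-8 they really are.
right = raw.decode("utf-8")

print(repr(wrong), repr(right))
```

Passing the undecoded bytes to the parser, as the answer suggests, lets it pick the encoding from the document itself instead of inheriting a wrong guess.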

Convert unicode with utf-8 string as content to str

≯℡__Kan透↙ submitted on 2019-12-20 10:04:43
Question: I'm using pyquery to parse a page: dom = PyQuery('http://zh.wikipedia.org/w/index.php', {'title': 'CSS', 'printable': 'yes', 'variant': 'zh-cn'}) content = dom('#mw-content-text > p').eq(0).text() But what I get in content is a unicode string with UTF-8 encoded content: u'\xe5\xb1\x82\xe5\x8f\xa0\xe6\xa0\xb7\xe5\xbc\x8f\xe8\xa1\xa8...' How can I convert it to str without losing the content? To make it clear: I want content == '\xe5\xb1\x82\xe5\x8f\xa0\xe6\xa0\xb7\xe5\xbc\x8f\xe8\xa1\xa8' not
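One way to get from that string to the bytes the asker wants is to encode it as Latin-1, which maps each code point below 256 to the byte with the same value. The sketch below is written for Python 3 and uses only the portion of the string quoted in the question; it is an illustration, not an answer taken from the thread.

```python
# The string from the question: each character is really one UTF-8 byte.
content = "\xe5\xb1\x82\xe5\x8f\xa0\xe6\xa0\xb7\xe5\xbc\x8f\xe8\xa1\xa8"

# Latin-1 turns code points 0-255 into identical byte values.
raw = content.encode("latin-1")

# Those bytes are valid UTF-8, so they can now be decoded properly.
text = raw.decode("utf-8")

print(raw)
print(text)
```

In Python 2, which the question uses, the result of `content.encode('latin-1')` is already the `str` the asker is after.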