character-encoding

Converting character encoding within C++

为君一笑 submitted on 2021-01-28 12:32:13
Question: I have a website that allows users to input usernames. The problem is that the C++ code assumes the browser encoding is Western European and converts the string received from the username text box into Unicode to compare it with the string stored in the database. With the right browser encoding set, the string úser is received as %FAser and converted properly to úser within the program; however, with the browser encoding set to UTF-8, the string is received as %C3%BAser and then converted
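
The two byte sequences quoted above can be told apart by trying a strict UTF-8 decode first. A minimal sketch in Python rather than the asker's C++ (the function name `decode_form_value` and the Windows-1252 fallback are assumptions for illustration, not part of the question):

```python
from urllib.parse import unquote_to_bytes


def decode_form_value(raw: str) -> str:
    """Percent-decode a form value, trying UTF-8 before Windows-1252."""
    data = unquote_to_bytes(raw)
    try:
        # Strict UTF-8 rejects sequences such as a lone 0xFA byte.
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: the Western European code page.
        return data.decode("cp1252", errors="replace")


from_western = decode_form_value("%FAser")
from_utf8 = decode_form_value("%C3%BAser")
```

Both calls yield the same string, úser. The heuristic is not foolproof, since some Windows-1252 byte sequences are also valid UTF-8, so declaring the form encoding explicitly is usually the more robust option.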

SEO Canonical URL in Greek characters

人盡茶涼 submitted on 2021-01-28 12:03:47
Question: I have a URL that includes Greek letters: http://www.mydomanain.com/gr/τιτλος-σελιδας/20/ I am using $_SERVER['REQUEST_URI'] to insert the value into the canonical link in my page head, like this: <link rel="canonical" href="http://www.mydomanain.com<?php echo $_SERVER['REQUEST_URI']; ?>" /> The problem is that when I view the page source, the URL is displayed with characters like ...CE%B3%CE%B3%CE%B5%CE%BB... but when I click on it, the link is displayed as it should be. Will this cause any penalty
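
The %CE%B3-style sequences are the percent-encoded UTF-8 bytes of the Greek letters, and decoding them gives back the original path. A short sketch in Python rather than PHP, using the path from the question:

```python
from urllib.parse import quote, unquote

path = "/gr/τιτλος-σελιδας/20/"

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character;
# "/" is left untouched by default.
encoded = quote(path)

# unquote() reverses the transformation.
decoded = unquote(encoded)

gamma = quote("γ")  # the single letter seen in the page source as %CE%B3
```

Browsers typically show the decoded form in the address bar while the page source carries the encoded one, which would explain the difference the asker observes.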

Reading proper unicode characters into a ReadStream in node.js

孤者浪人 submitted on 2021-01-28 11:47:49
Question: Sometimes strange things happen in the world of coding, and I have no explanation at all. :) A text file I have contains the following lines: en …π 1 1 en Œ® 1 1 en Œ© 1 1 en –° 1 1 en —† 1 1 en “§ 1 1 en ◊° 2 2 en ·∏§anƒ´f 1 1 en ·π_ 1 1 en ˝mage:whiteshark-tgoss1.jpg 4 4 en ˝stanbul 114 114 My code is as follows: var fileReadStream = fs.createReadStream(fileName, {encoding: 'utf8'}); fileReadStream.on('data', function(data){ //do something with the data }); When I look at the data element,
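
Several of the sample lines above look like UTF-8 bytes that were displayed using the Mac Roman code page. This is only a guess, since the excerpt is cut off before the asker describes the symptom. A small Python sketch of that effect (the helper names are hypothetical):

```python
def misread_as_mac_roman(text: str) -> str:
    """Show how UTF-8 bytes look when decoded with the wrong code page."""
    return text.encode("utf-8").decode("mac_roman")


def repair_mac_roman_mojibake(text: str) -> str:
    """Undo the misreading, assuming the text really is Mac Roman mojibake."""
    return text.encode("mac_roman").decode("utf-8")


garbled_psi = misread_as_mac_roman("Ψ")
garbled_omega = misread_as_mac_roman("Ω")
restored = repair_mac_roman_mojibake(garbled_psi)
```

Under that assumption, the entries `Œ®` and `Œ©` would correspond to Ψ and Ω. If so, the file itself may be fine and only the tool used to view it is misinterpreting the bytes.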

Dealing with char values over 127 in C

醉酒当歌 submitted on 2021-01-28 04:51:25
Question: I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data, and I have no problem printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d. But if I try to print a character from its numerical value like this: printf("%c\n", 128); it just prints FFFD (the replacement character). Here is another example: char abc[] =
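
One plausible reading, assuming the output goes to a UTF-8 terminal: the single byte 0x80 is not a valid UTF-8 sequence on its own, so the terminal substitutes U+FFFD. The following Python sketch (not C) mirrors that behaviour:

```python
# A lone byte 0x80 is a UTF-8 continuation byte with no lead byte.
raw = bytes([128])

# Decoding it leniently produces the replacement character U+FFFD,
# which is what a UTF-8 terminal would typically display.
shown = raw.decode("utf-8", errors="replace")

# The code point U+0080 itself needs two bytes in UTF-8.
proper = chr(128).encode("utf-8")
```

Here `shown` is the replacement character and `proper` is the two-byte sequence `b'\xc2\x80'`, which illustrates why values above 127 cannot be written as single bytes when the output is interpreted as UTF-8.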

Python 3 itertools.islice continue despite UnicodeDecodeError

拜拜、爱过 submitted on 2021-01-28 04:01:02
Question: I have a Python 3 program that monitors a log file. The log includes, among other things, chat messages written by users. The log is created by a third-party application which I cannot change. Today a user wrote "텋��텋��" and it caused the program to crash with the following error: future: <Task finished coro=<updateConsoleLog() done, defined at /usr/local/src/bserver/logmonitor.py:48> exception=UnicodeDecodeError('utf-8',... say "\xed\xa0\xbd\xed\xb1\x8c"\r\n', 7623, 7624, 'invalid
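
The bytes in the error message are a UTF-16 surrogate pair that was encoded as two separate three-byte sequences, which strict UTF-8 rejects. A minimal sketch of two ways to keep going (the function names are hypothetical, and the asker's actual reading code is not shown in the excerpt):

```python
RAW = b'say "\xed\xa0\xbd\xed\xb1\x8c"\r\n'  # bytes quoted in the traceback


def decode_tolerant(raw: bytes) -> str:
    """Never raise: invalid bytes become U+FFFD."""
    return raw.decode("utf-8", errors="replace")


def decode_repairing_surrogates(raw: bytes) -> str:
    """Let encoded surrogates through, then merge each pair into one character."""
    text = raw.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16", errors="surrogatepass").decode("utf-16")


tolerant = decode_tolerant(RAW)
repaired = decode_repairing_surrogates(RAW)
```

When reading from a file, passing `errors="replace"` to `open()` has the same effect as the first function. The second approach still raises on other kinds of invalid bytes and on unpaired surrogates, so it fits only when this specific defect is expected.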

Trailing equal signs (=) in emails

痞子三分冷 submitted on 2021-01-28 00:30:55
Question: I download messages from a Gmail account using POP3 and save them in a SQLite database for further processing: mailbox = poplib.POP3_SSL('pop.gmail.com', '995') mailbox.user(user) mailbox.pass_(password) msgnum = mailbox.stat()[0] for i in range(msgnum): msg = '\n'.join(mailbox.retr(i+1)[1]) save_message(msg, dbmgr) mailbox.quit() However, looking in the database, all lines but the last one of the message body (payload) have trailing equal signs. Do you know why this happens? Answer 1: Frederic's
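
A likely explanation, stated as an assumption because the answer is cut off: the body uses the quoted-printable transfer encoding, where a trailing `=` marks a soft line break. A short sketch with the standard `quopri` module (the sample body is made up for illustration):

```python
import quopri

# Hypothetical body: "=" at the end of a line is a soft line break,
# and "=C3=A9" is an escaped two-byte UTF-8 character.
raw_body = b"first half of a long li=\nne, caf=C3=A9\n"

# Decoding joins the soft-wrapped lines and restores the escaped bytes.
decoded = quopri.decodestring(raw_body).decode("utf-8")
```

For whole messages, parsing with the `email` package and calling `get_payload(decode=True)` on each part applies the declared transfer encoding automatically.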

Python urllib.request.urlopen: AttributeError: 'bytes' object has no attribute 'data'

大憨熊 submitted on 2021-01-27 22:13:58
Question: I am using Python 3 and trying to connect to dstk. I am getting an error with the urllib package. I researched a lot on SO and could not find anything similar to this problem. api_url = self.api_base+'/street2coordinates' api_body = json.dumps(addresses) #api_url=api_url.encode("utf-8") #api_body=api_body.encode("utf-8") print(type(api_url)) response_string = six.moves.urllib.request.urlopen(api_url, api_body).read() response = json.loads(response_string) If I do not encode the api_url and api
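
In Python 3, `urlopen` expects the URL as a `str` (or a `Request` object) and the POST body as `bytes`; a `bytes` URL is treated as if it were a `Request`, which is consistent with the error in the title. A minimal sketch that builds the request without sending it (the URL and the address are placeholders, not values from the question):

```python
import json
import urllib.request

api_url = "http://localhost:8080/street2coordinates"  # placeholder URL, kept as str
addresses = ["1 Example Street, Exampletown"]          # hypothetical input

# Only the body is encoded to bytes.
api_body = json.dumps(addresses).encode("utf-8")

request = urllib.request.Request(api_url, data=api_body)
# urllib.request.urlopen(request).read() would perform the POST.
```

Supplying `data` makes the request a POST. The response from `read()` is also `bytes`, so it may need `.decode("utf-8")` before `json.loads` on older Python 3 versions.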

MySQL - select first 10 bytes of a string

混江龙づ霸主 submitted on 2021-01-27 21:28:38
Question: Hello wise men & women, How would you select the first x bytes of a string? The use case: I'm optimizing product description texts for upload to Amazon, and Amazon measures field lengths by bytes in utf8 (not latin1 as I stated earlier), not by characters. MySQL, on the other hand, seems to operate character-based (e.g., the function left() is character-based, not byte-based). The difference (using English, French, Spanish & German) is roughly 10%, but it can vary widely. Some tests
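
The difference between character counts and UTF-8 byte counts can be shown outside the database. A sketch in Python rather than SQL (the function name `truncate_utf8` is hypothetical) that cuts a string to a byte budget without splitting a multi-byte character:

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text that fits in max_bytes of UTF-8."""
    encoded = text.encode("utf-8")[:max_bytes]
    # A character cut in half at the end is dropped rather than kept broken.
    return encoded.decode("utf-8", errors="ignore")


sample = "Grüße"                       # 5 characters
byte_length = len(sample.encode("utf-8"))  # ü and ß take two bytes each
three = truncate_utf8(sample, 3)
four = truncate_utf8(sample, 4)
```

With a budget of 3 bytes the result is "Gr", because the third byte is only the first half of ü; with 4 bytes it is "Grü". The same prefix-then-validate idea would apply to a byte-based approach inside MySQL.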