python-unicode

Display width of unicode strings in Python [duplicate]

吃可爱长大的小学妹 submitted on 2019-11-28 08:01:38
Question: This question already has answers here: Normalizing Unicode (2 answers). Closed 5 years ago.
How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()? Motivating example: printing a table of strings to the console, where some of the strings contain non-ASCII characters.
    >>> for title in d.keys():
    ...     print("{:<20} | {}".format(title, d[title]))
    zootehni- | zooteh.
    zootekni- | zootek.
    zoothèque |
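
One possible way to approach the alignment question above, as a minimal sketch rather than a definitive answer: estimate each string's column width with the standard `unicodedata` module and pad manually instead of relying on `{:<20}`. The helper names `display_width` and `pad_to` are hypothetical, and the width rule (0 columns for combining marks, 2 for East Asian wide or fullwidth characters, 1 otherwise) is a simplifying assumption that a given terminal may not follow exactly.

```python
import unicodedata


def display_width(text):
    """Estimate how many terminal columns `text` occupies (heuristic)."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn over the previous character
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2  # wide and fullwidth characters take two columns
        else:
            width += 1
    return width


def pad_to(text, columns):
    """Left-align `text` in a field of `columns` display columns."""
    return text + " " * max(0, columns - display_width(text))


# A decomposed "è" is two code points (e + combining grave) but one column.
decomposed = unicodedata.normalize("NFD", "zooth\u00e8que")
for title in ("zootehni-", decomposed, "\u65e5\u672c"):
    print(pad_to(title, 20) + " | " + str(len(title)))
```

Padding by estimated width rather than by `len()` keeps the `|` column aligned even when `len()` and the visible width disagree.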

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) [duplicate]

谁说胖子不能爱 submitted on 2019-11-28 04:30:13
This question already has an answer here: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) (27 answers).
I have this code:
    printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
    # Write file
    f.write(printinfo + '\n')
But I get this error when running it:
    f.write(printinfo + '\n')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)
It's having trouble writing out this: Identité secrète (Abduction) [VF]
Any ideas, please? I'm not sure how to fix it. Cheers. UPDATE: This is the bulk of
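
The excerpt is cut off before it shows how `f` was opened, so the following is only a sketch under an assumption: that the file can be opened in text mode with an explicit encoding. `io.open` exists in both Python 2 and Python 3 and accepts an `encoding` argument; the temporary output path is made up for illustration.

```python
import io
import os
import tempfile

# The title from the question, written with escapes so the source stays ASCII.
title = u"Identit\u00e9 secr\u00e8te (Abduction) [VF]"

# Hypothetical output location, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# Text is encoded to UTF-8 on write instead of falling back to ASCII.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(title + u"\n")

# Reading with the same encoding returns the original text.
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == title + u"\n")
```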

Open() and codecs.open() in Python 2.7 behave strangely different

吃可爱长大的小学妹 submitted on 2019-11-28 04:03:34
Question: I have a text file whose first line contains Unicode characters and whose other lines are all ASCII. I try to read the first line into one variable and all other lines into another. However, when I use the following code:
    # -*- coding: utf-8 -*-
    import codecs
    import os
    filename = '1.txt'
    f = codecs.open(filename, 'r3', encoding='utf-8')
    print f
    names_f = f.readline().split(' ')
    data_f = f.readlines()
    print len(names_f)
    print len(data_f)
    f.close()
    print 'And now for something completely different:'
    g = open

Python string to unicode [duplicate]

感情迁移 submitted on 2019-11-28 03:11:33
Possible Duplicate: How do I treat an ASCII string as unicode and unescape the escaped characters in it in python? How do convert unicode escape sequences to unicode characters in a python string
I have a string that contains Unicode escape sequences, e.g. \u2026. Somehow I receive it as a str rather than as unicode. How do I convert it back to unicode?
    >>> a="Hello\u2026"
    >>> b=u"Hello\u2026"
    >>> print a
    Hello\u2026
    >>> print b
    Hello…
    >>> print unicode(a)
    Hello\u2026
    >>>
So clearly unicode(a) is not the answer. Then what is?
Unicode escapes only work in unicode strings, so
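
As a hedged illustration of the conversion being asked about: the standard `unicode_escape` codec interprets literal backslash escapes. The sketch below is written for Python 3, where the text first has to become bytes; on Python 2 the equivalent would be calling `.decode('unicode_escape')` on the byte string directly. It assumes the input is pure ASCII, since this codec treats any non-ASCII bytes as Latin-1.

```python
# A literal backslash followed by "u2026": six characters, not one ellipsis.
raw = "Hello\\u2026"

# ASCII-encode to bytes, then let the codec interpret the escape sequence.
decoded = raw.encode("ascii").decode("unicode_escape")

print(len(raw), len(decoded))
print(decoded == "Hello\u2026")
```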

Python to show special characters

岁酱吖の submitted on 2019-11-28 01:35:52
Question: I know there are tons of threads regarding this issue, but I have not managed to find one that solves my problem. I am trying to print a string, but when printed it doesn't show special characters (e.g. æ, ø, å, ö and ü). When I print the string using repr(), this is what I get: u'Von D\xc3\xbc' and u'\xc3\x96berg'. Does anyone know how I can convert these to Von Dü and Öberg? It's important to me that these characters are not ignored, as they would be with myStr.encode("ascii", "ignore"). EDIT This is the code
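
The repr output above looks like UTF-8 bytes that were decoded as Latin-1 at some earlier step. Assuming that diagnosis is right (the rest of the question is truncated, so it is an assumption), one sketch of a repair is to reverse the wrong decode and redo it as UTF-8:

```python
# The two values from the question: each pair \xc3\x?? is one UTF-8 sequence
# that was stored as two separate Latin-1 characters.
broken = [u"Von D\xc3\xbc", u"\xc3\x96berg"]

# Latin-1 maps code points 0-255 straight back to the original bytes,
# which can then be decoded correctly as UTF-8.
repaired = [s.encode("latin-1").decode("utf-8") for s in broken]

for before, after in zip(broken, repaired):
    print(repr(before), "->", after)
```

This only treats the symptom; finding where the data was first decoded with the wrong codec would be the more durable fix.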

Get non-ASCII filename from S3 notification event in Lambda

跟風遠走 submitted on 2019-11-28 01:20:23
The key field in an AWS S3 notification event, which denotes the filename, is URL-escaped. This is evident when the filename contains spaces or non-ASCII characters. For example, I have uploaded a file with the following name to S3: my file řěąλλυ.txt
The notification is received as:
    {
      "Records": [
        { "s3": { "object": { "key": u"my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt" } } }
      ]
    }
I've tried to decode it using:
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf-8')
but that yields: my file ÅÄÄλλÏ.txt
Of course, when I then try to get the file from S3 using Boto, I get a
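
For comparison, here is a minimal sketch of the same decoding on Python 3, where `urllib.parse.unquote_plus` turns `+` into a space and interprets the percent-escaped bytes as UTF-8 by default. The question itself targets Python 2, so this is an illustration of the intended result rather than a drop-in fix; the `event` dictionary is a trimmed stand-in for the real notification.

```python
from urllib.parse import unquote_plus

# Trimmed, hypothetical stand-in for the S3 notification shown above.
event = {
    "Records": [
        {"s3": {"object": {
            "key": "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"}}}
    ]
}

# '+' becomes a space; %XX sequences are decoded as UTF-8 (the default).
key = unquote_plus(event["Records"][0]["s3"]["object"]["key"])
print(key)
```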

how to deal with ® in url for urllib2.urlopen?

╄→尐↘猪︶ㄣ submitted on 2019-11-27 15:54:27
I received a URL from BeautifulSoup: https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions
    url=u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'
I want to feed it back into urllib2.urlopen:
    import urllib2
    source = urllib2.urlopen(url).read()
The error I get:
    UnicodeEncodeError: 'gbk' codec can't encode character u'\xae' in position 43: illegal multibyte sequence
So I tried:
    source = urllib2.urlopen(url.encode("utf-8")).read()
That returned the page source; however, it is different from
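
A common way to make such a URL safe to pass around is to percent-encode the non-ASCII characters first. The sketch below uses Python 3's `urllib.parse.quote` and makes no network request; whether the server expects UTF-8 percent-encoding for this path is an assumption, and the question itself is written for Python 2's `urllib2`.

```python
from urllib.parse import quote

url = ("https://www.packtpub.com/virtualization-and-cloud/"
       "citrix-xenapp\xae-75-desktop-virtualization-solutions")

# Keep ':' and '/' literal so the scheme and path separators survive;
# the registered sign is encoded as its UTF-8 bytes, %C2%AE.
safe_url = quote(url, safe=":/")
print(safe_url)
```

A URL containing a query string or fragment would need `?`, `&`, `=` and `#` added to `safe`, or the components quoted separately.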

Google App Engine: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)

江枫思渺然 submitted on 2019-11-27 15:34:18
Question: I'm working on a small application using Google App Engine that makes use of the Quora RSS feed. There is a form, and based on the input entered by the user, it outputs a list of links related to that input. The application works fine for one-letter queries and for most two-letter words if the words are separated by a '-'. However, for three-letter words and some two-letter words, I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48:

Python returns length of 2 for single Unicode character string

◇◆丶佛笑我妖孽 submitted on 2019-11-27 09:07:57
In Python 2.7:
    In [2]: utf8_str = '\xf0\x9f\x91\x8d'
    In [3]: print(utf8_str)
    👍
    In [4]: unicode_str = utf8_str.decode('utf-8')
    In [5]: print(unicode_str)
    👍
    In [6]: unicode_str
    Out[6]: u'\U0001f44d'
    In [7]: len(unicode_str)
    Out[7]: 2
Since unicode_str only contains a single Unicode code point (0x0001f44d), why does len(unicode_str) return 2 instead of 1?
Your Python binary was compiled with UCS-2 support (a narrow build), and internally anything outside of the BMP (Basic Multilingual Plane) is represented using a surrogate pair. That means such codepoints show up as 2 characters when asking for
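
The surrogate-pair explanation above can be checked on Python 3.3 or later, where strings are no longer tied to a narrow or wide build and `len()` counts code points. The sketch below counts UTF-16 code units separately to reproduce the "2" a narrow build reports; the variable names are illustrative only.

```python
import struct

# The same four UTF-8 bytes as in the question.
thumbs_up = b"\xf0\x9f\x91\x8d".decode("utf-8")

# One code point on Python 3.3+ ...
code_points = len(thumbs_up)

# ... but two 16-bit code units in UTF-16, which is what a narrow build stored.
utf16_bytes = thumbs_up.encode("utf-16-le")
utf16_units = len(utf16_bytes) // 2
high, low = struct.unpack("<2H", utf16_bytes)

print(code_points, utf16_units, hex(high), hex(low))
```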

UnicodeDecodeError: ('utf-8' codec) while reading a csv file [duplicate]

别等时光非礼了梦想. submitted on 2019-11-27 06:50:24
This question already has an answer here: UnicodeDecodeError when reading CSV file in Pandas with Python (11 answers).
I am reading a CSV into a DataFrame, changing a column, writing the changed values back to the same CSV with to_csv, and then reading that CSV again into another DataFrame. At that last step I get an error:
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte
My code is:
    import pandas as pd
    df = pd.read_csv("D:\ss.csv")
    df.columns  # o/p is Index(['CUSTOMER_MAILID', 'False', 'True'], dtype='object')
    df['True'] =
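
An error of this kind usually means the bytes on disk are not UTF-8. The standard-library sketch below reproduces the same failure on a made-up row and shows that a single-byte codec accepts it; the sample bytes and the guess of Latin-1 are assumptions, since the actual file is not shown. With pandas, the analogous step would be passing an explicit `encoding` argument to `read_csv`.

```python
# Hypothetical row: 0xE7 is "ç" in Latin-1, but in UTF-8 it starts a
# three-byte sequence, so the following ASCII byte is an invalid continuation.
raw = b"Fran\xe7ois,1\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc.reason)

# Latin-1 maps every byte to a character, so this decode cannot fail,
# although it is only correct if the file really is Latin-1.
text = raw.decode("latin-1")
print(text)
```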