python-unicode | 易学教程

Display width of unicode strings in Python [duplicate]

Question: This question already has answers here: Normalizing Unicode (2 answers). Closed 5 years ago.
How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()?
Motivating example: printing a table of strings to the console. Some of the strings contain non-ASCII characters.
    >>> for title in d.keys():
    ...     print("{:<20} | {}".format(title, d[title]))
    zootehni- | zooteh.
    zootekni- | zootek.
    zoothèque |
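str.format() pads by counting code points, so it cannot account for combining marks or double-width characters. Below is a minimal sketch, assuming a terminal that draws combining marks in zero columns and East Asian Wide/Fullwidth characters in two. `display_width` and `ljust_display` are hypothetical helpers, and the sample dict is made up (only the keys echo the question). If you can add a dependency, the third-party wcwidth package's wcswidth() covers more cases.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal column width of `text` (a heuristic, not a standard)."""
    width = 0
    # NFC first, so "e" + U+0300 becomes the single precomposed character "è".
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.combining(ch):
            # Any combining mark left after NFC takes no column of its own.
            continue
        # East Asian Wide (W) and Fullwidth (F) characters usually take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def ljust_display(text: str, columns: int) -> str:
    """Left-justify `text` to `columns` terminal columns (hypothetical helper)."""
    return text + " " * max(0, columns - display_width(text))


if __name__ == "__main__":
    # Made-up data; the third key is spelled with a decomposed accent on purpose.
    d = {"zootehni-": "zooteh.", "zootekni-": "zootek.", "zoothe\u0300que": ""}
    for title, abbrev in d.items():
        print("{} | {}".format(ljust_display(title, 20), abbrev))
```

The padding is computed by hand instead of through a format spec, because the format mini-language has no notion of display columns.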

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) [duplicate]

This question already has an answer here: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) (27 answers)
I have this code:
    printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
    # Write file
    f.write(printinfo + '\n')
But I get this error when running it:
    f.write(printinfo + '\n')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)
It's having trouble writing out this: Identité secrète (Abduction) [VF]
Any ideas, please? I'm not sure how to fix it. Cheers.
UPDATE: This is the bulk of
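In Python 2, writing a unicode object to a file opened with plain open() implicitly encodes it as ASCII, and that fails on é. The usual remedy is to open the file with an explicit encoding. In Python 2.6+ io.open() accepts the same encoding argument. Here is a sketch in Python 3 syntax. The ID values are placeholders, and UTF-8 is an assumption about what the file's consumer expects.

```python
import os
import tempfile

title = "Identité secrète (Abduction) [VF]"
old_vendor_id = "12345"  # hypothetical placeholder value
apple_id = "67890"       # hypothetical placeholder value

printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + "\n"

path = os.path.join(tempfile.mkdtemp(), "out.tsv")

# Explicit encoding: don't rely on the platform/locale default.
with open(path, "w", encoding="utf-8") as f:
    f.write(printinfo)

# Read it back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    round_trip = f.read()

print(round_trip == printinfo)  # True
```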

Open() and codecs.open() in Python 2.7 behave strangely different

Question: I have a text file whose first line contains Unicode characters and whose other lines are all ASCII. I try to read the first line into one variable and all the other lines into another. However, when I use the following code:
    # -*- coding: utf-8 -*-
    import codecs
    import os
    filename = '1.txt'
    f = codecs.open(filename, 'r3', encoding='utf-8')
    print f
    names_f = f.readline().split(' ')
    data_f = f.readlines()
    print len(names_f)
    print len(data_f)
    f.close()
    print 'And now for something completely different:'
    g = open
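This doesn't diagnose the Python 2.7 codecs.open() behaviour. It sidesteps it. The sketch below assumes Python 3, where the built-in open() with an encoding argument returns a single io text stream, so readline() and readlines() share one decoding buffer. io.open() is the same function, and it is generally preferred over codecs.open() for text files. The sample file contents are made up.

```python
import os
import tempfile

# Made-up sample file: first line non-ASCII, remaining lines ASCII.
path = os.path.join(tempfile.mkdtemp(), "1.txt")
with open(path, "w", encoding="utf-8") as out:
    out.write("名前 Größe café\n1 2 3\n4 5 6\n")

# One text stream decodes the whole file; readline() then readlines() continue
# from the same position.
with open(path, "r", encoding="utf-8") as f:
    names_f = f.readline().split(" ")
    data_f = f.readlines()

print(len(names_f), len(data_f))  # 3 2
```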

Python string to unicode [duplicate]

Possible Duplicate:
How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?
How do convert unicode escape sequences to unicode characters in a python string
I have a string that contains Unicode escape sequences, e.g. \u2026. Somehow I don't receive it as unicode, but as a str. How do I convert it back to unicode?
    >>> a="Hello\u2026"
    >>> b=u"Hello\u2026"
    >>> print a
    Hello\u2026
    >>> print b
    Hello…
    >>> print unicode(a)
    Hello\u2026
    >>>
So clearly unicode(a) is not the answer. Then what is?
Unicode escapes only work in unicode strings, so
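The standard library's unicode_escape codec interprets backslash escapes. In Python 2 the call is a.decode('unicode_escape'). The sketch below uses Python 3, where a str has to be turned into bytes first. It assumes the string is otherwise pure ASCII.

```python
# A str holding a literal backslash escape: the six characters \u2026.
a = "Hello\\u2026"
# The real character U+2026 HORIZONTAL ELLIPSIS.
b = "Hello\u2026"

# Encode to bytes, then let 'unicode_escape' interpret the escapes.
# Assumes `a` is pure ASCII: unicode_escape reads the bytes as Latin-1,
# so any non-ASCII characters already in the string would be mangled.
decoded = a.encode("ascii").decode("unicode_escape")

print(decoded == b)  # True
print(decoded)       # Hello…
```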

Python to show special characters

Question: I know there are tons of threads regarding this issue, but I have not managed to find one that solves my problem. I am trying to print a string, but when printed it doesn't show special characters (e.g. æ, ø, å, ö and ü). When I print the string using repr(), this is what I get: u'Von D\xc3\xbc' and u'\xc3\x96berg'. Does anyone know how I can convert these to Von Dü and Öberg? It's important to me that these characters are not dropped, as they would be with myStr.encode("ascii", "ignore").
EDIT: This is the code
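Those reprs look like UTF-8 bytes that were decoded as Latin-1 somewhere upstream. U+00C3 U+00BC is exactly the UTF-8 encoding of ü, read one byte per character. If that is what happened, re-encoding with Latin-1 recovers the original bytes, and those can then be decoded as UTF-8. Fixing the decode where the data enters the program is still preferable. `fix_mojibake` is a hypothetical helper.

```python
# Strings from the question: each UTF-8 byte became its own code point.
broken = ["Von D\xc3\xbc", "\xc3\x96berg"]


def fix_mojibake(s: str) -> str:
    """Undo a UTF-8-read-as-Latin-1 mix-up (assumes that is what happened)."""
    # Latin-1 maps U+0000..U+00FF back to the identical byte values,
    # so this restores the original UTF-8 bytes before decoding them properly.
    return s.encode("latin-1").decode("utf-8")


fixed = [fix_mojibake(s) for s in broken]
print(fixed)  # ['Von Dü', 'Öberg']
```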

Get non-ASCII filename from S3 notification event in Lambda

The key field in an AWS S3 notification event, which denotes the filename, is URL-escaped. This is evident when the filename contains spaces or non-ASCII characters. For example, I uploaded the following file to S3: my file řěąλλυ.txt
The notification is received as:
    { "Records": [ { "s3": { "object": { "key": u"my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt" } } } ] }
I've tried to decode it using:
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf-8')
but that yields: my file ÅÄÄÎ»Î»Ï.txt
Of course, when I then try to get the file from S3 using Boto, I get a
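In Python 3, urllib.parse.unquote_plus() returns a str and decodes the %XX escapes as UTF-8 by default, so no extra decode() is needed. In Python 2 the common workaround is to unquote a UTF-8 byte string and then decode it: urllib.unquote_plus(key.encode('utf-8')).decode('utf-8'). The sketch below uses a hand-built event dict shaped like the one in the question, not a real Lambda payload.

```python
from urllib.parse import unquote_plus

# Minimal event dict shaped like the S3 notification in the question (assumed).
event = {"Records": [{"s3": {"object": {
    "key": "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"}}}]}

raw_key = event["Records"][0]["s3"]["object"]["key"]

# '+' becomes a space; %XX sequences are decoded as UTF-8 (the default encoding).
key = unquote_plus(raw_key)
print(key)  # my file řěąλλυ.txt
```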

how to deal with ® in url for urllib2.urlopen?

I received a URL: https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions (it comes from BeautifulSoup):
    url = u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'
I want to feed it back into urllib2.urlopen:
    import urllib2
    source = urllib2.urlopen(url).read()
The error I get:
    UnicodeEncodeError: 'gbk' codec can't encode character u'\xae' in position 43: illegal multibyte sequence
So I tried:
    source = urllib2.urlopen(url.encode("utf-8")).read()
It got a page source; however, it is different from
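A URL sent over HTTP must be ASCII, so the ® has to be percent-encoded rather than passed raw. Percent-encoding it as its UTF-8 bytes gives %C2%AE. The sketch below is written for Python 3, where urllib.request.urlopen() replaces urllib2.urlopen(). `iri_to_uri` is a hypothetical helper that quotes only the path. It assumes the host name itself is ASCII, since non-ASCII hosts need IDNA encoding instead.

```python
from urllib.parse import quote, urlsplit, urlunsplit

url = ("https://www.packtpub.com/virtualization-and-cloud/"
       "citrix-xenapp\xae-75-desktop-virtualization-solutions")


def iri_to_uri(u: str) -> str:
    """Percent-encode the path as UTF-8, leaving scheme and host alone (sketch)."""
    parts = urlsplit(u)
    # quote() encodes non-ASCII as UTF-8 by default. '/' stays a separator, and
    # '%' is kept so an already-escaped path is not escaped a second time.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(iri_to_uri(url))
# https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp%C2%AE-75-desktop-virtualization-solutions
```

The result can then be passed to urllib.request.urlopen(). The query string is left untouched here, so it would need the same treatment if it contains non-ASCII text.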

Google App Engine: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)

Question: I'm working on a small application using Google App Engine that makes use of the Quora RSS feed. There is a form, and based on the input entered by the user, it outputs a list of links related to the input. Now, the application works fine for one-letter queries and for most two-letter words if the words are separated by a '-'. However, for three-letter words and some two-letter words, I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48:
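In Python 2, combining a byte string with a unicode string triggers an implicit ASCII decode, which fails on the first non-ASCII byte. 0xE2 is the lead byte of many UTF-8 punctuation characters, such as ’ (U+2019, encoded E2 80 99). Assuming the feed is UTF-8 (it should declare its encoding), the usual fix is to decode the bytes explicitly once, where they enter the program. A Python 3 sketch with a made-up snippet:

```python
# Made-up feed snippet containing U+2019 RIGHT SINGLE QUOTATION MARK as UTF-8.
raw = b"What\xe2\x80\x99s the best way to learn Python?"

try:
    # Roughly what an implicit ASCII conversion does.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print(exc.start, hex(raw[exc.start]))  # 4 0xe2

# Decode once, with the encoding the source declares (assumed UTF-8 here).
text = raw.decode("utf-8")
print(text)
```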

Python returns length of 2 for single Unicode character string

In Python 2.7:
    In [2]: utf8_str = '\xf0\x9f\x91\x8d'
    In [3]: print(utf8_str)
    👍
    In [4]: unicode_str = utf8_str.decode('utf-8')
    In [5]: print(unicode_str)
    👍
    In [6]: unicode_str
    Out[6]: u'\U0001f44d'
    In [7]: len(unicode_str)
    Out[7]: 2
Since unicode_str only contains a single Unicode code point (0x0001f44d), why does len(unicode_str) return 2 instead of 1?
Your Python binary was compiled with UCS-2 support (a narrow build), and internally anything outside the BMP (Basic Multilingual Plane) is represented using a surrogate pair. That means such code points show up as 2 characters when asking for
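Here is a sketch of the surrogate-pair arithmetic for U+1F44D, in Python 3. Since Python 3.3 (PEP 393) there are no narrow or wide builds, so len() returns 1. The pair computed below is what a narrow 2.x build stores internally, and it matches the UTF-16 encoding. `surrogate_pair` is a hypothetical helper.

```python
import sys

ch = "\U0001F44D"  # THUMBS UP SIGN, outside the BMP

# Python 3.3+: one code point, length 1, regardless of platform.
print(len(ch))  # 1
# A narrow 2.x build reports 0xFFFF here instead.
print(sys.maxunicode == 0x10FFFF)  # True


def surrogate_pair(cp: int) -> tuple:
    """Split a supplementary code point (>= U+10000) into UTF-16 surrogates."""
    v = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (v >> 10)        # top 10 bits
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits
    return high, low


high, low = surrogate_pair(ord(ch))
print(hex(high), hex(low))  # 0xd83d 0xdc4d

# The same two code units that UTF-16 produces (and a narrow build stores).
print(ch.encode("utf-16-be").hex())  # d83ddc4d
```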

UnicodeDecodeError: ('utf-8' codec) while reading a csv file [duplicate]

This question already has an answer here: UnicodeDecodeError when reading CSV file in Pandas with Python (11 answers)
What I am trying to do is read a CSV into a DataFrame, change values in a column, write the changed values back to the same CSV (to_csv), and then read that CSV again into another DataFrame. At that point I get an error:
    UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe7 in position 7: invalid continuation byte
My code is:
    import pandas as pd
    df = pd.read_csv("D:\ss.csv")
    df.columns  # o/p is Index(['CUSTOMER_MAILID', 'False', 'True'], dtype='object')
    df['True'] =
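0xE7 is ç in cp1252/Latin-1. That suggests the file was saved in a Windows code page rather than UTF-8 at some point in the round trip, although the excerpt doesn't show where. The direct fix is to pass the same encoding= to both DataFrame.to_csv() and pd.read_csv(). To stay dependency-free, the sketch below uses the standard csv module and a hypothetical `read_csv_rows` helper that tries UTF-8 first and falls back to cp1252. This guesses the encoding rather than detecting it.

```python
import csv
import io


def read_csv_rows(data: bytes, encodings=("utf-8", "cp1252")) -> list:
    """Decode CSV bytes with the first encoding that works (a fallback heuristic)."""
    for enc in encodings:
        try:
            text = data.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return list(csv.reader(io.StringIO(text)))
    raise ValueError("none of the encodings %r could decode the data" % (encodings,))


# Made-up sample: 0xE7 followed by 'o' is an invalid continuation in UTF-8,
# but decodes as 'ç' in cp1252.
sample = b"CUSTOMER_MAILID,False,True\nfran\xe7ois@example.com,0,1\n"
rows = read_csv_rows(sample)
print(rows[1][0])  # françois@example.com
```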