python-unicode

Python string to unicode [duplicate]

别来无恙 submitted on 2019-11-27 05:04:24
Question: This question already has answers elsewhere and was closed 7 years ago. Possible duplicates: "How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?" and "How do convert unicode escape sequences to unicode characters in a python string". I have a string that contains Unicode characters, e.g. \u2026. Somehow it reaches me as a str rather than as unicode. How do I convert it back to unicode?
    >>> a="Hello\u2026"
    >>> b=u"Hello\u2026"
    >>> print a
    Hello
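A minimal sketch of one common way to interpret such escapes, written for Python 3 as an assumption (the question uses Python 2, where `a.decode('unicode_escape')` plays the same role). The variable names are illustrative, and the approach assumes the incoming text is pure ASCII apart from the escape sequences.

```python
# Python 3 sketch. `raw` stands in for text that arrived with a literal
# backslash-u sequence instead of the real character.
raw = "Hello\\u2026"  # 11 characters: H e l l o \ u 2 0 2 6

# Encode to bytes, then let the unicode_escape codec interpret the escapes.
# Assumption: `raw` is pure ASCII; other characters would make encode() fail.
decoded = raw.encode("ascii").decode("unicode_escape")

print(len(raw), len(decoded))  # 11 6
print(ascii(decoded))          # 'Hello\u2026'
```

`ascii()` is used for printing only so that the sketch does not depend on the terminal's encoding.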

Python 3: os.walk() file paths UnicodeEncodeError: 'utf-8' codec can't encode: surrogates not allowed

ε祈祈猫儿з submitted on 2019-11-27 01:57:31
Question: This code:
    for root, dirs, files in os.walk('.'):
        print(root)
gives me this error: UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 27: surrogates not allowed. How do I walk through a file tree without getting toxic strings like this?
Answer 1: On Linux, filenames are 'just a bunch of bytes' and are not necessarily encoded in any particular encoding. Python 3 tries to turn everything into Unicode strings. In doing so the developers came up with a scheme to translate byte
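The scheme the truncated answer starts to describe is presumably the `surrogateescape` error handler from PEP 383; that identification is an assumption, since the excerpt cuts off. A minimal Python 3 sketch with a hypothetical file name shows where the `'\udcc3'` in the error comes from and two ways to handle such names.

```python
# Python 3 sketch of how undecodable file-name bytes become lone surrogates.
raw_name = b"caf\xc3"  # hypothetical name: 0xC3 is an incomplete UTF-8 sequence

# Invalid bytes are mapped to U+DC80..U+DCFF instead of raising an error.
name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udcc3'

# A strict encode (what print() does on a UTF-8 terminal) fails.
try:
    name.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode failed:", exc.reason)  # surrogates not allowed

# Option 1: round-trip back to the original bytes.
roundtrip = name.encode("utf-8", "surrogateescape")

# Option 2: make the name printable by replacing the bad characters.
printable = name.encode("utf-8", "replace").decode("utf-8")
print(printable)  # caf?
```

Option 1 suits code that passes the name back to the operating system; option 2 suits logging and display, where losing the original byte is acceptable.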

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128) [duplicate]

杀马特。学长 韩版系。学妹 submitted on 2019-11-27 00:27:20
Question: This question already has an answer here: UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128) (27 answers). I have this code:
    printinfo = title + "\t" + old_vendor_id + "\t" + apple_id + '\n'
    # Write file
    f.write (printinfo + '\n')
But I get this error when running it:
    f.write(printinfo + '\n')
    UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 7: ordinal not in range(128)
It's having trouble writing out this:
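One way to avoid the implicit ASCII encoding is to open the file with an explicit encoding. The sketch below is an assumption about what the asker wants (UTF-8 output); the sample values and the temporary path are hypothetical, and only the three variable names come from the question.

```python
import io
import os
import tempfile

# Hypothetical values standing in for the question's variables.
title = u"Caf\xe9"
old_vendor_id = u"123"
apple_id = u"456"
printinfo = title + u"\t" + old_vendor_id + u"\t" + apple_id + u"\n"

# Hypothetical output location, used only to keep the sketch self-contained.
path = os.path.join(tempfile.mkdtemp(), "out.tsv")

# io.open takes an explicit encoding on both Python 2 and 3, so the text is
# encoded as UTF-8 on write instead of falling back to ASCII.
with io.open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write(printinfo)

# Read the raw bytes back to confirm what was stored.
with io.open(path, "rb") as f:
    written = f.read()
print(repr(written))
```

The key point is that the text stays Unicode until the file object encodes it; calling `.encode('utf-8')` by hand before writing to a binary-mode file would be an alternative.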

Get non-ASCII filename from S3 notification event in Lambda

筅森魡賤 submitted on 2019-11-26 23:30:50
Question: The key field in an AWS S3 notification event, which denotes the filename, is URL-escaped. This is evident when the filename contains spaces or non-ASCII characters. For example, I uploaded the following file to S3: my file řěąλλυ.txt. The notification is received as:
    { "Records": [ { "s3": { "object": { "key": u"my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt" } } } ] }
I've tried to decode it using:
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key']).decode('utf-8')
but
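For comparison, a minimal Python 3 sketch of decoding the key from the question. It assumes the Python 3 standard library, where `urllib.parse.unquote_plus` decodes the percent-escapes as UTF-8 by default; the question itself targets Python 2, so this is not the asker's exact environment.

```python
from urllib.parse import unquote_plus

# The key exactly as it appears in the notification from the question.
key = "my+file+%C5%99%C4%9B%C4%85%CE%BB%CE%BB%CF%85.txt"

# '+' becomes a space and the %XX escapes are decoded as UTF-8 (the default).
filename = unquote_plus(key)

print(ascii(filename))  # 'my file \u0159\u011b\u0105\u03bb\u03bb\u03c5.txt'
```

The escaped output corresponds to `my file řěąλλυ.txt`; `ascii()` is used only to keep the printout independent of the console encoding.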

Python default string encoding

做~自己de王妃 submitted on 2019-11-26 22:11:15
Question: When, where and how does Python implicitly apply encodings to strings or perform implicit transcoding (conversion)? And what are those "default" (i.e. implied) encodings? For example, what are the encodings:
of string literals?
    s = "Byte string with national characters"
    us = u"Unicode string with national characters"
of byte strings when type-converted to and from Unicode?
    data = unicode(random_byte_string)
when byte and Unicode strings are written to/from a file or a terminal?
    print(open(
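A small sketch for inspecting the implied encodings at runtime. It is written for Python 3 as an assumption (the question's `unicode(...)` call is Python 2), the dictionary labels are descriptive rather than official terms, and the reported values depend on the platform and locale.

```python
import locale
import sys

# Each entry pairs a situation with the encoding Python typically falls back
# to there. Labels are illustrative; values vary by platform and settings.
defaults = {
    "str.encode() / bytes.decode() default": sys.getdefaultencoding(),
    "file names (os.listdir, open)": sys.getfilesystemencoding(),
    "open() in text mode without encoding=": locale.getpreferredencoding(False),
    "print() to the terminal": getattr(sys.stdout, "encoding", None),
}

for what, encoding in defaults.items():
    print(what, "->", encoding)
```

On Python 3 the first entry is always `utf-8`; the others commonly follow the locale, which is why passing `encoding=` explicitly to `open()` is the safer habit.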

how to deal with ® in url for urllib2.urlopen?

杀马特。学长 韩版系。学妹 submitted on 2019-11-26 17:25:21
Question: I received a URL from BeautifulSoup: https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp®-75-desktop-virtualization-solutions
    url=u'https://www.packtpub.com/virtualization-and-cloud/citrix-xenapp\xae-75-desktop-virtualization-solutions'
I want to feed it back into urllib2.urlopen:
    import urllib2
    source = urllib2.urlopen(url).read()
The error I get: UnicodeEncodeError: 'gbk' codec can't encode character u'\xae' in position 43: illegal multibyte sequence. Thus, I tried:
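A common approach is to percent-encode the non-ASCII characters before opening the URL. The sketch below uses Python 3's `urllib.parse.quote` and assumes the server expects UTF-8 percent-escapes, which the excerpt does not confirm; the network call itself is left out.

```python
from urllib.parse import quote

url = ("https://www.packtpub.com/virtualization-and-cloud/"
       "citrix-xenapp\xae-75-desktop-virtualization-solutions")

# Keep ':' and '/' as they are; everything outside the unreserved ASCII set
# is replaced by the %XX escapes of its UTF-8 bytes (assumed server encoding).
safe_url = quote(url, safe=":/")

print(safe_url)
```

Here the `®` character (U+00AE) becomes `%C2%AE`, leaving a pure-ASCII URL that can be passed to `urllib.request.urlopen`.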

Python returns length of 2 for single Unicode character string

霸气de小男生 submitted on 2019-11-26 14:30:59
Question: In Python 2.7:
    In [2]: utf8_str = '\xf0\x9f\x91\x8d'
    In [3]: print(utf8_str)
    👍
    In [4]: unicode_str = utf8_str.decode('utf-8')
    In [5]: print(unicode_str)
    👍
    In [6]: unicode_str
    Out[6]: u'\U0001f44d'
    In [7]: len(unicode_str)
    Out[7]: 2
Since unicode_str contains only a single Unicode code point (0x0001f44d), why does len(unicode_str) return 2 instead of 1?
Answer 1: Your Python binary was compiled with UCS-2 support (a narrow build) and internally anything outside of the BMP (Basic Multilingual Plane)
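The length of 2 corresponds to the two UTF-16 code units (a surrogate pair) that a narrow build stores for a code point above U+FFFF. A short sketch, assuming Python 3.3 or later where strings are indexed by code point, reproduces both counts and the pair itself.

```python
thumbs_up = "\U0001f44d"

# Python 3.3+ counts code points, so the length is 1.
code_points = len(thumbs_up)

# A narrow Python 2 build counts UTF-16 code units instead: 4 bytes / 2 = 2.
utf16_units = len(thumbs_up.encode("utf-16-le")) // 2

# The surrogate pair is derived from the code point minus 0x10000.
offset = ord(thumbs_up) - 0x10000
high = 0xD800 + (offset >> 10)    # top 10 bits
low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits

print(code_points, utf16_units)  # 1 2
print(hex(high), hex(low))       # 0xd83d 0xdc4d
```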

Removing unicode \u2026 like characters in a string in python2.7

流过昼夜 submitted on 2019-11-26 11:47:41
I have a string in python2.7 like this: This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying! How do I convert it to this: This is some text that has to be cleaned! its annoying!
Python 2.x
    >>> s
    'This is some \\u03c0 text that has to be cleaned\\u2026! it\\u0027s annoying!'
    >>> print(s.decode('unicode_escape').encode('ascii','ignore'))
    This is some text that has to be cleaned! it's annoying!
Python 3.x
    >>> s = 'This is some \u03c0 text that has to be cleaned\u2026! it\u0027s annoying!'
    >>> s.encode('ascii', 'ignore')
    b"This is some text that has to be cleaned! it's

How to print Unicode character in Python?

三世轮回 submitted on 2019-11-26 11:37:12
I want to make a dictionary where English words point to Russian and French translations. How do I print out Unicode characters in Python? Also, how do you store Unicode characters in a variable?
Matt Ryall: To include Unicode characters in your Python source code, you can use Unicode escape sequences of the form \u0123 in your string, and prefix the string literal with 'u'. Here's an example running in the Python interactive console:
    >>> print u'\u0420\u043e\u0441\u0441\u0438\u044f'
    Россия
Strings declared like this are Unicode-type variables, as described in the Python Unicode documentation. If

SyntaxError: Non-ASCII character '\xa3' in file when function returns '£'

自古美人都是妖i submitted on 2019-11-26 09:53:05
Say I have a function:
    def NewFunction():
        return '£'
I want to print some stuff with a pound sign in front of it, but when I try to run this program this error message is displayed: SyntaxError: Non-ASCII character '\xa3' in file 'blah' but no encoding declared; see http://www.python.org/peps/pep-0263.html for details. Can anyone tell me how I can include a pound sign in my return value? I'm basically using it in a class, and the pound sign is included within the '__str__' part.
I'd recommend reading the PEP that the error points you to. The problem is that your code is
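The PEP the error message points to (PEP 263) describes a source-encoding declaration. A minimal sketch of how it is typically used follows; it assumes the file is saved as UTF-8, and the `Price` class is hypothetical, standing in for the asker's class with a `__str__` method.

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 declaration. Python 2 needs it (on line 1 or 2
# of the file) before it accepts non-ASCII bytes in source code.
# Assumption: the file itself is saved as UTF-8.


class Price(object):  # hypothetical class, for illustration only
    def __init__(self, amount):
        self.amount = amount

    def __str__(self):
        # The pound sign can now appear literally in the source.
        return "£%.2f" % self.amount


label = str(Price(3.5))
print(len(label))  # 5 on Python 3
```

Python 3 treats source files as UTF-8 by default, so the declaration is optional there; it matters for Python 2, which assumes ASCII.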