I'm getting an odd unicode error. I had been handling unicode fine, but when I ran my code this morning one item, u'\u201d', raised an error:
UnicodeError: AS
You already have a unicode string, there is no need to decode it to a unicode string again.
What happens in that case is that Python helpfully tries to encode it for you first, so that it has bytes to decode with the codec you named. It uses the default encoding to do so, which happens to be ASCII. Here is an explicit encode to show you the exception raised in that case:
>>> u'\u201d'.encode('ASCII')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201d' in position 0: ordinal not in range(128)
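To make the hidden step visible, here is a rough sketch of what Python 2 effectively does when you call .decode() on a unicode object. The helper name implicit_decode and its default_encoding parameter are made up for illustration; this is an emulation of the behaviour, not the interpreter's actual code.

```python
from __future__ import print_function


def implicit_decode(text, codec, default_encoding="ascii"):
    """Hypothetical helper mimicking Python 2's unicode.decode(codec)."""
    # Hidden step: the text is first encoded with the default codec (ASCII).
    raw = text.encode(default_encoding)
    # Only then are the resulting bytes decoded with the codec you asked for.
    return raw.decode(codec)


quote = u"\u201d"  # RIGHT DOUBLE QUOTATION MARK, not representable in ASCII

error = None
try:
    implicit_decode(quote, "utf-8")
except UnicodeEncodeError as exc:
    # Note that it is the *encode* step that fails, even though we asked to decode.
    error = exc
    print("encode step failed:", exc.reason)

# Pure ASCII text survives the round trip, which is why the bug can stay
# hidden until the first non-ASCII character shows up.
print(implicit_decode(u"abc", "utf-8"))
```

That also explains why the code worked until this morning: as long as every item was plain ASCII, the implicit encode succeeded silently.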
In short, when you have a unicode literal like u'', there is no need to decode it.
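If your data is a mix of byte strings and unicode objects, one way to avoid the double decode is to decode only the values that are actually bytes. A minimal sketch, assuming your byte strings are UTF-8 encoded; to_text is a hypothetical helper name, not something from the standard library:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode byte strings, pass unicode text through untouched."""
    if isinstance(value, bytes):
        # Real bytes: decoding is the right thing to do.
        return value.decode(encoding)
    # Already text: decoding again would trigger the implicit ASCII encode.
    return value


already_text = to_text(u"\u201d")          # unchanged
from_bytes = to_text(b"\xe2\x80\x9d")      # UTF-8 bytes for U+201D
```

Both calls give you the same unicode string, and neither goes through the default ASCII codec.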
Read up on Unicode, encodings, and default settings in the Python Unicode HOWTO. Another invaluable article on the subject is Joel Spolsky's Minimum Unicode knowledge post.