'ascii' codec can't encode character at position * ord not in range(128)

Question

There are a few threads on Stack Overflow about this, but I couldn't find a solution to the problem as a whole.

I have collected large amounts of text data with urllib's read() function and stored it in pickle files.

Now I want to write this data to a file. While writing, I get errors like this one:

'ascii' codec can't encode character u'\u2019' in position 16: ordinal not in range(128)

and a lot of data is being lost.

I suppose the data returned by urllib's read() is byte data.
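The message names an encode step (text to bytes), which suggests the value being written is already Unicode text rather than raw bytes. A minimal Python 3 sketch that reproduces the message; the sample string and the helper name first_unencodable are hypothetical, with the sample chosen so that U+2019 lands at position 16 as in the error above:

```python
# Hypothetical sample: U+2019 (RIGHT SINGLE QUOTATION MARK) sits at index 16.
sample = "The quick brown \u2019fox"


def first_unencodable(text, encoding="ascii"):
    """Hypothetical helper: return (index, char) of the first character
    `encoding` cannot represent, or None if the whole text encodes."""
    try:
        text.encode(encoding)  # str -> bytes; this is the step that raises
    except UnicodeEncodeError as err:
        print(err)  # same message shape as in the question
        return err.start, text[err.start]
    return None


print(first_unencodable(sample))
```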

I've tried:

   1. text = text.decode('ascii', 'ignore')
   2. s = filter(lambda x: x in string.printable, s)
   3. text = u'' + text
      text = text.decode().encode('utf-8')

But I still end up with similar errors. Can somebody point out a proper solution? Also, would codecs strip work? I don't mind if the conflicting bytes are not written to the file; that loss is acceptable.
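Since losing the unencodable characters is acceptable here, one option is to let the file object do the encoding and skip whatever ASCII cannot represent. A sketch assuming Python 3, where the built-in open() accepts encoding and errors arguments (on Python 2, codecs.open(filename, mode, encoding, errors) takes the same pair); write_ascii_lossy and the output path are hypothetical:

```python
import os
import tempfile


def write_ascii_lossy(path, text):
    """Hypothetical helper: write text as ASCII, dropping characters ASCII cannot encode."""
    # errors="ignore" makes the encoder skip unencodable characters instead of raising.
    with open(path, "w", encoding="ascii", errors="ignore") as f:
        f.write(text)


# Placeholder output file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "output.txt")
write_ascii_lossy(path, "It\u2019s done")

with open(path, encoding="ascii") as f:
    print(f.read())  # Its done
```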

Answer 1:

You can do it with the smart_str function from Django's django.utils.encoding module. Try this:

from django.utils.encoding import smart_str

text = u'\u2019'  # RIGHT SINGLE QUOTATION MARK
print(smart_str(text))
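For context: on Python 2, Django's smart_str returns a UTF-8 encoded byte string by default. If installing Django only for this is undesirable, the standard library's str.encode performs the same conversion. This is a swapped-in alternative sketched in Python 3, not Django's API:

```python
text = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Encode the text to UTF-8 bytes: U+2019 becomes three bytes.
data = text.encode("utf-8")
print(data)  # b'\xe2\x80\x99'

# Decoding with the same codec recovers the original text without loss.
print(data.decode("utf-8") == text)  # True
```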

You can install Django by opening a command shell with administrator privileges and running this command:

pip install Django

Answer 2:

Your data is Unicode text. To write it to a file, encode it with .encode():

text = text.encode('ascii', 'ignore')

but that would remove anything that isn't ASCII. Perhaps you wanted to encode to a more suitable encoding, like UTF-8, instead?
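A Python 3 sketch contrasting the two options: encode('ascii', 'ignore') silently drops the curly apostrophe, while a file opened with encoding='utf-8' stores and returns every character. The sample text and output path are placeholders:

```python
import os
import tempfile

text = "It\u2019s done"  # placeholder text containing U+2019

# Lossy: characters outside ASCII are silently discarded.
print(text.encode("ascii", "ignore"))  # b'Its done'

# Lossless: a text-mode file with an explicit UTF-8 encoding encodes on write.
path = os.path.join(tempfile.mkdtemp(), "output.txt")  # placeholder path
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading back with the same encoding returns the original text unchanged.
with open(path, encoding="utf-8") as f:
    print(f.read() == text)  # True
```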

You may want to read up on Python and Unicode:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky
The Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder

Source: https://stackoverflow.com/questions/15364266/ascii-codec-cant-encode-character-at-position-ord-not-in-range128

Tags: python, unicode, decode, encode