I have a Python script that connects to the Twitter Firehose and sends data downstream for processing. It was working fine before, but now I'm trying to get only the text b
Since nobody's jumped in yet, here's my shot. Python sets stdout's encoding when writing to a console but not when writing to a file. This script reproduces the problem:
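import sys
# u'\2026' is the octal escape \202 (U+0082) followed by '6'; any non-ASCII character triggers the error
msg = {'text': u'\2026'}
# sys.stdout.encoding is None when stdout is redirected to a file
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
# With no encoding set, Python 2 falls back to ASCII and raises UnicodeEncodeError
print msg['text']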
Running it with stdout redirected to a file shows the error:
$ python bad.py >/tmp/xxx
default encoding: None
Traceback (most recent call last):
  File "fix.py", line 5, in <module>
    print msg['text']
UnicodeEncodeError: 'ascii' codec can't encode character u'\x82' in position 0: ordinal not in range(128)
Encoding the text explicitly, with UTF-8 as the fallback when stdout has no encoding:
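import sys
msg = {'text': u'\2026'}
sys.stderr.write('default encoding: %s\n' % sys.stdout.encoding)
# Use the console's encoding when there is one, otherwise fall back to UTF-8
encoding = sys.stdout.encoding or 'utf-8'
# Printing the encoded byte string skips the implicit ASCII conversion
print msg['text'].encode(encoding)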
and the problem is solved:
$ python good.py >/tmp/xxx
default encoding: None
$ cat /tmp/xxx
6
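If the same fallback is needed in several places, it can be pulled into a small helper. This is only a sketch: `encode_for_stream`, its `fallback` parameter, and the `RedirectedStream` stand-in are hypothetical names made up for illustration, and the example uses u'\u2026' (the ellipsis character) as sample text.

```python
def encode_for_stream(text, stream, fallback='utf-8'):
    """Encode unicode text for a stream, using `fallback` when the stream has no encoding.

    Hypothetical helper for illustration; not part of any library.
    """
    # A redirected stdout in Python 2 reports encoding None, and some
    # file-like objects have no `encoding` attribute at all.
    encoding = getattr(stream, 'encoding', None) or fallback
    return text.encode(encoding)


class RedirectedStream(object):
    """Stand-in for stdout redirected to a file: no encoding is set."""
    encoding = None


# With no stream encoding, the text is encoded as UTF-8.
data = encode_for_stream(u'\u2026', RedirectedStream())
```

In the script above, the last line would then become `print encode_for_stream(msg['text'], sys.stdout)`. Note that if the console does report an encoding that cannot represent a character, `encode` will still raise, so this only covers the redirected case.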