I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:
import logging
def logging_test():
handler = log
If I understood your problem correctly, the same issue should arise on your system when you do just:
str(u'ô')
I guess automatic encoding to the locale encoding on Unix will not work until you have enabled the locale-aware if branch in the setencoding function of your site module. This file usually resides in /usr/lib/python2.x, and it's worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (it's true for my Python 2.6 installation).
The choices are either to enable that locale-aware setencoding in site.py, or to encode your unicode strings explicitly before handing them to logging (in which case no modification of site.py is needed).
See also The Illusive setdefaultencoding by Ian Bicking and related links.
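To see whether that branch is active on your install, you can check the interpreter's default encoding directly; the session below is only illustrative of a stock setup where it is still ASCII:
>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> str(u'\xf4')                  # u'ô': implicit conversion uses the default codec
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf4' in position 0: ordinal not in range(128)
>>> u'\xf4'.encode('utf-8')       # the explicit alternative; no site.py changes needed
'\xc3\xb4'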
I'm a little late, but I just came across this post that enabled me to set up logging in UTF-8 very easily.
Here's the link to the post, or here's the code:
root_logger= logging.getLogger()
root_logger.setLevel(logging.DEBUG) # or whatever
handler = logging.FileHandler('test.log', 'w', 'utf-8') # or whatever
formatter = logging.Formatter('%(name)s %(message)s') # or whatever
handler.setFormatter(formatter) # Pass the formatter as a parameter, don't assign it
root_logger.addHandler(handler)
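For what it's worth, here is roughly how you can exercise that setup end to end; the filename and message are just examples:
# -*- coding: utf-8 -*-
import logging

root_logger = logging.getLogger()
root_logger.setLevel(logging.DEBUG)
handler = logging.FileHandler('test.log', 'w', 'utf-8')           # encoding given here
handler.setFormatter(logging.Formatter('%(name)s %(message)s'))
root_logger.addHandler(handler)

root_logger.info(u'An o with a hat on it: \xf4')                  # written to test.log as UTF-8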
Try this:
import logging
def logging_test():
    log = open("./logfile.txt", "w")
    handler = logging.StreamHandler(log)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string.encode("utf8", "replace"))

if __name__ == "__main__":
    logging_test()
For what it's worth, I was expecting to have to use codecs.open to open the file with UTF-8 encoding, but either that's the default or something else is going on here, since it works as-is like this.
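If you do want to be explicit about the file encoding rather than rely on the platform default, one variation (just a sketch, with the same placeholder filename) is to open the stream with codecs.open:
# -*- coding: utf-8 -*-
import codecs
import logging

log = codecs.open("./logfile.txt", "w", encoding="utf-8")
handler = logging.StreamHandler(log)
handler.setFormatter(logging.Formatter("%(message)s"))

root_logger = logging.getLogger()
root_logger.addHandler(handler)
root_logger.setLevel(logging.INFO)

# A codecs stream accepts unicode objects directly, so no manual .encode() is needed
root_logger.info(unicode("\xc3\xb4", "utf-8"))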
Having code like:
raise Exception(u'щ')
Caused:
File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:
>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
Making the format string unicode fixes the issue:
>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'
So, in your logging configuration, make all format strings unicode:
'formatters': {
'simple': {
'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
'datefmt': '%Y-%m-%d %H:%M:%S',
},
...
And patch the default logging formatter to use a unicode format string:
logging._defaultFormatter = logging.Formatter(u"%(message)s")
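Put together, a minimal Python 2.7 dictConfig along those lines might look like the following; the file handler, its filename, and the logger name are only illustrative:
# -*- coding: utf-8 -*-
import logging
import logging.config

logging.config.dictConfig({
    'version': 1,
    'formatters': {
        'simple': {
            'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
            'datefmt': '%Y-%m-%d %H:%M:%S',
        },
    },
    'handlers': {
        'file': {
            'class': 'logging.FileHandler',
            'filename': 'test.log',       # placeholder path
            'encoding': 'utf-8',
            'formatter': 'simple',
        },
    },
    'root': {'handlers': ['file'], 'level': 'DEBUG'},
})

# As suggested above, also patch the default formatter used when none is set
logging._defaultFormatter = logging.Formatter(u"%(message)s")

logging.getLogger(__name__).info(Exception(u'\u0449'))  # no UnicodeEncodeError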
Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt
ô
vinay@eta-jaunty:~/projects/scratch$
On a Windows box:
C:\temp>python --version
Python 2.6.2
C:\temp>python utest.py
printed unicode object: ô
And the contents of the file (shown as a screenshot in the original answer) had the ô intact as well.
This might also explain why Lennart Regebro couldn't reproduce it either.
I had a similar problem running Django in Python3: My logger died upon encountering some Umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none working. I tried
import locale
if locale.getpreferredencoding().upper() != 'UTF-8':
    locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
which I got from the comment above. It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me in totally the wrong direction.
Changing the logging format strings to Unicode would not help. Setting a magic encoding comment at the beginning of the script would not help. Setting the charset on the sender's message (the text came from an HTTP request) did not help.
What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the default became None, which apparently ends up being ASCII (or, as I like to think of it: ASS-KEY).
'handlers': {
'file': {
'level': 'DEBUG',
'class': 'logging.handlers.TimedRotatingFileHandler',
'encoding': 'UTF-8', # <-- That was missing.
....
},
},
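For completeness, here is a trimmed-down sketch of the whole LOGGING setting with that line in place; the filename, rotation values, and levels are placeholders for whatever your project uses:
# settings.py (sketch; paths and rotation settings are placeholders)
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'file': {
            'level': 'DEBUG',
            'class': 'logging.handlers.TimedRotatingFileHandler',
            'filename': '/var/log/myapp/django.log',
            'when': 'midnight',
            'encoding': 'UTF-8',   # without this, encoding=None fell back to ASCII on my system
        },
    },
    'root': {
        'handlers': ['file'],
        'level': 'DEBUG',
    },
}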