UTF-8 In Python logging, how?

春和景丽 2020-12-02 15:45

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:

import logging

def logging_test():
    handler = log         


        
6 answers
  • 2020-12-02 15:53

    If I understood your problem correctly, the same issue should arise on your system when you do just:

    str(u'ô')
    

    I guess automatic encoding to the locale encoding on Unix will not work until you enable the locale-aware branch of the setencoding function in your site module. That file usually resides in /usr/lib/python2.x and is worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (that is true of my Python 2.6 installation).

    The choices are:

    • Let the system figure out the right way to encode Unicode strings to bytes or do it in your code (some configuration in site-specific site.py is needed)
    • Encode Unicode strings in your code and output just bytes

    See also The Illusive setdefaultencoding by Ian Bicking and related links.
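
    The second choice (encode in your own code) can be sketched as follows. This is only an illustration and it assumes Python 3, where str is always Unicode and bytes is a separate type; the variable names are made up for the example.

```python
# Assumption: Python 3. The character is the one from the question ('o' with a circumflex).
text = "\u00f4"

# Encode explicitly instead of relying on the interpreter's default encoding.
encoded = text.encode("utf-8")
print(repr(encoded))

# Decoding with the same codec round-trips the original string.
decoded = encoded.decode("utf-8")
print(decoded == text)

# Encoding to ASCII fails for this character, which is the failure mode discussed above.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii failed:", exc.reason)
```

    Doing the encoding yourself means the output no longer depends on the locale or on site.py settings.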

  • 2020-12-02 16:03

    I'm a little late, but I just came across this post, which enabled me to set up logging in UTF-8 very easily.

    Here is the link to the post

    or here is the code:

    import logging

    root_logger = logging.getLogger()
    root_logger.setLevel(logging.DEBUG) # or whatever
    handler = logging.FileHandler('test.log', 'w', 'utf-8') # mode 'w', encoding 'utf-8'
    formatter = logging.Formatter('%(name)s %(message)s') # or whatever
    handler.setFormatter(formatter) # pass the formatter as an argument rather than assigning it
    root_logger.addHandler(handler)
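
    To check that the encoding argument really controls what ends up on disk, here is a rough, self-contained variant that writes one record and reads the raw bytes back. It assumes Python 3; the temporary path and the logger name utf8_demo are hypothetical and exist only for this sketch.

```python
import logging
import os
import tempfile

# Hypothetical log location for this sketch; any writable path works.
log_path = os.path.join(tempfile.mkdtemp(), "test.log")

logger = logging.getLogger("utf8_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo record away from root handlers

# FileHandler(filename, mode, encoding): the encoding is passed by keyword here.
handler = logging.FileHandler(log_path, mode="w", encoding="utf-8")
handler.setFormatter(logging.Formatter("%(name)s %(message)s"))
logger.addHandler(handler)

logger.info("caf\u00e9 \u00f4")  # non-ASCII text

# Flush and release the file before reading it back.
handler.close()
logger.removeHandler(handler)

with open(log_path, "rb") as f:
    raw = f.read()

# The file holds UTF-8 bytes regardless of the locale's default encoding.
print(raw)
```

    Reading the file in binary mode avoids any second guess about which decoder the reader used.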
    
  • 2020-12-02 16:04

    Try this:

    import logging
    
    def logging_test():
        log = open("./logfile.txt", "w")
        handler = logging.StreamHandler(log)
        formatter = logging.Formatter("%(message)s")
        handler.setFormatter(formatter)
        root_logger = logging.getLogger()
        root_logger.addHandler(handler)
        root_logger.setLevel(logging.INFO)
    
        # This is an o with a hat on it.
        byte_string = '\xc3\xb4'
        unicode_string = unicode("\xc3\xb4", "utf-8")
    
        print "printed unicode object: %s" % unicode_string
    
        # Explode
        root_logger.info(unicode_string.encode("utf8", "replace"))
    
    
    if __name__ == "__main__":
        logging_test()
    

    For what it's worth, I was expecting to have to use codecs.open to open the file with UTF-8 encoding, but either that's the default or something else is going on here, since it works as is.
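
    Note that the code above encodes the string to UTF-8 bytes before logging it, so the handler just writes those bytes through. If you would rather let the stream do the encoding, a Python 3 sketch could look like the one below. It is an assumption-laden illustration: an in-memory buffer stands in for the log file, and the logger name stream_demo is hypothetical.

```python
import io
import logging

# An in-memory byte buffer stands in for the log file in this sketch.
raw_buffer = io.BytesIO()
# TextIOWrapper encodes str to UTF-8 on write, much like open(path, "w", encoding="utf-8").
stream = io.TextIOWrapper(raw_buffer, encoding="utf-8", newline="\n")

logger = logging.getLogger("stream_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False

handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))
logger.addHandler(handler)

# Log the str itself; the stream, not the caller, turns it into bytes.
logger.info("\u00f4")
handler.flush()

written = raw_buffer.getvalue()
print(written)
logger.removeHandler(handler)
```

    With a real file you would pass open("./logfile.txt", "w", encoding="utf-8") to StreamHandler instead of the wrapper.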

  • 2020-12-02 16:05

    Having code like:

    raise Exception(u'щ')
    

    Caused:

      File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
        s = self._fmt % record.__dict__
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
    

    This happens because the format string is a byte string, while some of the format string arguments are unicode strings with non-ASCII characters:

    >>> "%(message)s" % {'message': Exception(u'\u0449')}
    *** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
    

    Making the format string unicode fixes the issue:

    >>> u"%(message)s" % {'message': Exception(u'\u0449')}
    u'\u0449'
    

    So, in your logging configuration, make all format strings unicode:

    'formatters': {
        'simple': {
            'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
            'datefmt': '%Y-%m-%d %H:%M:%S',
        },
     ...
    

    And patch the default logging formatter to use a unicode format string:

    logging._defaultFormatter = logging.Formatter(u"%(message)s")
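
    For comparison, this particular mismatch cannot occur on Python 3, where format strings and messages are both str. The sketch below assumes Python 3 and builds a LogRecord by hand purely for illustration; the logger name and pathname are placeholders.

```python
import logging

# Assumption: Python 3, so the format string is already a Unicode str.
formatter = logging.Formatter("%(levelname)s [%(name)s]: %(message)s")

# Build a record by hand; "demo" and "demo.py" are placeholder values.
record = logging.LogRecord(
    name="demo",
    level=logging.ERROR,
    pathname="demo.py",
    lineno=0,
    msg=Exception("\u0449"),  # same non-ASCII character as in the traceback above
    args=(),
    exc_info=None,
)

# Formatting stays in str; no implicit ASCII encoding step is involved.
formatted = formatter.format(record)
print(ascii(formatted))
```

    Any encoding then happens only when a handler writes the formatted string to its stream.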
    
  • 2020-12-02 16:06

    Check that you have the latest Python 2.6 - some Unicode bugs were found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system, I ran your script copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):

    vinay@eta-jaunty:~/projects/scratch$ python --version
    Python 2.6.2
    vinay@eta-jaunty:~/projects/scratch$ python utest.py 
    printed unicode object: ô
    vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt 
    ô
    vinay@eta-jaunty:~/projects/scratch$ 
    

    On a Windows box:

    C:\temp>python --version
    Python 2.6.2
    
    C:\temp>python utest.py
    printed unicode object: ô
    

    And the contents of the file:

    [image: contents of logfile.txt]

    This might also explain why Lennart Regebro couldn't reproduce it either.

  • 2020-12-02 16:17

    I had a similar problem running Django in Python 3: my logger died upon encountering some umlauts (äöüß) but was otherwise fine. I looked through a lot of results and found none that worked. I tried

    import locale
    if locale.getpreferredencoding().upper() != 'UTF-8':
        locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
    

    which I got from the comment above. It did not work. Looking at the current locale gave me some crazy ANSI thing, which turned out to mean basically just "ASCII". That sent me in totally the wrong direction.

    Changing the logging format strings to Unicode did not help. Setting a magic encoding comment at the beginning of the script did not help. Setting the charset on the sender's message (the text came from an HTTP request) did not help.

    What DID work was setting the encoding on the file handler to UTF-8 in settings.py. Because I had nothing set, the default was None, which apparently ends up being ASCII (or, as I like to think of it, ASS-KEY).

        'handlers': {
            'file': {
                'level': 'DEBUG',
                'class': 'logging.handlers.TimedRotatingFileHandler',
                'encoding': 'UTF-8', # <-- That was missing.
                ....
            },
        },
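
    Outside Django, the same idea can be tried with logging.config.dictConfig directly. The following is a minimal sketch assuming Python 3; it uses a plain logging.FileHandler instead of the rotating handler, and the path, formatter name, and logger name dictconfig_demo are hypothetical. When encoding is left as None, Python 3 falls back to the locale's preferred encoding, which would fit the behaviour described above if that locale is ASCII-like.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical log location for this sketch.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(levelname)s %(message)s"},
    },
    "handlers": {
        "file": {
            "level": "DEBUG",
            "class": "logging.FileHandler",
            "filename": log_path,
            "formatter": "simple",
            "encoding": "utf-8",  # extra keys are passed to the handler constructor
        },
    },
    "loggers": {
        "dictconfig_demo": {
            "handlers": ["file"],
            "level": "DEBUG",
            "propagate": False,
        },
    },
}

logging.config.dictConfig(LOGGING)
demo_logger = logging.getLogger("dictconfig_demo")
demo_logger.info("\u00e4\u00f6\u00fc\u00df")  # the umlauts from the answer

# Close the handler so the file is flushed before it is read back.
for h in demo_logger.handlers:
    h.close()

with open(log_path, encoding="utf-8") as f:
    contents = f.read()
print(ascii(contents))
```

    The keys other than class, level, formatter and filters are handed to the handler as keyword arguments, which is why encoding can sit next to filename.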
    