Celery worker's log contains question marks (???) instead of correct unicode characters

Question

I'm using Celery 3.1.18 with Python 2.7.8 on CentOS 6.5.

In a Celery task module, I have the following code:

# someapp/tasks.py
from celery import shared_task
from celery.utils.log import get_task_logger

logger = get_task_logger(__name__)


@shared_task()
def foo():
    logger.info('Test output: %s', u"测试中")

I use the init.d script to run a Celery worker, with the following settings in /etc/default/celeryd:

CELERYD_NODES="bar"

# %N will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%N.log"

# Workers should run as an unprivileged user.
#   You need to create this user manually (or you can choose
#   a user/group combination that already exists, e.g. nobody).
CELERYD_USER="nobody"
CELERYD_GROUP="nobody"

So my log file is located in /var/log/celery/bar.log.

However, once the task is executed by the worker, the above log file shows:

[2015-05-07 03:51:14,438: INFO/Worker-1/someapp.tasks.foo(...)] Test output: ???

The unicode characters are gone, replaced with a number of question marks.

How can I get back the unicode characters in the log file?

Answer 1:

You need to set LANG=zh_CN.UTF-8 in the environment in which you start your Celery application.

If you are using the celeryd init script, a simple way is to set CELERY_BIN="env LANG=zh_CN.UTF-8 /path/to/celery/binary" in /etc/default/celeryd.
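To check which encodings a process sees under a given LANG, a small diagnostic like the following may help. It is only a sketch: describe_encodings is a hypothetical helper name, not part of Celery or kombu, and the values it reports depend on the Python version and platform (recent Python 3 releases may report UTF-8 for the filesystem encoding regardless of LANG).

```python
import locale
import sys


def describe_encodings():
    """Collect the encodings visible to this process (hypothetical helper)."""
    return {
        # Encoding attached to stderr; may be None when output is redirected.
        "stderr": getattr(sys.stderr, "encoding", None),
        # Encoding Python uses for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding derived from locale settings such as LANG.
        "locale": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("%s: %s" % (name, value))
```

Running it once with and once without LANG=zh_CN.UTF-8 in the environment should show the difference on a Python 2.7 setup like the one in the question.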

Explanation:

Celery formats log messages with ColorFormatter, which is defined in celery.utils.log.
ColorFormatter converts unicode to str with kombu.utils.encoding.safe_str.
kombu.utils.encoding.safe_str encodes unicode to str using the encoding returned by default_encoding, which is defined in kombu.utils.encoding.
default_encoding returns getattr(get_default_encoding_file(), 'encoding', None) or sys.getfilesystemencoding().
I did not find any place where Celery sets the encoding explicitly, so I believe it falls back to sys.getfilesystemencoding() when converting unicode to str.
The documentation for sys.getfilesystemencoding says:

On Unix, the encoding is the user’s preference according to the result of nl_langinfo(CODESET), or None if the nl_langinfo(CODESET) failed.
So, setting LANG=zh_CN.UTF-8 in the Celery process environment tells Celery to convert unicode to str using UTF-8. Without it, a worker started from an init script typically runs in the C/POSIX locale, whose codeset is ASCII, so characters that cannot be encoded come out as question marks.
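The effect described above can be reproduced without Celery. The sketch below assumes the conversion uses the "replace" error handler, which is consistent with the question marks in the log; encode_for_log is a hypothetical stand-in for that conversion, not kombu's actual function.

```python
# -*- coding: utf-8 -*-
text = u"测试中"


def encode_for_log(value, encoding):
    """Encode text the way a lossy log formatter might (hypothetical helper)."""
    # "replace" substitutes "?" for every character the encoding cannot represent.
    return value.encode(encoding, "replace")


# With an ASCII codeset each of the three characters becomes one "?".
ascii_bytes = encode_for_log(text, "ascii")

# With UTF-8 every character survives and can be decoded back unchanged.
utf8_bytes = encode_for_log(text, "utf-8")

print(repr(ascii_bytes))
print(repr(utf8_bytes))
```

The ASCII result is three question marks, matching the "Test output: ???" line in the log, while the UTF-8 result round-trips to the original string.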

Source: https://stackoverflow.com/questions/30094706/celery-workers-log-contains-question-marks-instead-of-correct-unicode-cha

Tags

python

logging

unicode

celery