Question
How can I solve the ConnectionError: Too many heartbeats missed from Celery?
Example Error
[2013-02-11 15:15:38,513: ERROR/MainProcess] Error in timer: ConnectionError('Too many heartbeats missed', None, None, None, '')
Traceback (most recent call last):
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/utils/timer2.py", line 97, in apply_entry
    entry()
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/utils/timer2.py", line 51, in __call__
    return self.fun(*self.args, **self.kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/celery/utils/timer2.py", line 153, in _reschedules
    return fun(*args, **kwargs)
  File "/app/.heroku/python/lib/python2.7/site-packages/kombu/connection.py", line 265, in heartbeat_check
    return self.transport.heartbeat_check(self.connection, rate=rate)
  File "/app/.heroku/python/lib/python2.7/site-packages/kombu/transport/pyamqp.py", line 134, in heartbeat_check
    return connection.heartbeat_tick(rate=rate)
  File "/app/.heroku/python/lib/python2.7/site-packages/amqp/connection.py", line 837, in heartbeat_tick
    raise ConnectionError('Too many heartbeats missed')
ConnectionError: Too many heartbeats missed
App Overview
- Django app using celery for periodic background tasks
- Hosted on Heroku
- Single task scheduled to run every 15 minutes via settings / celerybeat
- Messaging handled via CloudAMQP add-on
- Processes run by:
    web: newrelic-admin run-program gunicorn --workers=2 --worker-class=gevent someapp.wsgi:application
    scheduler: newrelic-admin run-program python manage.py celery worker -B -E --maxtasksperchild=1000 --loglevel=WARNING
Package Versions
Just the ones I think are relevant:
Django==1.4.3
amqp==1.0.8
billiard==2.7.3.20
celery==3.0.14
gevent==0.13.8
greenlet==0.4.0
kombu==2.5.6
raven==3.1.10
What I've Tried So Far
- Correlating the error with activity (it doesn't seem to correlate with users visiting the app, background tasks being called, or the app idling)
- Googling / searching SO until my fingers were numb
- Upgrading packages to latest versions
- Various levels of logging
- Exception capturing with Sentry (the error doesn't appear in Sentry)
- Cannot reproduce error locally under development environment, only in production on Heroku
Possibly Relevant Info
- I'm not sure exactly when this error first appeared (~ one month ago?)
- May be related in some way to the following changes (don't recall error before this, not 100% sure though)
    - celery==3.0.13 to celery==3.0.14
    - amqplib -> amqp
    - kombu==2.4.8 to kombu==2.5.4
- Error only appears in logs (doesn't get picked up by New Relic or getsentry.com)
Answer 1:
How often does it happen?
It may be that the heartbeat monitoring is not working properly in your case. The heartbeat support was introduced fairly recently, so there may be bugs. I cannot reproduce this here, so I need more data to understand what is going on.
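One way to answer the "how often" question is to count the error lines in the worker log. The following is only a rough sketch: the function name `heartbeat_errors_per_hour` is hypothetical, and it assumes the log lines carry the same `[timestamp: LEVEL/Process]` prefix as the example error in the question (any extra prefix added by the hosting platform is ignored).

```python
import re
from collections import Counter
from datetime import datetime

# Matches the worker log prefix shown in the question, for example:
# [2013-02-11 15:15:38,513: ERROR/MainProcess] Error in timer: ConnectionError('Too many heartbeats missed', ...
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+: ERROR/\w+\].*Too many heartbeats missed"
)


def heartbeat_errors_per_hour(lines):
    """Count 'Too many heartbeats missed' log entries, grouped by hour.

    Only lines with a timestamped ERROR prefix are counted, so the final
    traceback line ("ConnectionError: Too many heartbeats missed") does
    not inflate the totals.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            counts[stamp.strftime("%Y-%m-%d %H:00")] += 1
    return counts
```

Feeding it the lines of a saved log file (for example `heartbeat_errors_per_hour(open("worker.log"))`, where `worker.log` is a placeholder name) gives a per-hour tally that can be compared against deploys, dyno restarts, or the 15-minute task schedule.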
You can disable heartbeats by setting BROKER_HEARTBEAT=0.
If this is a bug, then the worker should run fine with heartbeats disabled, but it will not be able to quickly detect a broken connection. Being unable to detect connection loss is only a problem in some environments (usually caused by specific router/firewall configurations).
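For a Django project like the one in the question, the setting would most likely go in the Django settings module. This is a minimal sketch, not the asker's actual configuration: it assumes the broker URL is read from a `CLOUDAMQP_URL` environment variable, and the localhost fallback is only there so the snippet works outside Heroku.

```python
# settings.py (Django settings module read by Celery)
import os

# Assumption: the CloudAMQP add-on exposes the broker URL as CLOUDAMQP_URL.
# The fallback is a local RabbitMQ default, used here only for illustration.
BROKER_URL = os.environ.get("CLOUDAMQP_URL", "amqp://guest:guest@localhost:5672//")

# 0 disables AMQP heartbeats, so the worker no longer raises
# "Too many heartbeats missed". The trade-off is slower detection
# of a dead broker connection.
BROKER_HEARTBEAT = 0
```

The worker has to be restarted for the change to take effect.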
Source: https://stackoverflow.com/questions/14817181/django-celery-connectionerror-too-many-heartbeats-missed