Django related objects are missing from celery task (race condition?)

Submitted by 隐身守侯 on 2019-12-02 11:22:35

Question


Strange behavior that I don't know how to explain. I've got a model, Track, with some related points. I call a Celery task to perform some calculations with the points. They are perfectly reachable in the method itself, but unavailable in the Celery task.

from celery import shared_task

@shared_task
def my_task(track):
    print('in the task', track.id, track.points.all().count())

def some_method():
    # Track and fill_with_points come from the asker's project
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print('before the task', t.id, t.points.all().count())
    my_task.delay(t)

That prints the following:

before the task, 21346, 2971
in the task, 21346, 0

The strange thing is that when I put a time.sleep(10) on the first line of my_task, or before calling my_task at all, it works fine, as if there were some race condition. But the first printed line clearly shows that the points are available in the database, since it runs a SELECT query (t.points.all().count()).


Answer 1:


I'm going to assume this is due to transaction isolation.

When Django transactions are tied to requests (for example with ATOMIC_REQUESTS enabled), no other process sees the changes until the transaction is committed. If the task is queued in the middle of a save method, and a lot of other work happens before the request finishes, Celery will likely start processing the task before the transaction is committed. You could fix this by committing manually before queuing the task, or by delaying the task.
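The isolation effect can be reproduced outside Django with Python's built-in sqlite3 module. This is a sketch under stated assumptions, not the asker's PostgreSQL setup: one connection plays the Django request, a second connection plays the Celery worker, and the table and function names are hypothetical. The writer sees its own uncommitted rows, while the other connection sees none of them until commit() runs.

```python
import os
import sqlite3
import tempfile

def demo(n_points=3):
    """Return (writer_count, worker_before_commit, worker_after_commit)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        web = sqlite3.connect(path)     # stands in for the Django request
        worker = sqlite3.connect(path)  # stands in for the Celery worker
        query = "SELECT COUNT(*) FROM point WHERE track_id = 1"
        try:
            web.execute("CREATE TABLE point (id INTEGER PRIMARY KEY, track_id INTEGER)")
            web.commit()
            # The INSERTs implicitly open a transaction on `web`.
            web.executemany("INSERT INTO point (track_id) VALUES (?)",
                            [(1,)] * n_points)
            own = web.execute(query).fetchone()[0]        # writer sees its own rows
            before = worker.execute(query).fetchone()[0]  # uncommitted: invisible
            web.commit()
            after = worker.execute(query).fetchone()[0]   # committed: visible
        finally:
            web.close()
            worker.close()
    return own, before, after

print(demo())  # (3, 0, 3)
```

Sleeping in the task only hides the problem because it usually gives the request time to commit; committing before queuing removes the race.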




Answer 2:


You should NEVER pass model objects to Celery tasks. The object is serialized when the task is queued, so the copy the worker receives is detached from your Django application's state: it may be stale by the time the task runs and may not behave as you expect. Instead, send the ID (for example track_id) and fetch the object from the database inside the task. That should most likely solve your problem.

from celery import shared_task

@shared_task
def my_task(track_id):
    track = Track.objects.get(pk=track_id)  # re-fetch the row inside the worker
    print('in the task', track.id, track.points.all().count())

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print('before the task', t.id, t.points.all().count())
    my_task.delay(t.id)  # pass the id here, not the object
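A pure-Python simulation of the difference this answer describes, with no Django or Celery involved: `Track`, `DB`, `enqueue`, `task_with_object`, and `task_with_id` are hypothetical names, a dict stands in for the table, and `json` stands in for the task serializer. Whatever is serialized at enqueue time is a frozen copy, while an ID is resolved against the current data when the task runs.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class Track:
    """Hypothetical stand-in for the Django Track model."""
    id: int
    name: str

DB = {}  # hypothetical in-memory table: primary key -> Track

def enqueue(arg):
    # Task arguments are serialized when the task is queued;
    # json stands in for the task serializer here.
    return json.dumps(arg)

def task_with_object(payload):
    fields = json.loads(payload)  # a snapshot frozen at enqueue time
    return Track(**fields).name

def task_with_id(payload):
    track_id = json.loads(payload)
    return DB[track_id].name      # looked up when the task actually runs

DB[1] = Track(1, "draft")
by_object = enqueue(asdict(DB[1]))
by_id = enqueue(1)
DB[1] = Track(1, "final")  # the row changes after queuing, before the worker runs

print(task_with_object(by_object))  # draft
print(task_with_id(by_id))          # final
```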



Answer 3:


So, I've solved it using django-transaction-hooks. It still looks kinda scary to replace my DB backend, but django-celery-transactions seems to be broken in Django 1.6. Now my setup looks like this:

settings.py:

DATABASES = {
    'default': {
        'ENGINE': 'transaction_hooks.backends.postgresql_psycopg2',
        'NAME': 'foo',
    },
}
SOUTH_DATABASE_ADAPTERS = {'default':'south.db.postgresql_psycopg2'}  # this is required, or South breaks

models.py:

from celery import shared_task
from django.db import connection

@shared_task
def my_task(track):
    print('in the task', track.id, track.points.all().count())

def some_method():
    t = Track()
    t.save()
    t = fill_with_points(t)  # creating points, attaching them to a Track
    t.save()
    print('before the task', t.id, t.points.all().count())
    # connection.on_commit() is provided by the transaction_hooks backend:
    # the task is queued only after the surrounding transaction commits.
    connection.on_commit(lambda: my_task.delay(t))

Results:

before the task, 21346, 2971
in the task, 21346, 2971

It still seems strange that such a common use case has no native Celery or Django solution.
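django-transaction-hooks supplies connection.on_commit(), and Django 1.9 later adopted the same idea as django.db.transaction.on_commit(). The toy class below is a hypothetical, pure-Python sketch of the mechanism, not the library's implementation: callbacks registered inside a transaction run only after the rows have been committed, and are dropped on rollback. That ordering is why the worker in the results above sees all of the points.

```python
class ToyTransaction:
    """Hypothetical toy transaction with commit hooks (illustration only)."""

    def __init__(self, table):
        self.table = table   # committed rows, visible to everyone
        self.pending = []    # rows visible only inside this transaction
        self.callbacks = []

    def insert(self, row):
        self.pending.append(row)

    def on_commit(self, func):
        # Defer func until the transaction commits.
        self.callbacks.append(func)

    def commit(self):
        self.table.extend(self.pending)  # rows become visible first...
        self.pending = []
        callbacks, self.callbacks = self.callbacks, []
        for func in callbacks:           # ...then the hooks run
            func()

    def rollback(self):
        # Rolled-back rows are discarded, and so are their hooks.
        self.pending = []
        self.callbacks = []


points = []      # what a Celery worker could see
worker_saw = []  # point counts observed by the (fake) task

def enqueue_task():
    # Stands in for my_task.delay(...): records how many points are visible.
    worker_saw.append(len(points))

tx = ToyTransaction(points)
for n in range(3):
    tx.insert({"track_id": 1, "n": n})
tx.on_commit(enqueue_task)
print(worker_saw)  # [] : nothing queued before commit
tx.commit()
print(worker_saw)  # [3]

tx2 = ToyTransaction(points)
tx2.insert({"track_id": 2, "n": 0})
tx2.on_commit(enqueue_task)
tx2.rollback()
print(worker_saw)  # [3] : the rolled-back hook never ran
```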



Source: https://stackoverflow.com/questions/26862942/django-related-objects-are-missing-from-celery-task-race-condition
