Question
I have a view in my django project that fires off a celery task. The celery task itself triggers a few map/reduce jobs via subprocess/fabric and the results of the hadoop job are stored on disk --- nothing is actually stored in the database. After the hadoop job has been completed, the celery task sends a django signal that it is done, something like this:
# tasks.py
from models import MyModel
import signals
from fabric.operations import local
from celery.task import Task

class Hadoopification(Task):
    def run(self, my_model_id, other_args):
        my_model = MyModel.objects.get(pk=my_model_id)
        self.hadoopify_function(my_model, other_args)
        signals.complete_signal.send(
            sender=self,
            my_model_id=my_model_id,
            complete=True,
        )

    def hadoopify_function(self, my_model, other_args):
        local("""hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.reduce.tasks=0 -file hadoopify.py -mapper "parse_mapper.py 0 0" -input /user/me/input.csv -output /user/me/output.csv""")
What is truly baffling me is that the django runserver reloads when the celery task runs, as if I had changed some code somewhere in the django project (which I have not, I can assure you!). From time to time this even causes errors, where I see output like the following before runserver reloads and is fine again (note: this error message is very similar to the problem described here).
Unhandled exception in thread started by <function inner_run at 0xa18cd14>
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python2.6/dist-packages/apport_python_hook.py", line 48, in apport_excepthook
if not enabled():
TypeError: 'NoneType' object is not callable
Original exception was:
Traceback (most recent call last):
File "/home/rdm/Biz/Projects/Daegis/Server_Development/tar/env/lib/python2.6/site-packages/django/core/management/commands/runserver.py", line 60, in inner_run
run(addr, int(port), handler)
File "/home/rdm/Biz/Projects/Daegis/Server_Development/tar/env/lib/python2.6/site-packages/django/core/servers/basehttp.py", line 721, in run
httpd.serve_forever()
File "/usr/lib/python2.6/SocketServer.py", line 224, in serve_forever
r, w, e = select.select([self], [], [], poll_interval)
AttributeError: 'NoneType' object has no attribute 'select'
I've narrowed the problem down to the calls made to hadoop: replacing local("""hadoop ...""") with local("ls") causes no problems with the django runserver reloading. There are no bugs in the hadoop code --- it runs just fine on its own when it's not called by celery.
Any idea what might be causing this?
Answer 1:
There is some discussion about this on the fabric github page here, here and here. Another option for raising an error is to use the settings context manager:
from fabric.api import settings

class Hadoopification(Task):
    ...
    def hadoopify_function(self, my_model, other_args):
        with settings(warn_only=True):
            result = local(...)
            if result.failed:
                # access result.return_code, result.stdout, result.stderr
                raise UsefulException(...)
This has the advantage of allowing access to the return code and all of the other attributes on the result.
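For concreteness, here is a filled-in sketch of that pattern using the command from the question. HadoopFailure is a hypothetical exception class, and capture=True is what makes local() return a result object with those attributes:

# a filled-in sketch of the warn_only pattern; HadoopFailure is
# hypothetical, not part of fabric or the original question
from fabric.api import settings, local

class HadoopFailure(Exception):
    pass

def run_hadoop_job(command):
    # warn_only=True stops fabric from calling sys.exit on a nonzero exit;
    # capture=True makes local() return a string with extra attributes
    with settings(warn_only=True):
        result = local(command, capture=True)
        if result.failed:
            raise HadoopFailure("%r exited with %s: %s"
                                % (result.command, result.return_code, result.stderr))
    return result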
Answer 2:
So after digging around in the fabric source code, I came to learn that django was reloading because the fabric.operations.local command run within my celery task was failing (which is hard to detect within the hadoop output puke-fest). When a fabric.operations.local command fails, fabric issues a sys.exit, which causes celery to die and django to try to reload. This error can be detected by catching SystemExit within the hadoop tasks like this:
class Hadoopification(Task):
    def run(self, my_model_id, other_args):
        my_model = MyModel.objects.get(pk=my_model_id)
        self.hadoopify_function(my_model, other_args)
        signals.complete_signal.send(
            sender=self,
            my_model_id=my_model_id,
            complete=True,
        )

    def hadoopify_function(self, my_model, other_args):
        try:
            local("""hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapred.reduce.tasks=0 -file hadoopify.py -mapper "parse_mapper.py 0 0" -input /user/me/input.csv -output /user/me/output.csv""")
        except SystemExit as e:
            # print some useful debugging information about exception e here!
            raise
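As a side note, the SystemExit instance carries the status passed to sys.exit in its standard code attribute, so the except block can log it before re-raising. A minimal self-contained illustration:

import sys

def fail_like_fabric():
    # stands in for fabric's local(), which calls sys.exit on failure
    sys.exit(1)

try:
    fail_like_fabric()
except SystemExit as e:
    # e.code is the status that was passed to sys.exit
    print("caught SystemExit with code %r" % e.code)
    raise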
Answer 3:
My guess is that there is some collision on the name Task between celery and fabric. I'd suggest using something more like:
import celery

class Hadoopification(celery.task.Task):
    ...
And try to avoid any further collisions, if that hunch is right.
But really, fabric's local is pretty naive: it is essentially just a subprocess.Popen, which you could call directly to take everything but the python stdlib out of the picture.
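Along those lines, a minimal sketch of what running the job through the stdlib directly might look like (the command is the one from the question; the error handling is illustrative):

# bypassing fabric entirely: run the hadoop job with subprocess alone
import subprocess

def run_hadoop_streaming():
    cmd = ('hadoop jar /usr/lib/hadoop/hadoop-streaming.jar '
           '-D mapred.reduce.tasks=0 -file hadoopify.py '
           '-mapper "parse_mapper.py 0 0" '
           '-input /user/me/input.csv -output /user/me/output.csv')
    proc = subprocess.Popen(cmd, shell=True,
                            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = proc.communicate()
    if proc.returncode != 0:
        # a plain exception propagates through celery normally,
        # instead of the sys.exit that fabric's local() issues
        raise RuntimeError("hadoop job failed with status %d: %s"
                           % (proc.returncode, stderr))
    return stdout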
Source: https://stackoverflow.com/questions/10852961/manage-py-runserver-restarts-when-celery-map-reduce-tasks-are-running-somet