django-celery, eventlet and debugging blocking

I recently wrote a couple of Celery tasks that are purely IO bound. So instead of using the default multiprocessing execution pool, I used the Eventlet execution pool. With just a small change in Celery settings, I was off to the races.

Wrong! After some amount of time, the worker just sat at 100% CPU and stopped processing tasks. The problem is that Celery calls monkey_patch a little too late when coupled with Django. Django does some magic of its own to various components, so monkey_patch needs to be called before Django does any initialization. After a little digging, I found that I can set an environment variable that prevents Celery from doing the monkey patching, and use the same variable to signal manage.py to call monkey_patch before my Django app initializes.

Just add this to the top of your manage.py:

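import os

# EVENTLET_NOPATCH tells Celery to skip its own (late) monkey patching,
# so do the patching here, before Django initializes anything.
if os.environ.get('EVENTLET_NOPATCH'):
    import eventlet
    import eventlet.debug

    eventlet.monkey_patch()
    eventlet.debug.hub_prevent_multiple_readers(False)

    # Optional: report code that blocks the hub for longer than this many seconds.
    eventlet_timeout = os.environ.get('EVENTLET_TIMEOUT')
    if eventlet_timeout:
        eventlet.debug.hub_blocking_detection(True, float(eventlet_timeout))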
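
One rough edge in the snippet above is that a mistyped EVENTLET_TIMEOUT (say, 'abc') makes float() raise a ValueError while manage.py is starting up. A minimal sketch of one way to handle that is to parse the two variables separately from applying them. The helper names read_eventlet_settings and apply_eventlet_settings are hypothetical, made up for this illustration; they are not part of Celery or Eventlet.

```python
import os


def read_eventlet_settings(environ=os.environ):
    """Return (patch, timeout); timeout is None when EVENTLET_TIMEOUT is unset.

    Hypothetical helper for illustration, not part of Celery or Eventlet.
    """
    patch = bool(environ.get('EVENTLET_NOPATCH'))
    raw = environ.get('EVENTLET_TIMEOUT')
    if not raw:
        return patch, None
    try:
        timeout = float(raw)
    except ValueError:
        raise SystemExit('EVENTLET_TIMEOUT must be a number of seconds, got %r' % raw)
    if timeout <= 0:
        raise SystemExit('EVENTLET_TIMEOUT must be greater than zero, got %r' % raw)
    return patch, timeout


def apply_eventlet_settings(patch, timeout):
    """Monkey patch and optionally enable blocking detection (same calls as above)."""
    if not patch:
        return
    # Imported lazily so that eventlet is only needed when patching is requested.
    import eventlet
    import eventlet.debug

    eventlet.monkey_patch()
    eventlet.debug.hub_prevent_multiple_readers(False)
    if timeout is not None:
        eventlet.debug.hub_blocking_detection(True, timeout)
```

At the top of manage.py this would be used as apply_eventlet_settings(*read_eventlet_settings()), still before anything from Django is imported.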

Now, when starting celeryd, just add the environment variable EVENTLET_NOPATCH='yes' to your manage.py command:

EVENTLET_NOPATCH='yes' python ./manage.py celeryd -c 1000 -Q eventlet_tasks -P eventlet

Also, if your worker seems to be running a little slow, you can now add EVENTLET_TIMEOUT='1.0' to make Eventlet print a stack trace of the blocking code to stderr. hub_blocking_detection takes a float: the number of seconds after which the alarm fires. Use a smaller value to catch shorter stalls:

EVENTLET_TIMEOUT='0.1' EVENTLET_NOPATCH='yes' python ./manage.py celeryd -c 10 -Q eventlet_tasks -P eventlet
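
To give a feel for what an alarm-based detector does, here is a minimal standard-library sketch of the general technique: arm a timer before running some code, and if the timer fires before the code finishes, capture the stack of whatever is running. This is an illustration only, not Eventlet's actual implementation; run_with_blocking_detection and fired are hypothetical names, and the sketch assumes a Unix system and that it runs in the main thread (a requirement of Python's signal module).

```python
import signal
import time
import traceback

fired = []  # one formatted stack trace per alarm that went off


def _on_alarm(signum, frame):
    # Called in the main thread while the slow code is still running;
    # `frame` is the frame that was executing when the alarm fired.
    fired.append(''.join(traceback.format_stack(frame)))


def run_with_blocking_detection(func, resolution):
    """Call func(); record a stack trace if it runs longer than `resolution` seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, resolution)  # arm a one-shot timer
    try:
        return func()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # disarm the timer
        # Restore the earlier handler (fall back to the default if there was none).
        signal.signal(signal.SIGALRM, previous if previous is not None else signal.SIG_DFL)


if __name__ == '__main__':
    # time.sleep stands in for any call that blocks instead of yielding.
    run_with_blocking_detection(lambda: time.sleep(0.3), 0.1)
    for trace in fired:
        print(trace)
```

The trace points at the line that was executing when the timer expired, which is the kind of information that makes a blocking call easy to find.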

  • hafa

    Hey!

    Thanks for the hints – I am trying the same ATM.

    Just one question which is not clear to me:
    In your tasks, do you use the Python standard library or the eventlet versions of the various functions?
    (E.g. see https://github.com/ask/celery/blob/master/examples/eventlet/tasks.py, where they use eventlet.green.urllib2 instead of the standard urllib2.)
    As far as I understand, the monkey patch should automatically patch everything so that the eventlet version is used?

    And for the requirements: which packages do you install? eventlet and dnspython?

    Thanks for any answer and have a nice weekend!

    • http://www.dctrwatson.com John Watson

      Glad they could be of use to you.

      I like to be explicit, so I use the eventlet versions of the functions. You are correct, though: the monkey patch already patches everything, even if you import from the Python stdlib.

      Yes, eventlet+dnspython are the two packages I use. One thing to watch out for is: https://bitbucket.org/which_linden/eventlet/issue/94/greendns-memory-leak

      I’ve run into issues similar to that bug report, even with EVENTLET_NO_GREENDNS='yes' (which disables the use of dnspython).

      I haven’t had time to investigate it, so I’ve been waiting for it to be fixed and, in the meantime, I just restart the process periodically.

  • Seppo Yli-Olli

    As far as I’ve read, it’s safe to call monkey_patch multiple times, so do you even need the EVENTLET_NOPATCH hack at all there? (You might need it for the hub calls; I haven’t managed to find out what they actually do :)

    • http://www.dctrwatson.com John Watson

      I’ve read it both ways, so I figure better safe than having weird bugs crop up.