Huey as a minimal task queue for Django
Are you considering adding a task queue to your Django project? Then this article should be useful to you.
When I was “younger”, adding a task queue to a Django project meant celery.
Now that I’m “older”, there are simpler alternatives. The simplest I found was Huey. But the ideas presented here apply to evaluating any task queue for your Django project.
Frustrated with celery and django-celery
In December 2019 I was taking a Django project from Python 2 to 3. This project relied on celery and its Django integration, django-celery, for asynchronous task processing. GitHub project link here. Most of the work on this codebase was done back in 2012-2015.
django-celery was the first problematic part. “Problematic” because it was not being actively maintained. Python 3 support was only “planned” at the time.
Besides, testing celery with my Python 3 setup made me realise how “heavy” it is. “Heavy” in terms of dependencies.
I had used celery for a very long time. A decade! But this nudged me into looking for alternatives.
My use cases for an async task queue were:
- Handling tasks asynchronously, i.e. tasks you should not handle within the request-response cycle. For example, sending an email out.
- Scheduled tasks.
- Retrying failed tasks. For example, reading from an API fails due to a network error, and I want that task retried a few times.
- Simple locking. More on what I mean by “simple” further down.
The above were (are) handled nicely by celery. The scheduled tasks part relied entirely on
Another factor that pushed me “off the celery train” came from my last long-term gig: celery’s growing reputation for being “heavyweight”.
Finally, celery provides a whole lot more than the basic set of use cases I need, listed above.
Is celery heavyweight?
A colleague achieved significant gains in task execution by moving off celery to dramatiq. This was on a Python 3 project I didn’t work on directly. By significant I mean ~50% higher throughput. In this case, each “task” handled a simple message and wrote at most one row to the database, with no real processing of that message.
At about the same time the above happened, I was listening to the DjangoChat podcast.
Below is a transcript excerpt from the “Caching” episode from November 2019. Transcript here. The podcast folks were discussing caches and brought up the usual suspects: Memcached and Redis. On mentioning Redis, Carlton Gibson calls out how easy it is to add a queue when you have Redis in place, and how much of an “overkill” celery is. Emphasis in the quote below is mine:
have Redis? Yeah. You want to you want to use a queue. So let’s take a good queue package. So, you know, everyone always talks about celery, but celery is overkill for, you know, the majority of use cases. So what’s a good package? Well, there’s one called django-q, which I love and have fun with. That’s nice and simple. And that’s got a Redis back end. So you pip install, right or, you know, apt install Redis. And then you pip install django-q into your project, you know, a little bit of settings, magic, and you’re up and running [..]
I did not compare packages. Comparing would mean installing each one, running the same task and measuring. I had what was left of a “free” day to switch over from celery to another package and have things deployed by day’s end.
I considered these packages to see if I could adopt a more lightweight alternative to celery. Before I continue, by “lightweight” I mean, “lightweight in terms of”:
- package size and dependencies
- code that I would need to rework to transition from celery to this task queue
- works with Redis, a pre-existing component in my stack
The packages I considered:

| Package | Link | How I got to know about it |
|---|---|---|
|  | Link | Ex-colleague’s experience, described above. |
|  | Link | Recommended during the DjangoChat podcast. |
|  | Link | Recommended by Adam Johnson, an engineer and author I follow. |
I decided to move on with Huey. It was not a clear-cut decision.
My goal was not to install the best package. It was to install a minimal one that removed my dependency on celery/django-celery and allowed me to keep taking that project’s codebase to Python 3.
Why not dramatiq? I did not go with dramatiq for two reasons:
- It required installing another package, the “Advanced Python Scheduler” (APScheduler), to schedule tasks.
- django-dramatiq, while maintained by the author of dramatiq itself, is “yet another package”.
After my experience with django-celery, any extra package scared me. I did not want to end up unable to use a main library because a smaller accompanying library was no longer maintained.
In comparison, for Huey I would only need to pip install huey. The Django integration ships with the package. Docs here.
Why not django-q? It is actively maintained, targeted at Django, and offers a lot of features. But by the looks of it, many of those were features I was not going to use. At the cost of being less lightweight than
The above does not mean that the alternatives, django-q included, are not great packages. Far from it. I have them in mind in case the use case changes.
Huey’s Django integration provides:
- Configuration of huey via the Django settings module.
- Running the consumer as a Django management command.
- Auto-discovery of tasks.py modules to simplify task importing.
- Proper management of database connections.
Installed huey? The “Setting Things Up” section in Huey’s Django integration guide covers what you need to do next.
Instead of importing the task and periodic_task decorators from the main Huey package, import them from huey.contrib.djhuey:
from huey.contrib.djhuey import periodic_task, task
Huey also offers function decorators for tasks that execute queries, which automatically close the database connection:
from huey.contrib.djhuey import db_periodic_task, db_task
To set a task performing a
What about settings files? Main settings:
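A minimal sketch of those main settings, assuming Redis runs locally on its default port. The HUEY keys follow the settings dictionary documented for Huey’s Django integration; the values, the mydjangoproject queue name and the myapp app label are assumptions to adapt to your project.

```python
# settings.py (excerpt): a sketch, not a drop-in file.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "huey.contrib.djhuey",  # Huey's Django integration (run_huey, tasks.py discovery)
    "myapp",  # hypothetical app containing a tasks.py module
]

HUEY = {
    "huey_class": "huey.RedisHuey",  # use Redis as the broker
    "name": "mydjangoproject",  # queue name (assumed: the project name)
    "connection": {"host": "localhost", "port": 6379, "db": 0},  # assumed local Redis
    "consumer": {
        "workers": 1,  # a single worker is plenty for a small project
        "worker_type": "thread",
    },
    # No "immediate" key: by default it follows settings.DEBUG.
}
```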
Seriously, that’s enough to have things running!
A note about how things run locally by default:
When settings.DEBUG = True, tasks will be executed synchronously just like regular function calls. The purpose of this is to avoid running both Redis and an additional consumer process while developing or running tests.
This is probably the reason why run_huey doesn’t autoreload.
To have Huey use Redis to broker tasks locally:
- make sure the redis service is available locally (duh!)
- set HUEY["immediate_use_memory"] = False and HUEY["immediate"] = False in your settings
More details on debug and synchronous execution with Huey here.
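To picture what changes between the two modes, here is a toy model in plain Python. It is not Huey’s code, and Huey’s real implementation differs; it only shows the shape of the behaviour: with immediate=True a task runs inline like a regular call, with immediate=False it waits in a queue until a consumer (run_huey’s job in a real project) picks it up. ToyQueue and send_email are hypothetical names.

```python
import queue
from typing import Any, Callable


class ToyQueue:
    """Toy model of immediate vs. queued task execution (hypothetical, not Huey)."""

    def __init__(self, immediate: bool) -> None:
        self.immediate = immediate
        self.pending: queue.Queue = queue.Queue()

    def task(self, func: Callable[..., Any]) -> Callable[..., None]:
        def enqueue(*args: Any, **kwargs: Any) -> None:
            if self.immediate:
                func(*args, **kwargs)  # like DEBUG = True: runs inline, no broker
            else:
                self.pending.put((func, args, kwargs))  # waits for a consumer
        return enqueue

    def consume(self) -> None:
        """Drain the queue, as a separate consumer process would."""
        while not self.pending.empty():
            func, args, kwargs = self.pending.get()
            func(*args, **kwargs)


sent = []


def send_email(to: str) -> None:
    sent.append(to)  # stand-in for actually sending an email


dev = ToyQueue(immediate=True)
prod = ToyQueue(immediate=False)

dev.task(send_email)("dev@example.com")  # runs right away
prod.task(send_email)("prod@example.com")  # only queued so far
print(sent)  # ['dev@example.com']
prod.consume()
print(sent)  # ['dev@example.com', 'prod@example.com']
```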
Note that Huey uses pickle to serialize messages (docs), as celery did by default before version 4.0 made JSON its default serializer.
So when passing parameters, make sure their types play well with pickle.
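In particular, avoid passing a Django model instance or queryset as a parameter. An instance is pickled as a snapshot, so the task may act on stale data by the time it runs, and pickling a queryset forces all of its results to be loaded first.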
Instead, pass the object’s id, which is an integer with Django’s default primary key, and fetch the object inside the task.
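A stdlib-only sketch of why the id is the safer parameter. It calls pickle directly as a stand-in for Huey’s serializer and uses a dict as a stand-in for the database table; DATABASE, enqueue and the Article-like record are hypothetical.

```python
import pickle
from types import SimpleNamespace

# Stand-in for a database table with one row (hypothetical data).
DATABASE = {1: SimpleNamespace(id=1, title="Draft")}


def enqueue(*args):
    """Serialize task arguments with pickle, the way Huey serializes its messages."""
    return pickle.dumps(args)


def task_with_instance(message):
    # Avoid: the task gets a copy of the object as it was at enqueue time.
    (article,) = pickle.loads(message)
    return article.title


def task_with_id(message):
    # Prefer: only the id travels in the message; the task reads the current row.
    (article_id,) = pickle.loads(message)
    article = DATABASE[article_id]  # in Django: Article.objects.get(id=article_id)
    return article.title


msg_instance = enqueue(DATABASE[1])
msg_id = enqueue(DATABASE[1].id)
DATABASE[1].title = "Published"  # the row changes before a consumer runs the tasks

print(task_with_instance(msg_instance))  # Draft
print(task_with_id(msg_id))  # Published
```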
The serializer can be overridden. But going into that takes us beyond minimal for today.
This is an example of a periodic task. It calls a Django management command to clear expired user sessions every 2 hours:
One area where Django+Huey stays minimal is deployment (tackled further down).
manage.py run_huey takes care of executing both:
- tasks that your code adds to the queue while it’s running, e.g. send an email triggered by a user action
- scheduled (aka “periodic”) tasks
Both the task workers and the scheduler are run by the same consumer, started with manage.py run_huey.
Example: I want to fetch API data on the first of the month at 2:30PM:
The above retries the function twice, with an interval of 10 seconds.
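In Huey, a schedule like this would be expressed with crontab(day='1', hour='14', minute='30'), 14 being 2PM on the 24-hour clock, combined with the retries and retry_delay task options. “Retries twice, with an interval of 10 seconds” means up to three attempts in total, with a 10-second wait before each retry. The sketch below mimics those semantics in plain Python; run_with_retries is a hypothetical helper, not how the Huey consumer implements retries.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    func: Callable[[], T],
    retries: int,
    retry_delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying up to `retries` times after any exception (hypothetical)."""
    retries_left = retries
    while True:
        try:
            return func()
        except Exception:
            if retries_left == 0:
                raise  # out of retries: let the last error surface
            retries_left -= 1
            sleep(retry_delay)  # wait before the next attempt


# Usage: an API call that fails twice with a network error, then succeeds.
calls = 0


def fetch_api_data():
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("network hiccup")
    return {"status": "ok"}


waits = []  # record the delays instead of actually sleeping
result = run_with_retries(fetch_api_data, retries=2, retry_delay=10, sleep=waits.append)
print(result, calls, waits)  # {'status': 'ok'} 3 [10, 10]
```

Note that it retries after any exception, a ZeroDivisionError included.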
The one aspect missing from this setup is a hook to retry only on specific exceptions. I want an exception caused by a network error to be retried, but not one due to a ZeroDivisionError, for example.
huey provides signals that let you inspect the exception raised when various events occur. For example, you could use a signal handler to notify admins when a Huey task fails and will not be retried.
Sidenote 1: Keep in mind signals are executed synchronously by the consumer as it processes tasks.
Sidenote 2: this is just to demonstrate what can be done. A more standard way to do this is to attach an email handler at log level ERROR to the huey consumer logger.
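Sidenote 2 could look like this in a Django settings module: a sketch of the LOGGING setting that mails ERROR records to the site admins through Django’s AdminEmailHandler. The handlers sit on the "huey" logger, so records from child loggers such as "huey.consumer" reach them through normal logging propagation. Handler names and levels are assumptions, and mails only go out once ADMINS and the email settings are configured.

```python
# settings.py (excerpt): a sketch of Django's LOGGING dictConfig.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "console": {"class": "logging.StreamHandler"},
        "mail_admins": {
            "level": "ERROR",  # only errors trigger an email
            "class": "django.utils.log.AdminEmailHandler",
        },
    },
    "loggers": {
        # Parent of "huey.consumer", so consumer records propagate here too.
        "huey": {
            "handlers": ["console", "mail_admins"],
            "level": "INFO",
        },
    },
}
```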
But I would prefer a hook like the one dramatiq provides to decide whether a task should be retried.
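For comparison, dramatiq’s Retries middleware accepts, if I read its docs correctly, a retry_when option: a callable that receives the number of retries so far and the raised exception, and returns whether to retry. Below is a sketch of such a predicate that retries network errors only. The names, the exception classes treated as “network errors” and the limit of 3 are assumptions; the actor wiring is left as a comment so the snippet runs without dramatiq installed.

```python
# Exceptions treated as transient network problems (an assumption; adjust as needed).
NETWORK_ERRORS = (ConnectionError, TimeoutError)
MAX_RETRIES = 3  # hypothetical limit


def should_retry(retries_so_far: int, exception: Exception) -> bool:
    # Retry network hiccups a few times; fail fast on anything else,
    # e.g. a ZeroDivisionError caused by a bug.
    return retries_so_far < MAX_RETRIES and isinstance(exception, NETWORK_ERRORS)


# Wiring it up (requires dramatiq):
# @dramatiq.actor(retry_when=should_retry)
# def fetch_api_data():
#     ...
```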
See? I’m veering off “minimal” and into better features. Let’s move on.
Quoting huey’s author himself:
A simple lock ensures that one task cannot be executed in parallel.
Example use case: a report generation task that runs every 10 minutes, but occasionally it can take 15 minutes to complete. You want to ensure that it does not start a stampede. So you use a lock to ensure that only one instance of the task can run at a time.
Example code is available in huey’s docs on locking tasks.
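To illustrate the idea behind a simple lock, here is a single-process sketch in plain Python: the first run claims a key in a shared store, an overlapping run finds the key taken and refuses to start, and the key is released when the run ends. Huey exposes this through its lock_task decorator, backed by its storage (Redis, here) as I understand it; InMemoryStore, task_lock and TaskLockedError are hypothetical stand-ins, not Huey’s API.

```python
import threading
from contextlib import contextmanager


class TaskLockedError(Exception):
    """Raised when another run of the task still holds the lock (hypothetical)."""


class InMemoryStore:
    """Stand-in for a shared store such as Redis (hypothetical, one process only)."""

    def __init__(self):
        self._keys = set()
        self._mutex = threading.Lock()

    def put_if_empty(self, key):
        # Claim `key` atomically; False means someone else already holds it.
        with self._mutex:
            if key in self._keys:
                return False
            self._keys.add(key)
            return True

    def delete(self, key):
        with self._mutex:
            self._keys.discard(key)


@contextmanager
def task_lock(store, name):
    if not store.put_if_empty(name):
        raise TaskLockedError(name)  # refuse to start an overlapping run
    try:
        yield
    finally:
        store.delete(name)  # always release, even if the task raised


store = InMemoryStore()
reports = []


def generate_report():
    with task_lock(store, "reports-lock"):
        reports.append("report")  # stand-in for the slow report generation


with task_lock(store, "reports-lock"):  # a slow run is still in progress...
    try:
        generate_report()  # ...when the next scheduled run starts
    except TaskLockedError:
        print("skipped: previous report still running")
generate_report()  # the lock was released, so this run goes ahead
print(reports)  # ['report']
```

In a real deployment the store has to be shared by all consumer workers, which is why it lives in Redis rather than in process memory.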
Take a look around
A nice thing about Huey is its docs. Being a relatively small package, you don’t easily get lost.
The docs feel “cohesive”. Unlike how I used to feel with celery’s docs. So feel free to take a look around for other features not mentioned in this tutorial.
The troubleshooting and common pitfalls section saved me a headscratch or two.
In my Ubuntu setup I use supervisor for process management. For example, supervisor manages the gunicorn process that serves the Django application behind Nginx.
The bash script
Supervisor starts Huey through a bash script:
Test the bash script above by running it.
It has to be executable, so run chmod +x start_huey.bash if it’s not.
Output should be similar to this:
Starting mydjangoproject-huey as ubuntu
[2020-07-01 16:26:54,455] INFO:huey.consumer:MainThread:Huey consumer started with 1 thread, PID 12113 at 2020-07-01 14:26:54.455715
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:Scheduler runs every 1 second(s).
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:Periodic tasks are enabled.
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:The following commands are available:
+ myapp.tasks.send_email
[...]
supervisor conf file
File located at:
A lot of the conf file above is supervisor-specific.
The point of this section is to show you how simple it is to have huey with Django run reliably on an Ubuntu instance.
What do you think about this? Can it be better? Can it be more minimal?