Untangled Development


Huey as a minimal task queue for Django

Are you considering adding a task queue to your Django project? Then this article should be useful to you.

When I was “younger”, a task queue for a Django project meant the celery task queue.

Now that I’m “older” there are simpler alternatives. The simplest I found was Huey. But the ideas presented here apply to evaluating all task queues for your Django project.

Background

Frustrated with celery and django-celery

In December 2019 I was taking a Django project from Python 2 to 3. The project relied on celery and its Django integration for asynchronous task processing. Github project link here. Most of the work on this codebase was done back in 2012-2015.

django-celery was the first problematic part. “Problematic” because it was not being actively maintained. Python 3 support was only “planned” at the time.

Besides, testing celery with my Python 3 setup made me realise how “heavy” it is. “Heavy” in terms of dependencies: billiard, kombu, etc.

I had used celery for a very long time. A decade! But this nudged me into looking for alternatives.

My use cases for using an async task queue were:

  • Handle tasks async. For example, sending an email out. Tasks that you should not handle within the request-response cycle.
  • Scheduled tasks.
  • Retrying of failed tasks. For example, a read from an API fails due to a network error. I want that task retried a few times.
  • Simple locking. More on what I mean by “simple” further down.

The above were (are) handled nicely by celery. The scheduled tasks part relied entirely on django-celery.

Another factor that pushed me “off the celery train” was something in my last long-term gig. The increasing reputation that celery is “heavyweight”.

Finally, celery provides a whole lot more than the above basic set of use cases I need.

Is celery heavyweight?

A colleague achieved significant gains in task execution time by moving off celery. To dramatiq. This was on a Python 3 project I didn’t work directly on. By significant I mean ~50% throughput. In this case, “tasks” were about handling a simple message that wrote one row to the database, at most. With no real processing of that message.

At about the same time the above happened, I was listening to the DjangoChat podcast.

The below is a transcript from the “Caching” episode from November 2019. Transcript here. The podcast folks were discussing caches, and brought up the usual suspects: Memcached and Redis. On mentioning Redis, Carlton Gibson calls out how easy it is to add a queue when you have Redis in place, and how much of an “overkill” celery is. Emphasis in the below quote is mine:

have Redis? Yeah. You want to you want to use a queue. So let’s take a good queue package. So, you know, everyone always talks about celery, but celery is overkill for, you know, the majority of use cases. So what’s a good package? Well, there’s one called django-q, which I love and have fun with. That’s nice and simple. And that’s got a Redis back end. So you pip install, right or, you know, apt install Redis. And then you pip install django-q into your project, you know, a little bit of settings, magic, and you’re up and running [..]

Packages Considered

Image source: British Library; An illustrated and descriptive guide to the great railways of England, and their connections with the continent.

I did not compare packages. Comparing would mean installing each one, running the same task, and measuring. I had what was left of a “free” day to switch over from celery to another package and have things deployed by day’s end.

I considered these packages to see if I could adopt a more lightweight alternative to celery. Before I continue, by “lightweight” I mean lightweight in terms of:

  • package size and dependencies
  • code that I would need to rework to transition from celery to this task queue
  • works with Redis, a pre-existing component in my stack

The packages I considered:

  • dramatiq (link): an ex-colleague’s experience, described above.
  • django-q (link): recommended during the DjangoChat podcast.
  • huey (link): recommended by Adam Johnson, a Django blogger I follow.

I decided to go with Huey. It was not a clear-cut decision.

My mindset was not about installing the best package, but a minimal one: one that removes my dependency on celery/django-celery and allows me to continue taking that project’s codebase to Python 3.

Why not dramatiq? I did not go with dramatiq because of two reasons:

  • It required installing another package, APScheduler (the “Advanced Python Scheduler”), to allow scheduling of tasks.
  • django-dramatiq, while maintained by the author of dramatiq itself, is “yet another package”.

After my experience with django-celery, any extra package scared me. I did not want to end up unable to use a main library because a smaller accompanying library was no longer maintained.

In comparison, for Huey I would only need to pip install huey. The Django integration is part of the package itself. Docs here.

In addition, dramatiq is intended for RabbitMQ. I see no need for advanced messaging such as that enabled by RabbitMQ right now.1

Why not django-q? It is actively maintained, targeted at Django, and offers a lot of features. But by the looks of it, many of those features were ones I was not going to use, at the cost of being less lightweight than Huey.

The above does not mean dramatiq or django-q are not great packages. Far from it. I have them in mind in case the use case changes.

Huey

Huey’s Django integration provides:

  1. Configuration of huey via the Django settings module.
  2. Running the consumer as a Django management command.
  3. Auto-discovery of tasks.py modules to simplify task importing.
  4. Proper management of database connections.

Sweet.

Code changes

Installed huey? The Setting Things Up section in Huey’s Django integration guide covers what you need to do from then on.

Instead of importing the task or periodic_task decorators from the main Huey package, import from huey.contrib.djhuey:

from huey.contrib.djhuey import periodic_task, task

Huey also offers function decorators for tasks that execute queries, which automatically close the database connection:

from huey.contrib.djhuey import db_periodic_task, db_task

To define a task that performs database queries, therefore:

from huey.contrib import djhuey as huey

@huey.db_task()
def save_data_and_send_email():
    ...

What about settings files? Main settings:

# HUEY
HUEY = {
    'name': 'mydjangoproject',
    'url': 'redis://localhost:6379/?db=1',
}

Seriously, that’s enough to have things running!

A note about how things run locally by default:

When settings.DEBUG = True, tasks will be executed synchronously just like regular function calls. The purpose of this is to avoid running both Redis and an additional consumer process while developing or running tests.

This is probably the reason why run_huey doesn’t autoreload.

To have Huey use redis to broker tasks locally:

  1. ensure redis service is available locally (duh!)
  2. set immediate and immediate_use_memory to False:
HUEY['immediate_use_memory'] = False
HUEY['immediate'] = False

More details on debug and synchronous execution with Huey here.
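If you prefer that switch to be explicit in settings rather than implied by the default, one option is to tie immediate to DEBUG. A sketch (the DEBUG assignment here stands in for wherever your settings module defines it):

```python
# settings.py sketch: tie huey's immediate mode to DEBUG explicitly.
DEBUG = False  # stand-in; normally defined earlier in settings.py

HUEY = {
    'name': 'mydjangoproject',
    'url': 'redis://localhost:6379/?db=1',
    # Synchronous, in-memory execution while developing;
    # Redis-backed consumer execution otherwise.
    'immediate': DEBUG,
}
```

This mirrors the default behaviour described above, but makes it visible at a glance in the settings module.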

Passing Parameters

Note that, like in celery’s case, Huey uses pickle to serialize messages (docs).

So when passing parameters, stick to variable types that play well with pickle.

Thus avoid passing a Django model instance or queryset as a parameter, as shown below:

from huey.contrib import djhuey as huey

@huey.task()
def notify_user(user):
    send_email(user.email, ...)

my_user = User.objects.get(id=user_id)
notify_user(my_user)  # enqueue task

Instead pass the object id, which is an int:

from huey.contrib import djhuey as huey

@huey.task()
def notify_user(user_id):
    user = User.objects.get(id=user_id)
    send_email(user.email, ...)

notify_user(my_user.id)  # enqueue task

The serializer can be overridden. But going into that takes us beyond minimal for today.
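A quick stdlib check illustrates the distinction. Nothing here is huey-specific, and the payload names are made up:

```python
import pickle
import threading

# Plain ids and strings round-trip through pickle without fuss:
payload = {'user_id': 42, 'template': 'welcome-email'}
assert pickle.loads(pickle.dumps(payload)) == payload

# Objects wrapping live resources (locks, sockets, db connections) do not:
try:
    pickle.dumps(threading.Lock())
except TypeError as exc:
    print('not pickle-friendly:', exc)
```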

Periodic tasks

This is an example of a periodic task. It calls a Django management command to clear expired user sessions every 2 hours:

from huey import crontab
from huey.contrib import djhuey as huey

@huey.periodic_task(crontab(minute='0', hour='*/2'))
def clear_expired_sessions():
    from django.core.management import call_command
    return call_command('clearsessions')

One aspect where Django+Huey stays minimal is deployment (tackled further down).

manage.py run_huey takes care of executing both:

  • tasks that your code adds to the queue while it’s running, e.g. send an email triggered by a user action
  • scheduled (aka “periodic”) tasks

Both the task consumer and scheduler are run by the same run_huey process.

Retries

Example: I want to fetch API data on the first of the month at 2:30 PM:

from huey import crontab
from huey.contrib import djhuey as huey

@huey.db_periodic_task(
    crontab(day='1', hour='14', minute='30'),
    retries=2, retry_delay=10)
@huey.lock_task('sync_gsuite_data')
def fetch_api_data():
    ...

The above retries the function twice on failure, with a 10-second delay between retries.

The one aspect missing from this setup is a hook to retry only on specific exceptions. I want an exception caused by a network error to be retried, but not a ZeroDivisionError, for example.
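One way to approximate that today is an ordinary wrapper that only lets “retryable” exceptions reach huey. This is a hand-rolled sketch, not a huey feature; NetworkError and retry_only_on are hypothetical names:

```python
import functools

class NetworkError(Exception):
    """Hypothetical stand-in for transient, retry-worthy failures."""

def retry_only_on(*retryable):
    """Re-raise the listed exception types so huey's retry machinery
    sees them; swallow and log anything else so it is not retried."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except retryable:
                raise  # transient: fail the task so huey retries it
            except Exception as exc:
                print(f'{fn.__name__} failed permanently: {exc!r}')
                return None
        return wrapper
    return decorator
```

Stacked directly under the task decorator, only a NetworkError would then trigger the retries=2 behaviour above; anything else fails once and stays failed.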

huey provides signals that allow you to inspect an exception on various types of events. For example, you could use the below code to notify admins when a Huey task fails without being retried:

import traceback

from django.core.mail import mail_admins

from huey import signals
from huey.contrib import djhuey as huey


@huey.signal(signals.SIGNAL_ERROR)
def task_error(signal, task, exc):
    if task.retries > 0:
        return  # do not notify when task is to be retried
    subject = f'Task [{task.name}] failed'
    message = f"""Task ID: {task.id}
Args: {task.args}
Kwargs: {task.kwargs}
Exception: {exc}
{traceback.format_exc()}"""
    mail_admins(subject, message)

Sidenote 1: Keep in mind signals are executed synchronously by the consumer as it processes tasks.

Sidenote 2: this is just to demonstrate what can be done. A more standard way to do this is to attach an email handler at loglevel ERROR to the huey consumer logger.
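That more standard way could look something like this in settings. A sketch; it assumes Django’s stock AdminEmailHandler and huey’s top-level 'huey' logger name:

```python
# settings.py sketch: email admins on ERROR-level huey consumer logs.
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'mail_admins': {
            'level': 'ERROR',
            'class': 'django.utils.log.AdminEmailHandler',
        },
    },
    'loggers': {
        'huey': {
            'handlers': ['mail_admins'],
            'level': 'ERROR',
        },
    },
}
```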

But I would prefer the hook that dramatiq provides for determining whether a task should be retried, in this style:

def should_retry(retries_so_far, exception):
    return retries_so_far < 3 and isinstance(exception, HttpTimeout)


@dramatiq.actor(retry_when=should_retry)
def count_words(url):
    ...

See? I’m veering off “minimal” and into better features. Let’s move on.

Simple locking

To quote huey’s author himself:

A simple lock ensures that one task cannot be executed in parallel.

Example use case: a report generation task that runs every 10 minutes, but occasionally it can take 15 minutes to complete. You want to ensure that it does not start a stampede. So you use a lock to ensure that only one instance of the task can run at a time.

Example code from huey’s docs on locking tasks:

@huey.periodic_task(crontab(minute='*/10'))
@huey.lock_task('reports-lock')  # Goes *after* the task decorator.
def generate_report():
    run_report()

Take a look around

A nice thing about Huey is its docs. Being a relatively small package, you don’t easily get lost.

The docs feel “cohesive”. Unlike how I used to feel with celery’s docs. So feel free to take a look around for other features not mentioned in this tutorial.

The troubleshooting and common pitfalls section saved me a headscratch or two.

Deployment

In my Ubuntu setup I use supervisor for process management. For example, supervisor manages the gunicorn process that binds the Django application to Nginx.

The bash script

To run Huey, supervisor executes a bash script:

start_huey.bash

#!/bin/bash

NAME="mydjangoproject-huey"  # Name of the application
DJANGODIR=/home/ubuntu/webapp/mydjangoproject/proj  # Django project directory
DJANGOENVDIR=/home/ubuntu/webapp/mydjangoproject_env  # Django project virtualenv

echo "Starting $NAME as `whoami`"

# Activate the virtual environment
cd $DJANGODIR
source ${DJANGOENVDIR}/bin/activate
source ${DJANGODIR}/.env
export PYTHONPATH=$DJANGODIR:$PYTHONPATH

# Start Huey
exec ${DJANGOENVDIR}/bin/python manage.py run_huey

Test the bash script above by running it.

It has to be executable, i.e. chmod +x start_huey.bash if it’s not.

Output should be similar to this:

Starting mydjangoproject-huey as ubuntu
[2020-07-01 16:26:54,455] INFO:huey.consumer:MainThread:Huey consumer started with 1 thread, PID 12113 at 2020-07-01 14:26:54.455715
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:Scheduler runs every 1 second(s).
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:Periodic tasks are enabled.
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:The following commands are available:
+ myapp.tasks.send_email
[...]

supervisor conf file

File located at: /etc/supervisor/conf.d/huey.conf

; ================================
;  huey supervisor
; ================================

[program:huey]
command = /home/ubuntu/webapp/start_huey.bash  ; Command to start huey

user=ubuntu
numprocs=1
stdout_logfile=/home/ubuntu/webapp/logs/huey/worker.log
stderr_logfile=/home/ubuntu/webapp/logs/huey/error.log
stdout_logfile_maxbytes=50MB
stderr_logfile_maxbytes=50MB
stdout_logfile_backups=10
stderr_logfile_backups=10 
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long running tasks.
stopwaitsecs = 2

; Causes supervisor to send the termination signal (SIGTERM) to the whole process group.
stopasgroup=true

A lot of the conf file above is supervisor-specific.

The point of this section is to show you how simple it is to have huey with Django run reliably on an Ubuntu instance.

Conclusion

What do you think about this? Can it be better? Can it be more minimal?

Footnotes

  1. If you want to know more on dramatiq, its author was interviewed in December 2017 on the Python podcast. Great episode!