Untangled Development

Home ·  Blog ·  About

Retrieving related data in Django

All applications are different. However some themes keep coming up.

One such theme is retrieving data from several tables. Tables related using one-to-many and many-to-many relationships.

Since this post is inspired by Django’s own docs, it builds on the Pizza and Topping example entities to demonstrate a typical many-to-many relationship. I.e. a Pizza has one-to-many Toppings. And a Topping can be on one-to-many Pizzas:

Initial Entity-Relationship Diagram
Initial Entity-Relationship Diagram

The code for this experiment

Is at this github repo. The repo’s commits show the progress made towards acheving the final code.

Note that this experiment was done in order to compare django model bakery and factory boy.

That post is still in progress.

Classifying a Pizza as (VG) or (V)

(VG) stands for vegan, (V) for vegetarian.

We want to determine an attribute of a related entity, based on a property of another entity related by a many-to-many relationshp. Interesting.

I.e. Let’s write the logic down:

  • A Pizza is vegan (VG) if all its Toppings are vegan.
  • A Pizza is vegetarian (V) if all its Toppings are vegan or vegetarian.

The approach to set a Topping as vegan or vegetarian is by using an IntegerField with choices:

class Topping(models.Model):
    VEGAN = 0
    VEGETARIAN = 1
    TOPPING_TYPE_CHOICES = (
        (VEGAN, 'Vegan-friendly'),
        (VEGETARIAN, 'Vegetarian-friendly'))
    name = models.CharField(max_length=64)
    rating = models.PositiveSmallIntegerField(
        blank=True, null=True, choices=TOPPING_TYPE_CHOICES)

    def __str__(self):
        return self.name

If a Topping is neither vegan nor vegetarian, then it’s rating is NULL.

To classify a Pizza as (VG) or (V) it takes only an additional query per Pizza. Independent of how many Toppings it contains. In Django you can achieve this with prefetch_related, docs.

class Pizza(models.Model):
    name = models.CharField(max_length=64)
    toppings = models.ManyToManyField(Topping)
    price = models.DecimalField(max_digits=4, decimal_places=2)

    def __str__(self):
        return self.name

    @property
    def is_vegan(self):
        return all([
            topping.rating == Topping.VEGAN for topping in self.toppings.all()
        ])

    @property
    def is_vegetarian(self):
        return all([
            topping.rating in (
                Topping.VEGAN, Topping.VEGETARIAN
            ) for topping in self.toppings.all()
        ])

You can refer to the commit with these changes here.

The above is a linear relationship. Which allows us to tell how many queries a page will generate. E.g. with 10 Pizzas per page, max queries to show (VG) or (V) next to a Pizza would be 20 database queries.

The other scenario, which we are trying to avoid, is one where the number of queries would be unpredictable. For example, one query per Pizza, and another per Topping.

We have avoided this. But business requirements evolve. And what might feel like an accomplishment today quickly turns into a new challenge tomorrow.

Let’s explore the new requirement.

Classifying a Customer as (VG) or (V)

Our product owner would like to determine who of our customers are vegans. Who are vegetarians. Etc. That way our pizza parlour can provide special offers to the target customers.

For simplicity’s sake a Customer might reperesent a whole group. So a Customer might be a family or group of people, making an order for the whole group. So in this case we’d be classifying the whole group as (VG) or (V).

To do this, I extended the Pizza >-< Topping many-to-many relationship with Order and Customer entities:

Extended Entity-Relationship Diagram
Extended Entity-Relationship Diagram

An Order has one-to-many Pizzas. And a Pizza can be one one-to-many Orders.

A Customer can make one-to-many Orders. But an Order can belong to only one Customer.

Code, from this commit:

class Order(models.Model):
    date = models.DateField()
    customer = models.ForeignKey('pizzas.Customer', on_delete=models.CASCADE)
    pizzas = models.ManyToManyField('pizzas.Pizza')

[..]

class Customer(models.Model):
    first_name = models.CharField(max_length=32)
    last_name = models.CharField(max_length=32)

So in this case, we have to characterise the Customer using a field stored on the Topping. Which is three relationships away. With:

  • a Customer having one-to-many Orders
  • Orders having many-to-many Pizzas, and
  • Pizzas having many-to-many Toppingss

it’s impossible to keep the amount of queries needed to tell whether a customer is (V), (VG) or neither linear.

With the current structure the number of queries will depend on how many Orders a Customer has made. And the amount of Pizzas in each.

We are being held hostage by the underlying data semantics. But before changing the data structure let’s understand the direction we’re working towards. Enter Django model managers.

Django model managers

I think Django model managers are an under appreciated construct in Django. And consequently under used.

Let’s say that, instead of querying all Pizzas to get an Order’s total, you want to retrieve the Order’s total within the same query. Every time Orders are retrieved as part of a queryset.

You can. By defining a manager. Code, from this commit:

class OrderManager(models.Manager):

    def get_queryset(self):
        return super().get_queryset().annotate(
            total=models.Sum('pizzas__price'))


class Order(models.Model):
    date = models.DateField()
    customer = models.ForeignKey('pizzas.Customer', on_delete=models.CASCADE)
    pizzas = models.ManyToManyField('pizzas.Pizza')

    objects = OrderManager()

The test, again code from same commit, that ensures we’re doing everything by querying the DB once:

class OrderManagerTest(TestCase):

    def test_get_queryset_with_total(self):
        load_data()
        order = baker.make('pizzas.Order')

        rossa = Pizza.objects.get(name='rossa')
        campagnola = Pizza.objects.get(name='campagnola')
        order.pizzas.add(rossa)
        order.pizzas.add(campagnola)

        with self.assertNumQueries(1):
            total = Order.objects.filter(id=order.id).first().total

        self.assertEqual(total, 16.0)

Sidenote: in case you’re unfamiliar, assertNumQueries docs here.

We want to be able to retrieve whether a Customer is a vegan, vegetarian or neither using the same approach.

Changing the underlying data

Wait. We’re not going to really change the data structure. We are going to change the semantics of the data stored though.

By adding a new choice variable, “non vegan non vegetarian”, we can query a group of values directly. Using aggregate functions we can tell whether a Pizza, an Order, or a Customer is vegan, vegetarian or neither without fetching all Toppings for that Pizza, Order, or Customer.

The revised Topping model:

class Topping(models.Model):
    VEGAN = 0
    VEGETARIAN = 1
    NON_VG_NON_V = 2
    TOPPING_RATING_CHOICES = (
        (VEGAN, 'Vegan-friendly'),
        (VEGETARIAN, 'Vegetarian-friendly'),
        (NON_VG_NON_V, 'Non-vegan non-veg'))
    name = models.CharField(max_length=64)
    rating = models.PositiveSmallIntegerField(choices=TOPPING_RATING_CHOICES)

The changes above are in this commit.

So now we can use this logic:

  • if MAX topping rating is vegan, then Pizza / Order / Customer is vegan (VG)
  • if MAX topping rating is vegetarian, then Pizza / Order / Customer is vegetarian (V)
  • if MAX topping rating above vegetarian, then Pizza / Order / Customer is neither VG nor V

The next commit implements this logic. Here’s the manager code:

class PizzaManager(models.Manager):

    def get_queryset(self):
        return super().get_queryset().annotate(
            max_rating=Max('toppings__rating'),
            is_vegan=Case(
                When(max_rating=Topping.VEGAN, then=Value(True)),
                default=Value(False),
                output_field=models.BooleanField(),
            ),
            is_vegetarian=Case(
                When(max_rating__lte=Topping.VEGETARIAN, then=Value(True)),
                default=Value(False),
                output_field=models.BooleanField(),
            )
        )

The previous is_vegan and is_vegetarian properties were deleted. The one downside is that these “fields” are now available only when the Pizza instance is retrieved via queryset. The “fields” are not available at instance level.

The unit tests for the properties were retained and these the fields computed by the manager above.

Taking (VG) and (V) ratings to the Customer entity

This commit adds Customer is_vegan and is_vegetarian properties. The code added onto the Customer class:

class Customer(models.Model):

    ...

    def pizzas_ordered(self):
        return Pizza.objects.filter(
            id__in=self.order_set.values_list('pizzas__id', flat=True)
        ).distinct()

    @property
    def is_vegan(self):
        return all([pizza.is_vegan for pizza in self.pizzas_ordered()])

    @property
    def is_vegetarian(self):
        return all([pizza.is_vegetarian for pizza in self.pizzas_ordered()])

This unit test (from same commit) tests Customer.pizzas_ordered function:

def test_pizzas_ordered(self):
    load_data()
    rossa = Pizza.objects.get(name='rossa')
    margherita = Pizza.objects.get(name='margherita')
    capricciosa = Pizza.objects.get(name='capricciosa')
    diablo = Pizza.objects.get(name='diablo')
    campagnola = Pizza.objects.get(name='campagnola')
    (
        customer_vegan, customer_veggy, customer_meat_eater
    ) = Customer.objects.all()
    with self.assertNumQueries(3):
        self.assertCountEqual(
            customer_vegan.pizzas_ordered(),
            [rossa, campagnola])
        self.assertCountEqual(
            customer_veggy.pizzas_ordered(),
            [rossa, margherita, campagnola])
        self.assertCountEqual(
            customer_meat_eater.pizzas_ordered(),
            [rossa, margherita, diablo, capricciosa])

Notice how fetching all pizzas results in 1 query per customer.

The next unit test (from same commit) tests Customer.is_vegan property:

def test_is_vegan(self):
    load_data()
    (
        customer_vegan, customer_veggy, customer_meat_eater
    ) = Customer.objects.all()
    with self.assertNumQueries(3):
        self.assertTrue(customer_vegan.is_vegan)
        self.assertFalse(customer_veggy.is_vegan)
        self.assertFalse(customer_meat_eater.is_vegan)

Again, fetching whether a customer is (VG), (V) or neither still results in 1 query per customer. (After the custome has been already fetched.) This is only possible using the PizzaManager manager level functions implemented previously.

The way test data is created in this demo repo project will be discussed in a future post.

Credits

For test data, I used some pizzas from the menu of a pizza place in Malta. The owners are Sicilian. Highly recommend that place!

Subscribe via Atom feed, Twitter, or email:

I won't spam you. I'm too lazy. And I hate spam. Maximum one email per month. Thanks!