Retrieving related data in Django
All applications are different. However some themes keep coming up.
One such theme is retrieving data from several tables. Tables related using one-to-many and many-to-many relationships.
Since this post is inspired by Django’s own docs, it builds on the
Topping example entities to demonstrate a typical many-to-many relationship. I.e. a
Pizza has one-to-many
Toppings. And a
Topping can be on one-to-many
The code for this experiment
Is at this github repo. The repo’s commits show the progress made towards acheving the final code.
That post is still in progress.
Classifying a Pizza as (VG) or (V)
(VG) stands for vegan,
(V) for vegetarian.
We want to determine an attribute of a related entity, based on a property of another entity related by a many-to-many relationshp. Interesting.
I.e. Let’s write the logic down:
Pizzais vegan (VG) if all its
Toppings are vegan.
Pizzais vegetarian (V) if all its
Toppings are vegan or vegetarian.
The approach to set a
Topping as vegan or vegetarian is by using an
class Topping(models.Model): VEGAN = 0 VEGETARIAN = 1 TOPPING_TYPE_CHOICES = ( (VEGAN, "Vegan-friendly"), (VEGETARIAN, "Vegetarian-friendly"), ) name = models.CharField(max_length=64) rating = models.PositiveSmallIntegerField( blank=True, null=True, choices=TOPPING_TYPE_CHOICES ) def __str__(self): return self.name
Topping is neither vegan nor vegetarian, then it’s
To classify a
Pizza as (VG) or (V) it takes only an additional query per
Pizza. Independent of how many
Toppings it contains. In Django you can achieve this with
class Pizza(models.Model): name = models.CharField(max_length=64) toppings = models.ManyToManyField(Topping) price = models.DecimalField(max_digits=4, decimal_places=2) def __str__(self): return self.name @property def is_vegan(self): return all([topping.rating == Topping.VEGAN for topping in self.toppings.all()]) @property def is_vegetarian(self): return all( [ topping.rating in (Topping.VEGAN, Topping.VEGETARIAN) for topping in self.toppings.all() ] )
You can refer to the commit with these changes here.
The above is a linear relationship. Which allows us to tell how many queries a page will generate. E.g. with 10
Pizzas per page, max queries to show (VG) or (V) next to a
Pizza would be 20 database queries.
The other scenario, which we are trying to avoid, is one where the number of queries would be unpredictable. For example, one query per
Pizza, and another per
We have avoided this. But business requirements evolve. And what might feel like an accomplishment today quickly turns into a new challenge tomorrow.
Let’s explore the new requirement.
Classifying a Customer as (VG) or (V)
Our product owner would like to determine who of our customers are vegans. Who are vegetarians. Etc. That way our pizza parlour can provide special offers to the target customers.
For simplicity’s sake a
Customer might reperesent a whole group. So a
Customer might be a family or group of people, making an order for the whole group. So in this case we’d be classifying the whole group as (VG) or (V).
To do this, I extended the
Pizza >-< Topping many-to-many relationship with
Order has one-to-many
Pizzas. And a
Pizza can be in one-to-many
Customer can make one-to-many
Orders. But an
Order can belong to only one
Code, from this commit:
class Order(models.Model): date = models.DateField() customer = models.ForeignKey("pizzas.Customer", on_delete=models.CASCADE) pizzas = models.ManyToManyField("pizzas.Pizza") ... class Customer(models.Model): first_name = models.CharField(max_length=32) last_name = models.CharField(max_length=32)
So in this case, we have to characterise the
Customer using a field stored on the
Topping. Which is three relationships away. With:
Orders having many-to-many
Pizzas having many-to-many
To tell whether a customer is (V), (VG) or neither, it’s impossible to keep the amount of queries needed linear.
With the current structure the number of queries will depend on how many
Customer has made. And the amount of
Pizzas in each.
We are being held hostage by the underlying data semantics. But before changing the data structure let’s understand the direction we’re working towards. Enter Django model managers.
Django model managers
I think Django model managers are an under appreciated construct in Django. And consequently under used.
Let’s say that, instead of querying all
Pizzas to get an
Order‘s total, you want to retrieve the
Order‘s total within the same query. Every time
Orders are retrieved as part of a queryset.
You can. By defining a manager. Code, from this commit:
class OrderManager(models.Manager): def get_queryset(self): return super().get_queryset().annotate(total=models.Sum("pizzas__price")) class Order(models.Model): date = models.DateField() customer = models.ForeignKey("pizzas.Customer", on_delete=models.CASCADE) pizzas = models.ManyToManyField("pizzas.Pizza") objects = OrderManager()
The test, again code from same commit, that ensures we’re doing everything by querying the DB once:
class OrderManagerTest(TestCase): def test_get_queryset_with_total(self): load_data() order = baker.make("pizzas.Order") rossa = Pizza.objects.get(name="rossa") campagnola = Pizza.objects.get(name="campagnola") order.pizzas.add(rossa) order.pizzas.add(campagnola) with self.assertNumQueries(1): total = Order.objects.filter(id=order.id).first().total self.assertEqual(total, 16.0)
Sidenote: in case you’re unfamiliar,
assertNumQueries docs here.
We want to be able to retrieve whether a
Customer is a vegan, vegetarian or neither using the same approach.
Changing the underlying data
Wait. We’re not going to really change the data structure. We are going to change the semantics of the data stored though.
By adding a new choice variable, “non vegan non vegetarian”, we can query a group of values directly. Using aggregate functions we can tell whether a
Order, or a
Customer is vegan, vegetarian or neither without fetching all
Toppings for that
class Topping(models.Model): VEGAN = 0 VEGETARIAN = 1 NON_VG_NON_V = 2 TOPPING_RATING_CHOICES = ( (VEGAN, "Vegan-friendly"), (VEGETARIAN, "Vegetarian-friendly"), (NON_VG_NON_V, "Non-vegan non-veg"), ) name = models.CharField(max_length=64) rating = models.PositiveSmallIntegerField(choices=TOPPING_RATING_CHOICES)
The changes above are in this commit.
So now we can use this logic:
MAXtopping rating is vegan, then
Customeris vegan (VG)
MAXtopping rating is vegetarian, then
Customeris vegetarian (V)
MAXtopping rating above vegetarian, then
Customeris neither VG nor V
The next commit implements this logic. Here’s the manager code:
class PizzaManager(models.Manager): def get_queryset(self): return ( super() .get_queryset() .annotate( max_rating=Max("toppings__rating"), is_vegan=Case( When(max_rating=Topping.VEGAN, then=Value(True)), default=Value(False), output_field=models.BooleanField(), ), is_vegetarian=Case( When(max_rating__lte=Topping.VEGETARIAN, then=Value(True)), default=Value(False), output_field=models.BooleanField(), ), ) )
is_vegetarian properties were deleted. The one downside is that these “fields” are now available only when the
Pizza instance is retrieved via queryset. The “fields” are not available at instance level.
The unit tests for the properties were retained and these the fields computed by the manager above.
Taking (VG) and (V) ratings to the Customer entity
This commit adds Customer
is_vegetarian properties. The code added onto the
class Customer(models.Model): ... def pizzas_ordered(self): return Pizza.objects.filter( id__in=self.order_set.values_list("pizzas__id", flat=True) ).distinct() @property def is_vegan(self): return all([pizza.is_vegan for pizza in self.pizzas_ordered()]) @property def is_vegetarian(self): return all([pizza.is_vegetarian for pizza in self.pizzas_ordered()])
This unit test (from same commit) tests
def test_pizzas_ordered(self): load_data() rossa = Pizza.objects.get(name="rossa") margherita = Pizza.objects.get(name="margherita") capricciosa = Pizza.objects.get(name="capricciosa") diablo = Pizza.objects.get(name="diablo") campagnola = Pizza.objects.get(name="campagnola") (customer_vegan, customer_veggy, customer_meat_eater) = Customer.objects.all() with self.assertNumQueries(3): self.assertCountEqual(customer_vegan.pizzas_ordered(), [rossa, campagnola]) self.assertCountEqual( customer_veggy.pizzas_ordered(), [rossa, margherita, campagnola] ) self.assertCountEqual( customer_meat_eater.pizzas_ordered(), [rossa, margherita, diablo, capricciosa], )
Notice how fetching all pizzas results in 1 query per customer.
The next unit test (from same commit) tests
def test_is_vegan(self): load_data() (customer_vegan, customer_veggy, customer_meat_eater) = Customer.objects.all() with self.assertNumQueries(3): self.assertTrue(customer_vegan.is_vegan) self.assertFalse(customer_veggy.is_vegan) self.assertFalse(customer_meat_eater.is_vegan)
Again, fetching whether a customer is (VG), (V) or neither still results in 1 query per customer. (After the customer has been already fetched.) This is only possible using the
PizzaManager manager level functions implemented previously.
The way test data is created in this demo repo project will be discussed in a future post.
For test data, I used some pizzas from the menu of a pizza place in Malta. The owners are Sicilian. Highly recommend that place!