Retrieving related data in Django
All applications are different. However some themes keep coming up.
One such theme is retrieving data from several tables. Tables related using one-to-many and many-to-many relationships.
Since this post is inspired by Django’s own docs, it builds on the Pizza
and Topping
example entities to demonstrate a typical many-to-many relationship. I.e. a Pizza
has one-to-many Topping
s. And a Topping
can be on one-to-many Pizza
s:
The code for this experiment
Is at this github repo. The repo’s commits show the progress made towards acheving the final code.
Note that this experiment was done in order to compare django model bakery and factory boy.
That post is still in progress.
Classifying a Pizza as (VG) or (V)
(VG)
stands for vegan, (V)
for vegetarian.
We want to determine an attribute of a related entity, based on a property of another entity related by a many-to-many relationshp. Interesting.
I.e. Let’s write the logic down:
- A
Pizza
is vegan (VG) if all itsTopping
s are vegan. - A
Pizza
is vegetarian (V) if all itsTopping
s are vegan or vegetarian.
The approach to set a Topping
as vegan or vegetarian is by using an IntegerField
with choices
:
class Topping(models.Model):
VEGAN = 0
VEGETARIAN = 1
TOPPING_TYPE_CHOICES = (
(VEGAN, "Vegan-friendly"),
(VEGETARIAN, "Vegetarian-friendly"),
)
name = models.CharField(max_length=64)
rating = models.PositiveSmallIntegerField(
blank=True, null=True, choices=TOPPING_TYPE_CHOICES
)
def __str__(self):
return self.name
If a Topping
is neither vegan nor vegetarian, then it’s rating
is NULL
.
To classify a Pizza
as (VG) or (V) it takes only an additional query per Pizza
. Independent of how many Topping
s it contains. In Django you can achieve this with prefetch_related
, docs.
class Pizza(models.Model):
name = models.CharField(max_length=64)
toppings = models.ManyToManyField(Topping)
price = models.DecimalField(max_digits=4, decimal_places=2)
def __str__(self):
return self.name
@property
def is_vegan(self):
return all([topping.rating == Topping.VEGAN for topping in self.toppings.all()])
@property
def is_vegetarian(self):
return all(
[
topping.rating in (Topping.VEGAN, Topping.VEGETARIAN)
for topping in self.toppings.all()
]
)
You can refer to the commit with these changes here.
The above is a linear relationship. Which allows us to tell how many queries a page will generate. E.g. with 10 Pizza
s per page, max queries to show (VG) or (V) next to a Pizza
would be 20 database queries.
The other scenario, which we are trying to avoid, is one where the number of queries would be unpredictable. For example, one query per Pizza
, and another per Topping
.
We have avoided this. But business requirements evolve. And what might feel like an accomplishment today quickly turns into a new challenge tomorrow.
Let’s explore the new requirement.
Classifying a Customer as (VG) or (V)
Our product owner would like to determine who of our customers are vegans. Who are vegetarians. Etc. That way our pizza parlour can provide special offers to the target customers.
For simplicity’s sake a Customer
might reperesent a whole group. So a Customer
might be a family or group of people, making an order for the whole group. So in this case we’d be classifying the whole group as (VG) or (V).
To do this, I extended the Pizza >-< Topping
many-to-many relationship with Order
and Customer
entities:
An Order
has one-to-many Pizza
s. And a Pizza
can be in one-to-many Order
s.
A Customer
can make one-to-many Order
s. But an Order
can belong to only one Customer
.
Code, from this commit:
class Order(models.Model):
date = models.DateField()
customer = models.ForeignKey("pizzas.Customer", on_delete=models.CASCADE)
pizzas = models.ManyToManyField("pizzas.Pizza")
...
class Customer(models.Model):
first_name = models.CharField(max_length=32)
last_name = models.CharField(max_length=32)
So in this case, we have to characterise the Customer
using a field stored on the Topping
. Which is three relationships away. With:
- a
Customer
having one-to-manyOrder
s Order
s having many-to-manyPizza
s, andPizza
s having many-to-manyToppings
s
To tell whether a customer is (V), (VG) or neither, it’s impossible to keep the amount of queries needed linear.
With the current structure the number of queries will depend on how many Order
s a Customer
has made. And the amount of Pizza
s in each.
We are being held hostage by the underlying data semantics. But before changing the data structure let’s understand the direction we’re working towards. Enter Django model managers.
Django model managers
I think Django model managers are an under appreciated construct in Django. And consequently under used.
Let’s say that, instead of querying all Pizza
s to get an Order
‘s total, you want to retrieve the Order
‘s total within the same query. Every time Order
s are retrieved as part of a queryset.
You can. By defining a manager. Code, from this commit:
class OrderManager(models.Manager):
def get_queryset(self):
return super().get_queryset().annotate(total=models.Sum("pizzas__price"))
class Order(models.Model):
date = models.DateField()
customer = models.ForeignKey("pizzas.Customer", on_delete=models.CASCADE)
pizzas = models.ManyToManyField("pizzas.Pizza")
objects = OrderManager()
The test, again code from same commit, that ensures we’re doing everything by querying the DB once:
class OrderManagerTest(TestCase):
def test_get_queryset_with_total(self):
load_data()
order = baker.make("pizzas.Order")
rossa = Pizza.objects.get(name="rossa")
campagnola = Pizza.objects.get(name="campagnola")
order.pizzas.add(rossa)
order.pizzas.add(campagnola)
with self.assertNumQueries(1):
total = Order.objects.filter(id=order.id).first().total
self.assertEqual(total, 16.0)
Sidenote: in case you’re unfamiliar, assertNumQueries
docs here.
We want to be able to retrieve whether a Customer
is a vegan, vegetarian or neither using the same approach.
Changing the underlying data
Wait. We’re not going to really change the data structure. We are going to change the semantics of the data stored though.
By adding a new choice variable, “non vegan non vegetarian”, we can query a group of values directly. Using aggregate functions we can tell whether a Pizza
, an Order
, or a Customer
is vegan, vegetarian or neither without fetching all Topping
s for that Pizza
, Order
, or Customer
.
The revised Topping
model:
class Topping(models.Model):
VEGAN = 0
VEGETARIAN = 1
NON_VG_NON_V = 2
TOPPING_RATING_CHOICES = (
(VEGAN, "Vegan-friendly"),
(VEGETARIAN, "Vegetarian-friendly"),
(NON_VG_NON_V, "Non-vegan non-veg"),
)
name = models.CharField(max_length=64)
rating = models.PositiveSmallIntegerField(choices=TOPPING_RATING_CHOICES)
The changes above are in this commit.
So now we can use this logic:
- if
MAX
topping rating is vegan, thenPizza
/Order
/Customer
is vegan (VG) - if
MAX
topping rating is vegetarian, thenPizza
/Order
/Customer
is vegetarian (V) - if
MAX
topping rating above vegetarian, thenPizza
/Order
/Customer
is neither VG nor V
The next commit implements this logic. Here’s the manager code:
class PizzaManager(models.Manager):
def get_queryset(self):
return (
super()
.get_queryset()
.annotate(
max_rating=Max("toppings__rating"),
is_vegan=Case(
When(max_rating=Topping.VEGAN, then=Value(True)),
default=Value(False),
output_field=models.BooleanField(),
),
is_vegetarian=Case(
When(max_rating__lte=Topping.VEGETARIAN, then=Value(True)),
default=Value(False),
output_field=models.BooleanField(),
),
)
)
The previous is_vegan
and is_vegetarian
properties were deleted. The one downside is that these “fields” are now available only when the Pizza
instance is retrieved via queryset. The “fields” are not available at instance level.
The unit tests for the properties were retained and these the fields computed by the manager above.
Taking (VG) and (V) ratings to the Customer entity
This commit adds Customer is_vegan
and is_vegetarian
properties. The code added onto the Customer
class:
class Customer(models.Model):
...
def pizzas_ordered(self):
return Pizza.objects.filter(
id__in=self.order_set.values_list("pizzas__id", flat=True)
).distinct()
@property
def is_vegan(self):
return all([pizza.is_vegan for pizza in self.pizzas_ordered()])
@property
def is_vegetarian(self):
return all([pizza.is_vegetarian for pizza in self.pizzas_ordered()])
This unit test (from same commit) tests Customer.pizzas_ordered
function:
def test_pizzas_ordered(self):
load_data()
rossa = Pizza.objects.get(name="rossa")
margherita = Pizza.objects.get(name="margherita")
capricciosa = Pizza.objects.get(name="capricciosa")
diablo = Pizza.objects.get(name="diablo")
campagnola = Pizza.objects.get(name="campagnola")
(customer_vegan, customer_veggy, customer_meat_eater) = Customer.objects.all()
with self.assertNumQueries(3):
self.assertCountEqual(customer_vegan.pizzas_ordered(), [rossa, campagnola])
self.assertCountEqual(
customer_veggy.pizzas_ordered(), [rossa, margherita, campagnola]
)
self.assertCountEqual(
customer_meat_eater.pizzas_ordered(),
[rossa, margherita, diablo, capricciosa],
)
Notice how fetching all pizzas results in 1 query per customer.
The next unit test (from same commit) tests Customer.is_vegan
property:
def test_is_vegan(self):
load_data()
(customer_vegan, customer_veggy, customer_meat_eater) = Customer.objects.all()
with self.assertNumQueries(3):
self.assertTrue(customer_vegan.is_vegan)
self.assertFalse(customer_veggy.is_vegan)
self.assertFalse(customer_meat_eater.is_vegan)
Again, fetching whether a customer is (VG), (V) or neither still results in 1 query per customer. (After the customer has been already fetched.) This is only possible using the PizzaManager
manager level functions implemented previously.
The way test data is created in this demo repo project will be discussed in a future post.
Credits
For test data, I used some pizzas from the menu of a pizza place in Malta. The owners are Sicilian. Highly recommend that place!
Comments !