Wagtail: Annotating a Parent Attribute on a Page QuerySet
I've been working on a Wagtail project recently where I found myself wanting to filter a queryset of pages based on the property of each row's parent. I found a solution using annotations and Django's database functions which I thought was neat.
TLDR;
from django.db.models import OuterRef, Subquery
from django.db.models.functions import Reverse, Substr
from wagtail.models import Page
def parent_path(path):
"""
Return a database expression that evaluates to the parent path of the given path.
"""
# Unfortunately Substr doesn't support negative indices which would be ideal
# here since the parent path is the child's path minus the last N characters (
# where N is the `steplen` which defaults to 4).
# So instead we do a reverse-substr-reverse which achieves the same thing.
return Reverse(
Substr(Reverse(path), 1 + Page.steplen), # Substr uses 1-index
)
def parent_field(field_name, queryset=None):
"""
Return a database expression that evaluates to value of the row's parent's given field.
"""
if queryset is None:
queryset = Page.objects.all()
queryset = queryset.filter(path=parent_path(OuterRef("path")))
return Subquery(queryset.values(field_name))
If that didn't quite make sense, here's some context and some more explanations:
Wagtail's Page tree 🌳
One of Wagtail's key piece of code is the Page class. It's a normal Django model with fields like
"title" or "publication date", but pages can also be nested under one another.
Each page has single parent (except for a special page called a "root page" which has no parent), which creates a tree structure where pages can be nested as deep as you'd like (well, up to a limit but you're unlikely to hit that).
MP trees
Wagtail uses django-treebeard's "Materialized Path Trees" to make querying the page tree as efficient as possible. It's a fairly complex library, but the key idea is
that each Page is given a unique path attribute which is constructed in a clever way
that makes navigating the tree easy.
A path is a string made of of segments of fixed size (4 characters by default). For example if a path was
0001000200030001 then its 4 segments would be 0001, 0002,
0003, and 0001. When creating a new page, its path is built by taking the parent's
path and adding a new segment at the end. So in our example, if we were adding a new page to our
0001000200030001, then its path could look something like 0001000200030001 0007 (I
added a space to make it clearer, in practice the segments are all stuck together).
Knowing your parent 🧓🏻
With this system, it becomes very easy to compute the path of a page's parent: all you have to do is remove the last segment from the page's path. In Python it would look like this:
parent_path = page.path[:-4]
(In practice it's a bit more complicated because the segment length is configurable, but the real code is not that much more complicated than that really.)
Back to the initial problem
If you remember, the issue I was originally trying to solve was to filter a queryset of pages based on the value of the parent's specific field. So I needed a way to somehow join a page to its parent directly in the database.
There's many different ways to achieve this, but one way that works well (especially with the Django ORM) is to use a subquery. The plan of attack then becomes:
- For each page in the queryset, calculate the parent path using the formula from above;
- Using a subquery, fetch the parent of each row using the computed path;
- Once we have the parent, select the field/column we care about;
- Annotate this new value to the queryset.
Computing the parent path using database functions
Django ships with a very useful collection of
database functions. Of
particular interest to us on this problem is Substr (short for substring, I'm sure that was worth
saving those 3 characters). Using our previous formula, it should be possible to compute a page's parent path
using something like Substr("path", -4) but unfortunately that doesn't work for two reasons:
Substrdoes not support negative indices 🥹Substrindices start at 1, not 0 😭
But that's not going to stop me, and with a bit of faffing about I was able to come up with this function:
from django.db.models.functions import Reverse, Substr
from wagtail.models import Page
def parent_path(path):
"""
Return a database expression that evaluates to the parent path of the given path.
"""
# Unfortunately Substr doesn't support negative indices which would be ideal
# here since the parent path is the child's path minus the last N characters (
# where N is the `steplen` which defaults to 4).
# So instead we do a reverse-substr-reverse which achieves the same thing.
return Reverse(
Substr(Reverse(path), 1 + Page.steplen), # Substr uses 1-index
)
Once we have that we're almost there, all that's left is to remember how to use Django's
Subquery which I always have to look up somehow:
# OuterRef essentially means "the field with that name on the outer queryset":
parent_page = Page.objects.filter(path=parent_path(OuterRef("path")))
# Let's use title here, but it could be anything:
parent_title = Subquery(parent_page.values("title"))
# Now we can annotate and filter:
original_queryset.annotate(parent_title=parent_title).filter(parent_title=...)
And voilà, we now have a parent field annotate on our queryset!
Conclusion
The Django ORM is very powerful, and the database functions it ships really lets you build interesting things.
Subquery is also very neat (remember when I used it last year to
build full instances from a subquery, bypassing the 1-column limit?).
But since you've read all the way to the end, I'll share a little secret with you. On that work project I
mentionned where I wrote this code, I ended up scrapping the whole thing after my colleague pointed out during
code review that Wagtail already kind of did what I wanted with
PageQuerySet.child_of(), so the real lesson is know your tools, and know when to get rid of your clever code when standard solutions
already exist. Oh, and that code reviews are awesome!