GenericForeignKey Deep Filtering
One of the many "batteries" Django comes with is
GenericForeignKey
(often shortened to GFK). I'm not necessarily the biggest fan of that
particular battery (that might be a topic for another post?), but it's hard to deny that GFKs
can enable some pretty nifty use-cases. Recently at work I was tasked with implementing a kind of deep
filtering of a model that used a GFK, and came up with a technique that seems generic (hehe)
enough to be worth sharing.
Quick refresher: regular foreign keys
In order to show the limitations of GFKs that led me to create my "deep filtering" technique, let's first
start with a quick example involving a regular ForeignKey. I'll go for the classic
Book model, this time with a related Review model that will come in handy later.
from django.db import models


class Book(models.Model):
    author = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    title = models.CharField(max_length=200)


class Review(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    reviewer = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    score = models.PositiveIntegerField()
Now if you want to list all reviews attached to a book whose title contains the word "Django", you can do
Review.objects.filter(book__title__icontains="django"). The nifty
__ double-underscore syntax of Django's ORM enables "jumping" over any foreign key. You can even
do it multiple times: Review.objects.filter(book__author__username="baptiste") will list all
reviews attached to a book authored by the user whose username is baptiste. Neat!
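To make the "jumping" a bit more concrete, here is a rough sketch of the kind of SQL join that such a lookup boils down to, run against an in-memory SQLite database with nothing but the standard library. The table and column names (users, books, reviews) are simplified stand-ins I made up for the example. The names Django generates are different, and the exact SQL it emits depends on the version and the database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY, book_id INTEGER, reviewer_id INTEGER, score INTEGER
    );
    INSERT INTO users VALUES (1, 'baptiste'), (2, 'alice');
    INSERT INTO books VALUES (10, 1, 'Django Tricks'), (11, 2, 'Knitting 101');
    INSERT INTO reviews VALUES (100, 10, 2, 5), (101, 11, 1, 3);
    """
)

# Roughly what Review.objects.filter(book__author__username="baptiste") asks for:
# each "__" hop over a foreign key becomes one JOIN.
rows = conn.execute(
    """
    SELECT reviews.id
    FROM reviews
    JOIN books ON reviews.book_id = books.id
    JOIN users ON books.author_id = users.id
    WHERE users.username = ?
    """,
    ("baptiste",),
).fetchall()
review_ids = [row[0] for row in rows]
```

The important detail is that every hop knows which table to join against, because a regular foreign key always points to the same model.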
Generic Foreign Keys
Whereas a regular foreign key points to a single model class (boring!), a
generic foreign key can point to any model you wish
(exciting!). Let's try an example, inspired by the real-life
LogEntry
model from Django's admin:
from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class LogEntry(models.Model):
    timestamp = models.DateTimeField(auto_now_add=True)
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    affected = GenericForeignKey("content_type", "object_id")
    message = models.TextField()
The LogEntry model is meant to track events on different objects within our codebase. It has a
timestamp to store when the event happened, a user to know who triggered the event, a
message where we can store a description of the event, and finally an
affected generic foreign key that lets us attach the log entry to any model.
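As the model definition shows, the affected field is really just the pair (content_type, object_id). Here is a toy, framework-free sketch of that idea. The TABLES registry and the dataclasses are hypothetical stand-ins for illustration only, not how Django implements it.

```python
from dataclasses import dataclass


@dataclass
class User:
    pk: int
    username: str


@dataclass
class Book:
    pk: int
    title: str


# Toy stand-in for the database: one in-memory "table" per model,
# keyed by a label that plays the role of the content type.
TABLES = {
    "user": {1: User(1, "baptiste")},
    "book": {1: Book(1, "Django Tricks")},
}


@dataclass
class LogEntry:
    content_type: str  # which "table" the entry points to
    object_id: int     # primary key within that table
    message: str

    @property
    def affected(self):
        # Two-step lookup: pick the table first, then the row.
        return TABLES[self.content_type].get(self.object_id)


entries = [
    LogEntry("user", 1, "logged in"),
    LogEntry("book", 1, "title changed"),
]
```

Both entries share the same object_id, yet they resolve to different objects. The target table is only known row by row, which hints at why a single join across the relation is not available.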
"Deep filtering"
The problem I was trying to solve was that I wanted to get a list of all log entries that "affected" a given user. This could be because the entry was attached directly to the user instance, but it could also be because it was attached to a book whose author was the user, or to a review written by the user, and so on.
With a regular foreign key, we could have used __ filtering like we showed in the previous
section, but that's not possible anymore with a generic foreign key.
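If we restrict the problem to a single model, it becomes easier to solve. Say, for example, that we want to get
all log entries that are attached directly to a given user USER (an instance of the
django.contrib.auth.models.User model). Django does not support passing a GFK such as affected directly to filter(), so we spell out its two underlying fields instead:
LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(User),
    object_id=USER.pk,
)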
Though it's a bit more complicated, it's also possible to get all entries that are attached to a book where
USER is the author:
LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Book),
    object_id__in=Book.objects.filter(author=USER),
)
This approach also works for reviews by USER:
LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Review),
    object_id__in=Review.objects.filter(reviewer=USER),
)
Or even getting entries attached to a review for one of USER's books:
LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Review),
    object_id__in=Review.objects.filter(book__author=USER),
)
CASE WHEN to the rescue
The idea is to generalize the approach of the last three examples by creating a mapping of model -> Q object, where the Q object is used to filter down the model queryset:
from django.contrib.auth.models import User
from django.contrib.contenttypes.models import ContentType
from django.db.models import BooleanField, Case, Q, Value, When


def is_affected(user):
    # For each model, the condition under which an instance "affects" the user
    Q_OBJS = {
        Book: Q(author=user),
        Review: Q(book__author=user) | Q(reviewer=user),
    }
    whens = [
        # The entry is directly attached to the user
        When(content_type=ContentType.objects.get_for_model(User), then=Q(object_id=user.pk))
    ]
    for model_class, qobj in Q_OBJS.items():
        content_type = ContentType.objects.get_for_model(model_class)
        object_ids = model_class.objects.filter(qobj)
        # The entry is attached to one of the matching instances of this model
        whens.append(When(content_type=content_type, then=Q(object_id__in=object_ids)))
    return Case(*whens, default=Value(False), output_field=BooleanField())
Now that we have this function, getting a list of log entries that affect USER becomes as simple
as:
LogEntry.objects.filter(is_affected(USER))
Voilà!
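If you are curious about what this looks like on the database side, here is a hand-written sketch of a comparable CASE WHEN query, run on an in-memory SQLite database with only the standard library. Everything in it is an assumption made for illustration: the table names, the text labels used in place of real content type ids, and the affected_entry_ids helper. The SQL that Django generates will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER);
    CREATE TABLE reviews (id INTEGER PRIMARY KEY, book_id INTEGER, reviewer_id INTEGER);
    CREATE TABLE log_entries (
        id INTEGER PRIMARY KEY, content_type TEXT, object_id INTEGER
    );
    -- User 1 wrote book 10; user 2 wrote book 11 and reviewed book 10.
    INSERT INTO books VALUES (10, 1), (11, 2);
    INSERT INTO reviews VALUES (10, 10, 2), (11, 11, 3);
    INSERT INTO log_entries VALUES
        (1, 'user', 1),
        (2, 'user', 2),
        (3, 'book', 10),
        (4, 'book', 11),
        (5, 'review', 10),
        (6, 'review', 11);
    """
)


def affected_entry_ids(user_id):
    """Return the ids of the log entries that "affect" the given user."""
    rows = conn.execute(
        """
        SELECT id FROM log_entries
        WHERE CASE
            -- Entry attached directly to the user
            WHEN content_type = 'user' THEN object_id = :uid
            -- Entry attached to a book written by the user
            WHEN content_type = 'book' THEN object_id IN (
                SELECT id FROM books WHERE author_id = :uid
            )
            -- Entry attached to a review by the user, or of one of their books
            WHEN content_type = 'review' THEN object_id IN (
                SELECT reviews.id FROM reviews
                JOIN books ON reviews.book_id = books.id
                WHERE books.author_id = :uid OR reviews.reviewer_id = :uid
            )
            ELSE 0
        END
        ORDER BY id
        """,
        {"uid": user_id},
    ).fetchall()
    return [row[0] for row in rows]
```

Each WHEN branch first checks which kind of object the entry points to, and only then interprets object_id against the matching table. That is why book 10 and review 10 can coexist in the sample data without being confused for one another.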