Nov. 25, 2024

GenericForeignKey Deep Filtering

One of the many "batteries" Django comes with is GenericForeignKey (often shortened to GFK). I'm not necessarily the biggest fan of that particular battery (that might be a topic for another post?), but it's hard to deny that GFKs can enable some pretty nifty use-cases. Recently at work I was tasked with implementing a kind of deep filtering of a model that used a GFK, and came up with a technique that seems generic (hehe) enough to be worth sharing.

Quick refresher: regular foreign keys

In order to show the limitations of GFKs that led me to create my "deep filtering" technique, let's first start with a quick example involving a regular ForeignKey. I'll go for the classic Book model, this time with a related Review model that will come in handy later.

from django.db import models

class Book(models.Model):
    author = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    title = models.CharField(max_length=200)


class Review(models.Model):
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
    reviewer = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    score = models.PositiveIntegerField()

Now if you want to list all reviews attached to a book whose title contains the word "Django", you can do Review.objects.filter(book__title__icontains="django"). The nifty __ double-underscore syntax of Django's ORM enables "jumping" over any foreign key. You can even do it multiple times: Review.objects.filter(book__author__username="baptiste") will list all reviews attached to a book authored by the user whose username is "baptiste". Neat!

Generic Foreign Keys

Whereas a regular foreign key points to a single model class (boring!), a generic foreign key can point to any model you wish (exciting!). Let's try an example, inspired by the real-life LogEntry model from Django's admin:

from django.contrib.contenttypes.fields import GenericForeignKey
from django.contrib.contenttypes.models import ContentType
from django.db import models


class LogEntry(models.Model):
    timestamp = models.DateTimeField(auto_now_add=True)
    user = models.ForeignKey("auth.User", on_delete=models.CASCADE)
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.PositiveIntegerField()
    affected = GenericForeignKey("content_type", "object_id")
    message = models.TextField()

The LogEntry model is meant to track events on different objects within our codebase. It has a timestamp to store when the event happened, a user to know who triggered the event, a message where we can store a description of the event, and finally an affected generic foreign key that lets us attach the log entry to any model.

"Deep filtering"

The problem I was trying to solve was that I wanted to get a list of all log entries that "affected" a given user. This could be because the entry was attached directly to the user instance, but it could also be because it was attached to a book whose author was the user, or a review from the given user, ...

With a regular foreign key, we could have used __ filtering as shown in the previous section, but that's not possible with a generic foreign key: the target table differs from row to row, so the ORM has no single table to join against.

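If we restrict the problem to a single model, it becomes easier to solve. Say we want all log entries attached directly to a given user USER (an instance of the django.contrib.auth.models.User model). Django's ORM doesn't support filtering on the GenericForeignKey itself, so we filter on its two underlying fields instead:

user_type = ContentType.objects.get_for_model(User)
LogEntry.objects.filter(content_type=user_type, object_id=USER.pk)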

Though it's a bit more complicated, it's also possible to get all entries that are attached to a book where USER is the author:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Book),
    object_id__in=Book.objects.filter(author=USER)
)

This approach also works for reviews by USER:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Review),
    object_id__in=Review.objects.filter(reviewer=USER)
)

Or even getting entries attached to a review for one of USER's books:

LogEntry.objects.filter(
    content_type=ContentType.objects.get_for_model(Review),
    object_id__in=Review.objects.filter(book__author=USER)
)

CASE WHEN to the rescue

The idea is to generalize the approach of the last three examples by creating a mapping of model -> Q object, where the Q object is used to filter down the model queryset:

from django.contrib.auth.models import User
from django.contrib.contenttypes.models import ContentType
from django.db.models import BooleanField, Q, Value
from django.db.models.expressions import Case, When


def is_affected(user):
    Q_OBJS = {
        Book: Q(author=user),
        Review: Q(book__author=user) | Q(reviewer=user),
    }

    whens = [
        # The entry is directly attached to the user
        When(content_type=ContentType.objects.get_for_model(User), then=Q(object_id=user.pk))
    ]

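    for model_class, qobj in Q_OBJS.items():
        content_type = ContentType.objects.get_for_model(model_class)
        # Used as a subquery selecting the primary keys of the matching objects
        object_ids = model_class.objects.filter(qobj)
        # `then` expects an expression, so the lookup is wrapped in a Q object
        whens.append(When(content_type=content_type, then=Q(object_id__in=object_ids)))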

    return Case(*whens, default=Value(False), output_field=BooleanField())

Now that we have this function, getting a list of log entries that affect USER becomes as simple as:

LogEntry.objects.filter(is_affected(USER))

Voilà!