Django: deleting a large queryset may lead to high memory usage

Given those two classes:

from django.db import models

class Bar(models.Model):
    name = models.CharField(max_length=100)

class Foo(models.Model):
    name = models.CharField(max_length=100)
    bar = models.ForeignKey(Bar, on_delete=models.SET_NULL)

When doing Bar.objects.filter().delete() I would have thought that the SQL sent to the database would be DELETE FROM myapp_bar. But in reality the SQL generated is:

SELECT "myapp_bar"."id",
       "myapp_bar"."name"
  FROM "myapp_bar"

UPDATE "myapp_foo"
   SET "bar_id" = NULL
 WHERE "myapp_foo"."bar_id" IN (60194, 60195, 60196, 60197, ...)

DELETE
  FROM "myapp_bar"
 WHERE "myapp_bar"."id" IN (60194, 60195, 60196, 60197, ...)

The issue

The first statement is telling us that Django is loading all objects matching the queryset in memory. If you are deleting a lot of objects, the memory usage will increase drastically and might even lead to an OOM issue (Out Of Memory). I have been bitten by this issue recently while trying to delete tens of millions of objects.

Why

So why are we loading everything in memory? The official documentation states that: "on_delete doesn’t create an SQL constraint in the database. Support for database-level cascade options may be implemented later."

Indeed, there is a ticket opened for that since 11 years. So until this is corrected, be careful about the size of queryset you want to delete.

When should I worry ?

This issue does not apply every time we are calling .delete()on a queryset. It happens only when those conditions are met:

  • another model is referencing the one where we want to delete objects using ForeignKey or OneToOneField
  • the attribute on_delete should be set on the foreign key
  • the number of objects we want to delete is big