Django: deleting a large queryset may lead to high memory usage
Given those two classes:
from django.db import models
class Bar(models.Model):
name = models.CharField(max_length=100)
class Foo(models.Model):
name = models.CharField(max_length=100)
bar = models.ForeignKey(Bar, on_delete=models.SET_NULL)
When doing Bar.objects.filter().delete() I would have thought that the SQL sent to the database would be DELETE FROM myapp_bar. But in reality the SQL generated is:
SELECT "myapp_bar"."id",
"myapp_bar"."name"
FROM "myapp_bar"
UPDATE "myapp_foo"
SET "bar_id" = NULL
WHERE "myapp_foo"."bar_id" IN (60194, 60195, 60196, 60197, ...)
DELETE
FROM "myapp_bar"
WHERE "myapp_bar"."id" IN (60194, 60195, 60196, 60197, ...)
The issue
The first statement is telling us that Django is loading all objects matching the queryset in memory. If you are deleting a lot of objects, the memory usage will increase drastically and might even lead to an OOM issue (Out Of Memory). I have been bitten by this issue recently while trying to delete tens of millions of objects.
Why
So why are we loading everything in memory? The official documentation states that: "on_delete doesn’t create an SQL constraint in the database. Support for database-level cascade options may be implemented later."
Indeed, there is a ticket opened for that since 11 years. So until this is corrected, be careful about the size of queryset you want to delete.
When should I worry ?
This issue does not apply every time we are calling .delete()on a queryset. It happens only when those conditions are met:
- another model is referencing the one where we want to delete objects using
ForeignKeyorOneToOneField - the attribute
on_deleteshould be set on the foreign key - the number of objects we want to delete is big