This content originally appeared on DEV Community and was authored by Eduardo Zepeda
Django’s ORM is quite useful and versatile, it can perform most of the common SQL operations, such as filtering, partitioning, joins or sorting information, creating aliases, but it also has its limitations, especially when combined with subqueries, today I’ll tell you about one of its limitations and how to solve it.
Despite its few weaknesses, its ORM is one of the reasons why you should use Django.
Django annotate and subqueries, a performance problem
The Django annotate function, which I already told you about in a post where I explain the differences between annotate and aggregate in Django, is used to add information to a SQL query, this information can be an average, a sum or anything else you want, the problem occurs when that information comes from a subquery.
Let me give you an example:
from django.db.models import F
from django.db.models.expressions import Subquery
first_subquery = Subquery(...)
second_subquery = Subquery(...)
queryset = YourModel.objects.annotate(first_annotation=first_subquery)
.annotate(second_annotation=second_subquery)
.annotate(
third_annotation=F("first_subquery") - F("second_subquery"))
.annotate(
fourth_annotation=((F("first_subquery") - F("second_subquery")) / F("second_subquery"))
)
The problem here arises when we mix subqueries with annotate, and then proceed to use those annotations in other annotations.
Django does not have the ability to recognize that it is already repeating the subqueries over and over again, so the SQL it generates repeats the same subqueries over and over again, resulting in a poorly performing query; we fall into the famous n+1 queries problem.
SQL generated by Django using annotate and subqueries is inefficient.
Worse, where exactly is the problem? The django ORM translates the above queryset into the following SQL query:
SELECT columns
(SELECT ...first_subquery - SELECT ...second_subquery) AS "third_annotation",
(SELECT ...first_subquery - SELECT ...second_subquery)/(SELECT ...first_subquery) as "fourth_annotation",
(SELECT ...first_subquery) as "first_annotation",
(SELECT ...second_subquery) as "second_annotation"
FROM table_a LEFT OUTER JOIN table_b
ON table_a.id = table_b.id
GROUP BY table_a.id ...
Notice how Django is reusing the SQL from each subquery multiple times during the query, instead of performing the query once and then reusing that value.
If you don’t know how to get the SQL query that Django’s ORM generates, I remind you, qs represents your queryset:
print(qs.query)
How to fix this? Well, one of the ways to fix this SQL query is to use Common Table Expressions (CTEs), however, as of this writing, Django does not support Common Table Expressions (CTEs), so we will have to use a raw query instead of the methods already provided by the Django ORM.
Use Common Table Expressions (CTEs) to improve annotate and subqueries performance.
The solution is to create a raw query, remember that modern versions of django you can use the raw method of your model manager so that Django automatically assigns it to a queryset object of your respective model.
qs = YourModel.objects.raw("YOUR_SQL_RAW_QUERY_GOES_HERE")
The SQL query with the Common Table Expressions (CTEs) that we will use would look like this:
WITH my_cte AS (
SELECT
a.column
(SELECT ...subquery_one) AS first_annotation,
(SELECT ...subquery_two) AS second_annotation
FROM table_a
LEFT OUTER JOIN table_b
ON table_a.id = table_b.id
GROUP BY table_a.id ...
)
SELECT
columns,
first_annotation,
second_annotation,
first_annotation - second_annotation AS third_annotation,
(first_annotation - second_annotation)/first_annotation AS fourth_annotation
FROM my_cte;
As you can see the subqueries are in parentheses and each of them appears only once.
Using Common Table Expressions (CTEs) will allow us an efficient query, avoiding multiple repetitive queries to the database and will give us a performance that outperforms the Django ORM’s query by several orders of magnitude. Perhaps implementing CTEs is one of the actions that can be taken to improve the Django framework.
This content originally appeared on DEV Community and was authored by Eduardo Zepeda
Eduardo Zepeda | Sciencx (2024-07-15T21:51:01+00:00) Fix slow queries in Django when using annotate and subqueries. Retrieved from https://www.scien.cx/2024/07/15/fix-slow-queries-in-django-when-using-annotate-and-subqueries/
Please log in to upload a file.
There are no updates yet.
Click the Upload button above to add an update.