The N+1 problem is a persistent performance challenge for web developers who use an Object-Relational Mapper (ORM) in frameworks like Django. In this blog post, we look at why it is still a significant concern in popular web frameworks and how to tackle it.
We will also introduce Django REST Framework (DRF), one of the leading libraries in the Django ecosystem for building robust APIs, and show how you might run into the N+1 issue when using it in your Django project. Finally, this article will show how we solved this performance issue at Vinta with Django Virtual Models, an open-source tool we developed.
The Persistent Adversary of Database Performance
If you are a web developer using an ORM to access the database from your web framework, you might encounter performance challenges. One such issue is the N+1 problem, which occurs when one query retrieves a set of records and then an additional query is issued for each record to fetch its related data. Listing N records therefore costs N+1 queries instead of one or two, which leads to an unnecessary increase in queries and hurts performance.
Now that the Olympic season has ended, let’s imagine our application contains information about athletes and related data, such as the games they participate in and the medals they have won.
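To make the pattern concrete before bringing in any ORM, here is a minimal sketch that uses only Python's built-in sqlite3 module. The athlete and medal tables, their columns, and the sample rows are assumptions made up for illustration; the run() helper simply counts each query, mimicking what an ORM does behind the scenes when related data is accessed inside a loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE athlete (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE medal (id INTEGER PRIMARY KEY, athlete_id INTEGER, kind TEXT);
    INSERT INTO athlete (id, name) VALUES (1, 'Ana'), (2, 'Bia'), (3, 'Caio');
    INSERT INTO medal (athlete_id, kind)
        VALUES (1, 'gold'), (1, 'bronze'), (3, 'silver');
""")

query_count = 0


def run(sql, params=()):
    """Execute one SELECT and count it, like an ORM issuing a query."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# 1 query for the list of athletes...
athletes = run("SELECT id, name FROM athlete ORDER BY id")

# ...plus 1 query per athlete for the related medals: N extra queries.
medals_by_athlete = {}
for athlete_id, name in athletes:
    rows = run(
        "SELECT kind FROM medal WHERE athlete_id = ? ORDER BY id",
        (athlete_id,),
    )
    medals_by_athlete[name] = [kind for (kind,) in rows]

print(query_count)  # 1 + 3 athletes = 4 queries
```

With three athletes the cost is four queries; with a thousand athletes it would be a thousand and one, which is why the problem tends to show up only once the data grows.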
How ORMs Address the N+1 Problem
For most of the popular ORMs, such as Active Record from Rails, Eloquent from Laravel, and Django ORM from Django, there are a few options to tackle this issue:
- In Active Record, developers can use .includes() to eager load the associated data;
- In Eloquent, developers can use ::with(), which functions similarly to .includes() in Active Record. It preloads the related data to avoid the N+1 problem.
For the Django ORM, there are two standard methods to mitigate this issue:
- select_related(): A JOIN is created on the related models, resulting in a single query. It is most commonly used for OneToOne and ForeignKey relationships;
- prefetch_related(): Two queries are made: one for the primary objects and another for their related objects. This method is best suited for ManyToMany relationships and reverse ForeignKey relationships. The joining is then done in Python, which replaces the per-object queries with a single extra query per relationship.
Exploring the Issue with Django and DRF
In many projects, defining and structuring database models is straightforward in Django, as illustrated by our example:
Django REST Framework provides a powerful mechanism to expose this data through its serializers API. Here’s how you can translate these models into DRF serializers:
Finally, our DRF views to present the data would look like this:
This allows us to present the data in JSON format like this:
With just a few lines of code, we've created a basic API that adheres to the Don't Repeat Yourself (DRY) principle using Django and DRF. This principle is a tenet of the Django ecosystem. It means avoiding code duplication to ensure maintainability.
DRY is possible thanks to the reuse of serializers to structure and expose data through the views. However, there are a few underlying costs to this approach.
Unveiling the Hidden Costs: Change Amplification and Other Challenges
When we reuse serializers across different views while adhering to the DRY principle, we must be cautious about the efficiency of the queries they execute. While serializers know which data to fetch, they don’t inherently optimize the data retrieval process.
The example above leads to an N+1 problem when listing Athlete instances and retrieving their related Competitions and Medals.
To mitigate this, we could add prefetch_related to the querysets of AthleteListView and AthleteRetrieveView:
However, this solution introduces a new challenge: it breaks the DRY principle and leads to the Change Amplification issue. Every time someone changes one of the serializers, they must change all views that use it. As a result, developers would need to update multiple parts of the codebase to implement any N+1 optimizations, increasing the risk of errors and adding maintenance overhead.
It would be great if we could use serializers in views, and Django would automatically know which prefetches to perform without the need to declare and maintain them explicitly, avoiding the need to update querysets in multiple places throughout the project.
Fortunately, we have the Django Virtual Models library, which can help us with this.
Introducing Django Virtual Models: A "Menu" of Optimizations
Django Virtual Models, an open-source library created by Vinta, improves the performance and maintainability of Django and Django REST Framework projects by adding a declarative prefetching layer on top of the ORM.
But how does it achieve this? Let’s explore this further by converting our models into a “Virtual Model” class.
With Django Virtual Models, we're building a "menu" of optimization options. In this scenario, we explicitly declare which related models can be prefetched for Athlete, ensuring efficient data retrieval.
Additionally, Django Virtual Models allows us to refine these optimizations further by filtering prefetches or even adding annotations, all in a declarative and flexible manner.
Another significant advantage is the effortless integration of these Virtual Models with our DRF serializers and views.
This way, the library can automatically apply the right prefetches and annotations for you, improving both performance and maintainability.
But what about SerializerMethodField, a powerful Django REST Framework feature that allows us to customize our serializers? Can Django Virtual Models handle the prefetches inside it?
Let’s take a look at an example:
In this example, even with the Virtual Model class used in the serializer, calling athlete.medals.all() still performs an additional query to fetch all the medals just to check if the athlete has won any. This extra query can degrade performance, especially when dealing with large datasets.
To avoid this, we can leverage another feature from the package, type hints, through the hints module. We can use type hints to ensure the necessary data is prefetched, avoiding the extra query.
Let’s improve our example:
By using the Annotated type hint along with hints.Virtual("medals"), we ensure that the medals are prefetched when the athlete data is fetched, making the has_won_any_medal method more efficient.
Conclusion
Following the Zen of Python principle that "Explicit is better than implicit," Django Virtual Models gives developers powerful tools to optimize APIs. By allowing explicit control over prefetching and annotations, particularly in complex serializers, you can improve performance while keeping your codebase clear and maintainable.
Now that you’ve learned about Django Virtual Models, how about looking at our GitHub and telling us your thoughts? Check it out!