Caching in Django with the Prefetch and Annotate Pattern
At some point you will add a property to a model that triggers an n+1 query. This is when an attribute of an associated table is requested. For each time this happens n additional queries are added to the request. What we need to do is request all of the joined tables as a single request and then reference the fetched data with the joined tables needed for the model’s property. Fortunately, Django makes caching extremely simple and also provides a built in test matcher that makes detecting n+1 queries a snap.
Our story begins with the addition of two innocent properties to a blog post model. For our blog posts we would like to include a count of reviews along with the average review score on the
post_list page for each post.
Functionally everything checks out and both properties can easily be referenced in the post index template. However, there is a severe performance hit that happens when we calculate averages or counts on a related table like
review_set. We can write a test that asserts that
post_list only results in one hit to the database.
This test fails with 11 fetched queries! This makes sense considering that we are constructing a listing of 5 posts. Both fields add 5 additional queries in addition to the original query because they reference data on the
Review model. This will eventually lead to some unacceptable performance bottlenecks as the list of posts grows and additional properties are added to Post.
Django provides two tools that can help us solve this problem -
QuerySet.prefetch. We will start by creating a
PostQuerySet class and assign that as a manager to our Post model’s objects property.
QuerySet.annotate method is used to add a private property to each post with the calculated average review_rating and review count. Posts can be fetched with the detail records in the
prefetch_detail method. This method can be called when references to a Post’s
number_of_reviews are needed. The initial query might run slightly longer than a simple selection on posts. However, this will pay off when additional hits to the database no longer occur with the addition of the annotated calculations.
A simple guard condition can now be added to each property that checks for the presence of the annotated cached value before calling the calculation on
Post.review_set. With this update and the use of
prefetch_detail in the
post_list view method, the query count for the test returns back to a single query count and passes!
Testing view methods by using
TestCase.assertNumQueries is an easy way to detect n+1 performance issues in Django applications. Fixing n+1 queries can usually be solved with the addition of an annotation for the calculated or related data needed by the view requesting the data. Generally, a single reusable
prefetch_detail QuerySet method can be used in cases where a template or serializer needs data that should be calculated or included in advance. It is also up to the model to handle these annotations and use them when they exist.
Making these simple changes to an application can really help scale your Django application and possibly save money on the computation power needed for your database server.