Caching in Django with the Prefetch and Annotate Pattern

3 minute read

At some point you will add a property to a model that triggers an n+1 query. This is when an attribute of an associated table is requested. For each time this happens n additional queries are added to the request. What we need to do is request all of the joined tables as a single request and then reference the fetched data with the joined tables needed for the model’s property. Fortunately, Django makes caching extremely simple and also provides a built in test matcher that makes detecting n+1 queries a snap.

Our story begins with the addition of two innocent properties to a blog post model. For our blog posts we would like to include a count of reviews along with the average review score on the post_list page for each post.

@property
def average_rating(self):
    return self.review_set.aggregate(
        average_rating=Avg('rating')
    )['average_rating']
    
@property
def number_of_reviews(self):
    return self.review_set.count()

Functionally everything checks out and both properties can easily be referenced in the post index template. However, there is a severe performance hit that happens when we calculate averages or counts on a related table like review_set. We can write a test that asserts that post_list only results in one hit to the database.

class PostViewTests(TestCase):
    def setUp(self):
        self.user = factories.UserFactory()
        self.client = Client()
        self.posts = factories.PostFactory.create_batch(5, author=self.user)

    def test_fetches_blog_articles_in_single_query(self):
        with self.assertNumQueries(1):
            response = self.client.get(reverse('post_list'))
        self.assertEqual(response.status_code, 200)

This test fails with 11 fetched queries! This makes sense considering that we are constructing a listing of 5 posts. Both fields add 5 additional queries in addition to the original query because they reference data on the Review model. This will eventually lead to some unacceptable performance bottlenecks as the list of posts grows and additional properties are added to Post.

Django provides two tools that can help us solve this problem - QuerySet.annotate and QuerySet.prefetch. We will start by creating a PostQuerySet class and assign that as a manager to our Post model’s objects property.

class PostQuerySet(models.QuerySet):
    def prefetch_detail(self):
        return self.annotate_with_average_rating(
            ).annotate_with_num_reviews()

    def annotate_with_average_rating(self):
        return self.annotate(
            _average_rating_cache=Avg('review__rating')
        )

    def annotate_with_num_reviews(self):
        return self.annotate(
            _num_reviews=Count('review')
        )

class Post(models.Model):
    objects = PostQuerySet.as_manager()

The QuerySet.annotate method is used to add a private property to each post with the calculated average review_rating and review count. Posts can be fetched with the detail records in the prefetch_detail method. This method can be called when references to a Post’s average_rating and number_of_reviews are needed. The initial query might run slightly longer than a simple selection on posts. However, this will pay off when additional hits to the database no longer occur with the addition of the annotated calculations.

@property
def average_rating(self):
    if hasattr(self, '_average_rating_cache'):
        return self._average_rating_cache
    return self.review_set.aggregate(
        average_rating=Avg('rating')
    )['average_rating']

@property
def number_of_reviews(self):
    if hasattr(self, '_num_reviews'):
        return self._num_reviews
    return self.review_set.count()

A simple guard condition can now be added to each property that checks for the presence of the annotated cached value before calling the calculation on Post.review_set. With this update and the use of prefetch_detail in the post_list view method, the query count for the test returns back to a single query count and passes!

    # prefetch the blog articles with the annotations
    # mysite.blog.views
    def post_list(request):
        posts = Post.objects.prefetch_detail()
        return render(request, 'blog/post_list.html', {'posts': posts})

Conclusion

Testing view methods by using TestCase.assertNumQueries is an easy way to detect n+1 performance issues in Django applications. Fixing n+1 queries can usually be solved with the addition of an annotation for the calculated or related data needed by the view requesting the data. Generally, a single reusable prefetch_detail QuerySet method can be used in cases where a template or serializer needs data that should be calculated or included in advance. It is also up to the model to handle these annotations and use them when they exist.

Making these simple changes to an application can really help scale your Django application and possibly save money on the computation power needed for your database server.

Updated: