Database concurrency in Django the right way

Guilherme Caminha
September 29, 2016
<p>When developing applications that have specific needs for running asynchronous tasks outside the web application, it is common to adopt a task queue such as <a href="http://www.celeryproject.org/">Celery</a>. This allows, for example, for the server to handle a request, start an asynchronous task responsible of doing some heavyweight processing, and return an answer while the task is still running.</p><p>Building upon this example, the ideal is to separate the high time-demanding parts from the view processing flow, so we run those parts in a separate task. Now, let's suppose we have to do some database operations both in the view and in the separate task. If not done carefully, those operations can be a source for issues that can be hard to track.</p><h3 id="atomic_requests-database-config-parameter">ATOMIC_REQUESTS database config parameter</h3><p>It's a common practice to set the <code>ATOMIC_REQUESTS</code> parameter to <code>True</code> in the database configuration. This configuration enables for the Django views to run inside of a transaction, so that if an exception is produced during the request handling, Django can simply roll back the transaction. This also ensures the database will never be left in an inconsistent state, and since a transaction is an atomic operation in the database, no other application trying to access the database while a transaction is running will be able to see inconsistent data coming from an incomplete transaction.</p><h3 id="database-race-condition-with-tasks-and-django-request-handlers">Database race condition with tasks and Django request handlers</h3><p>Data races happen when two or more concurrent threads try to access the same memory address (or in this case, some specific data in a database) at the same time. This can lead to non-deterministic results if, say, one thread is trying to read data while the other is writing it, or if both threads are writing at the same time. Of course, if both threads are just reading data, no problem will occur.</p><p>Now, regarding our problem, let's start by writing a simple view, which will write some data to our database:</p><pre><code class="language-python">from django.views.generic import View from django.http import HttpResponse from .models import Data class SimpleHandler(View): def get(self, request, *args, **kwargs): my_data = Data.objects.create(name='Dummy') return HttpResponse('pk: {pk}'.format(pk=my_data.pk)) </code></pre><p>And here's our model:</p><pre><code class="language-python">from django.db import models class Data(models.Model): name = models.CharField(max_length=50) </code></pre><p>This models a very simple request handling. If we make a request from our browser, we will get as an answer the primary key from the inserted data, such as <code>pk: 41</code>. Now, we modify our get method to launch a Celery task that will fetch data from our database:</p><pre><code class="language-python"> def get(self, request, *args, **kwargs): my_data = Data.objects.create(name='Dummy') do_stuff.delay(my_data.pk) return HttpResponse('pk: {pk}'.format(pk=my_data.pk)) </code></pre><p>And in our tasks file:</p><pre><code class="language-python">from celery_test.celery import app from .models import Data @app.task def do_stuff(data_pk): my_data = Data.objects.get(pk=data_pk) my_data.name = 'new name' my_data.save() </code></pre><p>(Don't forget to import <code>do_stuff</code> in your views file!)</p><p>Now we have a subtle race condition: it is likely that Celery will eventually raise an exception when fetching the data, such as <code>Task request_handler.tasks.do_stuff[2a3aecd0-0720-4360-83b5-3558ae1472f2] raised unexpected: DoesNotExist('Data matching query does not exist.',)</code>. It might seem that this should not happen, since we are inserting a row, getting its primary key, and passing it for the task. Thus, the data matching the query should exist. But, as said earlier, if <code>ATOMIC_REQUESTS</code> is set to <code>True</code>, the view will run in a transaction. The data will only be externally accessible when the view finishes its execution, and the transaction is committed. This usually will happen <strong>after</strong> Celery executes the task.</p><h3 id="solution-approaches">Solution approaches</h3><p>There are many solutions for this problem. The first and more obvious one is to set <code>ATOMIC_REQUESTS</code> to <code>False</code>, but we want to avoid this since this will affect every other view in our project, and using transactions in requests have many advantages as stated earlier. Another solution is to use the <a href="https://docs.djangoproject.com/en/1.8/topics/db/transactions/#django.db.transaction.non_atomic_requests">non_atomic_requests</a> decorator, as this would only affect one view. Still, this can be unwanted, since we can be compromising this one view's functionality. There are also libraries that were used to run code when the current transaction is committed, such as <a href="https://github.com/adamchainz/django_atomic_celery">django_atomic_celery</a> and <a href="https://github.com/aaugustin/django-transaction-signals">django-transaction-signals</a>, but those are now legacy and should not be used. An explanation can be read on the django-transaction-signals project.</p><p>The current and most used alternative is to use hooks. For Django &gt;= 1.9, you can use the <a href="https://docs.djangoproject.com/en/1.10/topics/db/transactions/#django.db.transaction.on_commit">on_commit</a> hook, and for Django &lt; 1.9, use <a href="https://django-transaction-hooks.readthedocs.io/en/latest/">django-transaction-hooks</a>. We'll use the <code>on_commit</code> approach here, but if you have to use django-transaction-hooks, the solution is very similar.</p><p>All you have to do is import <code>transaction</code> as in <code>from django.db import transaction</code> on your views file, and pass any function you want to execute after the commit to <code>transaction.on_commit</code>. This function should not have any arguments, but we can wrap our task call in a lambda, so our final view looks like this:</p><pre><code class="language-python">class SimpleHandler(View): def get(self, request, *args, **kwargs): my_data = Data.objects.create(name='Dummy') transaction.on_commit(lambda: do_stuff.delay(my_data.pk)) return HttpResponse('pk: {pk}'.format(pk=my_data.pk)) </code></pre><p>The only caveat of the above solution is that you'll have to use <a href="https://docs.djangoproject.com/en/1.11/topics/db/transactions/#use-in-tests"><code>TransactionTestCase</code> to properly test views that use <code>transaction.on_commit</code></a>. <code>on_commit</code> is good enough for most applications. If you need more control, you can use the <code>non_atomic_requests</code> decorator. But remember that you will have to deal with rollbacks manually.</p><p>Want read more about <em>Django</em>? Check other posts from our blog:</p><p><a href="http://www.vinta.com.br/blog/2016/controlling-access-a-django-permission-apps-comparison/">Controlling access: a Django permission apps comparison</a></p><p><a href="http://www.vinta.com.br/blog/2015/10-django-apps-youre-not-using-but-should-be/">10 Django apps you're not using but should be</a></p>