Celery: an overview of the architecture and how it works

Asynchronous task queues are tools to allow pieces of a software program to run in a separate machine/process. It is often used in web architectures as way to delegate long lasting tasks while quickly answering requests. The delegated task can trigger an action such as sending an email to the user or simply update data internally in the system when it finishes executing.

In this article you will understand how a task queue works and what are the components of its architecture. We will focus on Celery, the most popular tool for the job in Python projects. Most other tools employ the same architecture, so the principles also apply.

The basics

In broad terms, the reason why we use async tasks queues is because we want to answer quickly to our users. The simplest use case for it is to delegate long lasting CPU jobs. But the most popular reason people use async tasks is probably to execute external API calls. Whenever you depend on external services, you no longer have control over how long things will take to be ready. It might also be the case that they will never be ready, since the system might be down or broken. Another good use for async tasks is to prepare and cache result values. You can also use them to spread bulk database insertions over time. This can help you avoid DDoS’ing your own database. Cron jobs are yet another good example of things you can do with them.
There are many tools available to manage async tasks in Python. RQ seems to be getting some attention lately, but Celery is the all time champion so far.


The issue of running async tasks can be easily mapped to the classic Producer/Consumer problem. Producers place jobs in a queue. Consumers then check the head of the queue for awaiting jobs, pick the first one and execute.

In the context of async tasks queues, 'producers' are commonly the 'web nodes' or whatever system that is placing jobs, the queue is referred to as 'broker', and consumers as 'workers'. Since workers can also place new tasks in the queue, they can also behave as 'producers'. Now that we have an overview, let's dig a little deeper.

The Broker

The concept of a broker is very simple: a queue. But what are the available ways to implement a queue in a computer system? One of the simplest would be to use a text file. Text files can hold a sequence of job descriptions to be executed, therefore, we do could use them as the broker of our system. The problem with text files is that they are not made to handle real application problems such as network and concurrent access. Because of that we need something more robust. SQL databases, on the other hand, are capable of running in a network and dealing with concurrent access. The problem with them is that they are too slow. NoSQL databases, by contrast, are quite fast, but many times they lack reliability. So, when building queues, we should use fast, reliable, concurrency enabled tools such as RabbitMQ, Redis and SQS.

Celery has full support for RabbitMQ and Redis. Although SQS and Zookeeper are also available, they are offered with limited capabilities. (see here more)

Web and Worker nodes

Web and worker nodes are plain servers. The only difference is that Web nodes receive requests from the internet and place jobs to be processed asynchronously and Workers are the machines that pick these jobs, execute and provide a response. Despite this logic separation, you will commonly find the code for both in the same repository. This is a good thing because both applications can benefit from sharing things like models and services. This approach also prevents inconsistencies.

Executing tasks

Here is a very simple example of a Celery task and the code to execute it:

# This goes in the `worker node`

from celery import Celery

app = Celery(...)

def add(a, b):
    return a + b
# This goes in the `web node`

from tasks import add

r = add.delay(4, 5).get()
print(r)  # 9

The first example shows the code that should run asynchronously. This goes in a worker node. The second example is the code that places a job in the queue to be ran. This normally goes in a web node. In this example, the web node places an add job and waits until the result is available. When the response is ready the result is printed.

Here is a Django example:

# This goes in the `worker node`

def update_attendees(event, n):
    event.attendees_number = n
# This goes in the `web node`

event = Event.objects.get(name='DjangoCon')
update_attendees.delay(event, 9001)

In this case the web node is placing a task to update the number of attendees of an event. Notice we are passing the event object to the task. Don't do this.

Don't pass complex objects as task parameters

Objects get serialized and stored in the broker. They are then deserialized before being passed to the task. Passing complex objects such as a Model instance as the parameter comes with a few problems. First of all, in old versions of Celery, Pickle was used as the default serialization method. As you might know, Pickle has security vulnerabilities. By allowing complex objects, you are increasing the chances of getting exposed. The latest version of Celery addresses this by using JSON as the default serialization method. More importantly: the database object you passed might change in between the time you place the task and the time it gets executed. In that case you will be working with an outdated version of it.

What you want to do is to pass the ID of the object and fetch a fresh copy from the database at the beginning of the task:

def update_attendees(event_id, n):
    e = Event.objects.get(id=id_event)
    e.attendees_number = n
e = Event.objects.get(name='DjangoCon')
update_attendees.delay(, 9001)

The Results Backend

In the previous examples we've called delay and get together as if they were a single action. In reality they are two separate things. delay places the task in the queue and returns a promise that can be used to monitor the status and get the result when it's ready. Calling get in that promise will block the execution until the result is available. The add task has to store the result somewhere so it can then be accessed by the process that triggered it. This means that we missed a piece of the architecture. Besides the web, broker and worker components, there's also a results backend.

The results backend will be used to store the task results. In practice you can use the same instance you are using for the broker to also store results. There are other technologies besides the supported broker options that can be used as results backend in Celery, but there are some differences depending on what you use. In Postgres, for example, the get method will do polling to check if the result is ready. For other tools, such as Redis, this is done via pub/sub protocol.

Alright, this should cover basics of the components in the Celery architecture. Hope this helped you better understanding how it works and that you’re now feeling more confident about it.

If you liked this post and want to learn more about Celery perhaps you should take a look at my other blog post: Celery in the wild: tips and tricks to run async tasks in the real world.

Check out more from our blog:

Filipe Ximenes

Founder & Chief Technology Officer