Taming Irreversibility with Feature Flags (in Python)

Hugo Bessa
May 21, 2018

Feature Flags are a simple technique to make features of your application quickly toggleable. The way it works is, every time we change some behavior in our software, a logical branch is created. This new behavior is only accessible if some specific configuration variable is set or, in certain cases, if the application context respects some rules.

There are many ways of implementing them. The simplest one is to define a place to store some configuration variables, retrieve their values in your application, and change the flow only if the value is true. For instance, in a simple Python application, you can store the configurations as environment variables and use them as the condition of an if clause.


# Environment
TOGGLE_FEATURE_ONE=1

import os

def is_feature_active(feature_name):
    env_var_value = os.environ.get('TOGGLE_' + feature_name.upper())
    return env_var_value == '1'

def my_function():
    if is_feature_active('feature_one'):
        do_something()
    else:
        do_something_else()

Yes, feature flags are that simple to use, but that doesn’t mean you can’t do big things with them. If well used, they can bring a lot of benefits to your development flow. In this article, we'll discuss what these benefits are, which precautions you should take when using feature flags, and what the best practices are to maintain a healthy codebase and increase your development team’s peace of mind.

This technique has already been discussed very well in great articles like this one from Martin Fowler (one of the signatories of the Agile Manifesto) and this other one written by Kent Beck (former technical coach at Facebook). Here, I’m going to approach the topic in a very practical way, with tips we've learned while using feature flags in Vinta's projects.

The great benefits we found by using feature flags

1. Improving the team’s response time to bugs

The most obvious benefit is being able to turn features on and off in production on the fly. This is great when you're launching new features. If users start experiencing some bug triggered by that feature, you can quickly turn it off and no more users will be affected. Then you can thoroughly fix the issue and reactivate the feature later, avoiding new complaints and, even better, keeping your database from getting more and more corrupted.

2. Making it possible to sync code more frequently

Moreover, you can merge your code changes more often, even when the feature you are developing is only partially complete. Since the old code will still be there with the flag off, the new feature won’t impact other developers. But the greatest part is that they will have a better idea of where the code is going and can already adapt their tasks to the feature that is still incomplete.

By doing that, you’ll get simpler conflicts when more than one developer is working on the same part of the code, because it's easier for everyone to keep an up-to-date version of it.

3. Having a more fluid feature launching flow

Another advantage of using feature flags shows up when you’re testing features in sandbox environments. If your team has scarce resources and cannot afford a dedicated sandbox server for each feature being staged, you probably have more than one feature in the same staging git branch (or its equivalent in another source control manager). If one of these features fails QA or black-box tests, you have to revert the broken feature or fix the issue before deploying to production. With feature flags, you can deploy the broken feature and keep it inactive until the fix is deployed as well.

This helps to create a more fluid development flow, without many reverts and rollbacks. This is good because the tools you use to perform the reverts/rollbacks can be tricky sometimes, especially if your feature involves data migrations.

In all Vinta’s projects, we have a staging environment where we validate that things are really working. The server simulates an environment that looks a lot like production, so we can catch bugs we’d otherwise only find in production. We have a git branch that syncs with staging, and once things are tested there, this branch is merged into the production branch.

When our project teams started growing, we had a lot of features being merged to staging at the same time. That led to the problem we mentioned in this section: buggy features blocking the whole deploy. We solved this by using feature flags because now we can turn the buggy features off, and they won’t cause any difficulties when deployed.

4. Validating your features with users the right way

In this article’s intro, it was mentioned that feature flags can use the application context instead of just the configuration variables. This is very useful, for instance, to decide whether a feature is active or not based on the authenticated user. This way, you can easily create user groups and run A/B tests and soft launches (enabling features for your users gradually, instead of launching for everybody at once), making it possible to better evaluate your feature before releasing it for good.

The partial feature release is also great for reducing the effect of bugs, since new bugs are going to affect only a small part of your user base before the final release.
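A gradual rollout like the one described above can be sketched by hashing the user ID into a stable bucket. This is only an illustration under stated assumptions: the `ROLLOUT_PERCENTAGES` table, the function names, and the choice of SHA-256 are hypothetical and do not come from any of the libraries mentioned in this article.

```python
import hashlib

# Hypothetical rollout table: feature name -> percentage of users (0-100)
ROLLOUT_PERCENTAGES = {'feature_one': 20}


def user_bucket(feature_name, user_id):
    """Map a (feature, user) pair to a stable bucket between 0 and 99."""
    key = '{}:{}'.format(feature_name, user_id).encode('utf-8')
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % 100


def is_feature_active_for_user(feature_name, user_id):
    """Return True if the user falls inside the feature's rollout percentage."""
    # Features missing from the table are treated as fully inactive
    percentage = ROLLOUT_PERCENTAGES.get(feature_name, 0)
    return user_bucket(feature_name, user_id) < percentage
```

Because the bucket depends only on the feature name and the user ID, a given user keeps seeing the same behavior between requests, and raising the percentage only adds users to the active group.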

Using feature flags in the real world

As mentioned in the intro, the concept behind feature flags is simple, but there are a lot of ways to implement it. Here we’ll discuss some of these implementations, their benefits, and the concerns related to them. The main challenges with feature flags are these three: deciding which storage method best fits your application, keeping the application consistent under any combination of flag states, and maintaining code quality and readability. We'll go through each of them in detail.

Storing the flags

The storage method adds or reduces complexity in your flag usage depending on your needs. We'll now talk about the four most relevant ways of storing flags.

Configuration files

This is only a suitable option if we're talking about an offline application. With this method, every time a flag needs to be changed, you have to change the application files, which usually means a software update or even a new deploy. The good part is that it’s simple to implement, since there are many tools to read configuration files in any language/framework you might be using.
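As a rough sketch of the configuration file approach, Python's standard `configparser` module can read boolean flags from an INI file. The `[features]` section name and the function names below are assumptions made for this example, not an established convention.

```python
import configparser


def parse_flags(text):
    """Parse boolean flags from the [features] section of INI-formatted text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section('features'):
        return {}
    # getboolean accepts values such as 1/0, yes/no, true/false and on/off
    return {
        name: parser.getboolean('features', name)
        for name in parser.options('features')
    }


def load_flags(path):
    """Read the flags from a configuration file on disk."""
    with open(path) as config_file:
        return parse_flags(config_file.read())


def is_feature_active(flags, feature_name):
    """Unknown flags default to inactive."""
    return flags.get(feature_name, False)
```

Loading the flags once at startup keeps the lookups cheap, at the cost described above: a changed file is only picked up after the application reloads it.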

Environment Variables

Earlier, we explained the simplest way to store flags, which is environment variables, but they are not always the best option.

Environment variables are great when restarting the app is not a concern, because every time you change the values you'll have to retrieve them again for the whole application, which typically means restarting the servers. For instance, in a web application that can’t afford to be restarted, it would be better to store these flags in the database, in an in-memory data structure store like Redis, or in a third-party service.

Another concern with environment variables is that you have to save your flags as strings. If you want to store more information than just a boolean flag, like the names of the user groups that have access to the feature, you'll have to handle serialization/deserialization.
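One possible way to handle that serialization is to store JSON in the environment variable. The sketch below assumes a hypothetical format with `active` and `groups` keys; both the format and the function names are illustrative choices, and plain values such as `1` keep working as simple booleans.

```python
import json
import os


def get_feature_config(feature_name):
    """Parse a JSON flag such as '{"active": true, "groups": ["beta"]}'."""
    raw_value = os.environ.get('TOGGLE_' + feature_name.upper())
    if raw_value is None:
        return {'active': False, 'groups': []}
    try:
        config = json.loads(raw_value)
    except ValueError:
        # Malformed values are treated as "feature off" instead of crashing
        return {'active': False, 'groups': []}
    if not isinstance(config, dict):
        # Plain values such as 1 or true behave like simple boolean flags
        return {'active': bool(config), 'groups': []}
    return {
        'active': bool(config.get('active', False)),
        'groups': list(config.get('groups', [])),
    }


def is_feature_active_for_group(feature_name, group_name):
    """The feature must be active and the group must be listed."""
    config = get_feature_config(feature_name)
    return config['active'] and group_name in config['groups']
```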

One last concern regarding environment variables is that they’re related to the environment your application is running on. If your application is distributed or has instances in more than one environment, you have to make sure the values are consistent between them.

Database

Storing flags in a remote database is a safer option for web and distributed applications because the flags' values stay consistent across instances. At the same time, restarts are not a concern: the values live in the DBMS (database management system), so the application can read updated values without being restarted.

One thing you have to be aware of is that, since your application doesn't restart when you toggle flags, some requests and async operations may be running at the exact moment the flags' values are updated. This may lead to inconsistent application states. So, every time you change values, make sure you're handling all possible states, and consider toggling big features only when the access rate is lower or when your async tasks/cron jobs aren't running. We're going to explore this problem in more detail in the Asynchronous tasks and pending requests section.

The choice of database may impact the time your application takes to retrieve the flags' values. For instance, if you use Redis, which stores values in memory, your application will probably retrieve flags' values faster than if you use a database platform like PostgreSQL. Moreover, Redis has a pub/sub system that can notify your application when a flag changes. This way, you wouldn't have to fetch the flags’ values all the time.
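To illustrate the trade-off between fetching flags on every check and caching them, here is a minimal sketch of a database-backed store with a short-lived in-process cache. It uses the standard `sqlite3` module only as a stand-in for a remote database; the `CachedFlagStore` class, the `feature_flags` table, and the five-second TTL are all assumptions made for the example.

```python
import sqlite3
import time


class CachedFlagStore:
    """Reads flags from a database table, caching each value for a few seconds."""

    def __init__(self, connection, ttl_seconds=5.0, clock=time.monotonic):
        self.connection = connection
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._cache = {}  # feature name -> (value, time it was fetched)
        connection.execute(
            'CREATE TABLE IF NOT EXISTS feature_flags '
            '(name TEXT PRIMARY KEY, active INTEGER NOT NULL)'
        )

    def set_flag(self, feature_name, active):
        """Create or update a flag in the database."""
        self.connection.execute(
            'INSERT OR REPLACE INTO feature_flags (name, active) VALUES (?, ?)',
            (feature_name, int(active)),
        )
        self.connection.commit()

    def is_active(self, feature_name):
        """Return the cached value while it is fresh, otherwise query the database."""
        now = self.clock()
        cached = self._cache.get(feature_name)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]
        row = self.connection.execute(
            'SELECT active FROM feature_flags WHERE name = ?', (feature_name,)
        ).fetchone()
        value = bool(row[0]) if row else False
        self._cache[feature_name] = (value, now)
        return value
```

With this design, an instance may keep using a stale value for up to `ttl_seconds` after a toggle, so the TTL is a choice between database load and how quickly a flag change takes effect.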

Third-party SDK

There are also services for managing feature flags, like LaunchDarkly. The features they provide vary from service to service, but one of the best things is having a dashboard for managing your flags, which makes activating and deactivating features more accessible, even for a non-technical person.

Maintaining application consistency in active and inactive states

When you have two possible states for each feature (on and off), you have to make sure both of them work as expected. In some cases, you even need to ensure that intermediary states also work (like requests that had a flag value when they started running and a different one at the end).

Data migrations and backward compatibility

When you're developing a new feature, it’s very common that you need to add columns to existing tables or even create new tables. If there are only new tables in the feature, the work is easier because active and inactive states don't share the same table. In this case, your only concern is to maintain data consistency between the new tables and the old ones for the case where you must turn the feature off. You also have to make sure you don't delete data/tables you may need in the inactive state of the feature.

If the feature’s active and inactive states share tables, you have to create workarounds to make sure the data is stored correctly for both states. Occasionally, this means creating redundancy by having two or more columns holding the same data in different formats.
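The redundant-column idea can be sketched as follows. The `users` table, its columns, and the function names are hypothetical, and `sqlite3` is used only to keep the example self-contained. The point is that writes always fill both formats, so toggling the flag never leaves rows that one of the states cannot read.

```python
import sqlite3


def create_schema(connection):
    # full_name is the old format; first_name/last_name belong to the new feature
    connection.execute(
        'CREATE TABLE users (id INTEGER PRIMARY KEY, '
        'full_name TEXT, first_name TEXT, last_name TEXT)'
    )


def save_user(connection, first_name, last_name):
    """Write both formats on every save, regardless of the flag state."""
    full_name = '{} {}'.format(first_name, last_name).strip()
    cursor = connection.execute(
        'INSERT INTO users (full_name, first_name, last_name) VALUES (?, ?, ?)',
        (full_name, first_name, last_name),
    )
    connection.commit()
    return cursor.lastrowid


def get_display_name(connection, user_id, split_names_active):
    """Read from whichever columns the current flag state expects."""
    row = connection.execute(
        'SELECT full_name, first_name, last_name FROM users WHERE id = ?',
        (user_id,),
    ).fetchone()
    if split_names_active:
        return '{}, {}'.format(row[2], row[1])
    return row[0]
```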

Asynchronous tasks and pending requests

When you toggle a feature on or off, many things may be happening in the background of your application. In some features, you can just ignore this and update flags without worrying about broken application/database states. But every so often, a simple feature flag toggle with the wrong timing may cause dangerous results for your application.

Imagine you are running a long background task that fetches data from an external API that only sends you the same data once. This is very common in notification systems: when you retrieve a notification, the message is destroyed from the original server queue/database. Let's suppose we've developed a new feature that saves every notification in the application database and emails the user for each one. This feature has a feature flag.

But suddenly, in the middle of the notifications retrieval, you notice your application is sending every notification to every user, not only to the ones that should receive them. This is a huge privacy problem, so you urgently need to turn the feature off to stop sending notification emails.

But then, with the feature off, you fetched some notifications and neither stored them in the database nor sent the emails. The notifications that were retrieved cannot be recovered anymore because they were deleted from the original source. This loss is irreversible.

In a case like this, you have to treat data carefully. Even if the store-and-email feature is disabled, you should still store the data you retrieve in the database, even if you don't show it anywhere. That way, when you fix the bug and reactivate the feature, no message will have been lost.
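In code, the advice above amounts to keeping the irreversible step (consuming the notification) paired with an unconditional write, and putting only the risky step behind the flag. The function below is a minimal sketch; its name and its callable parameters are hypothetical stand-ins for your own API client, persistence layer, and mailer.

```python
def process_notifications(fetch_notifications, store, send_email, feature_active):
    """Persist every fetched notification; only the email step is behind the flag."""
    processed = 0
    for notification in fetch_notifications():
        # Storing is unconditional because the source deletes whatever we fetch
        store(notification)
        # The flag is checked per notification, so turning it off takes effect
        # in the middle of a long-running retrieval
        if feature_active():
            send_email(notification)
        processed += 1
    return processed
```

If the flag is turned off halfway through, the remaining notifications are still saved, and the missing emails can be sent later from the stored data once the bug is fixed.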

The same case could happen if the notifications retrieval was made in an HTTP request, so you also have to be careful about which requests are being made when toggling features.

One way to handle this is to consider every edge case on every affected part of the feature and run fix scripts to adjust database/application state after you toggle a feature on or off.

Another, more generic way to handle this is to compare the time the flag was toggled with the time the request/async task started before considering the new value.
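The timestamp comparison can be sketched as a flag that remembers when it last changed and what its previous value was. The `TimestampedFlag` class is hypothetical and deliberately simplified: it tracks only the most recent change, so a flag toggled twice during a single long task would need a fuller history.

```python
import time


class TimestampedFlag:
    """A flag that remembers its previous value and when it last changed."""

    def __init__(self, active=False, clock=time.time):
        self.clock = clock
        self.active = active
        self.previous_active = active
        self.changed_at = clock()

    def set(self, active):
        """Record the old value and the change time only on real changes."""
        if active != self.active:
            self.previous_active = self.active
            self.active = active
            self.changed_at = self.clock()

    def value_for(self, started_at):
        """Return the value that was in effect when the task or request started."""
        if started_at < self.changed_at:
            return self.previous_active
        return self.active
```

A task would record its start time once and call `value_for(started_at)` on every check, so it finishes with the same flag value it started with.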

Automated tests

When you're developing or updating a feature, you can usually delete old tests that became outdated and no longer describe the application's behavior correctly. With feature flags, these tests still need to work, since the old behavior is still accessible when the feature is turned off. But how do you differentiate the feature’s active state from the inactive state?

If you're using feature flags with environment variables or config files, you can simulate the flags’ states by using mocks. Python has a built-in module for mocks (unittest.mock), and you can easily override some global configurations and test both states of each flag.
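For flags stored in environment variables, `unittest.mock.patch.dict` can override `os.environ` for the duration of a test and restore it afterwards. The sketch below reuses `is_feature_active` from the start of this article; the body of `my_function` and its return values are placeholders invented for the example.

```python
import os
from unittest import mock


def is_feature_active(feature_name):
    env_var_value = os.environ.get('TOGGLE_' + feature_name.upper())
    return env_var_value == '1'


def my_function():
    # Placeholder behavior so the two states are easy to tell apart
    if is_feature_active('feature_one'):
        return 'new behavior'
    return 'old behavior'


def test_my_function_with_feature_one_active():
    # patch.dict restores the original environment when the block exits
    with mock.patch.dict(os.environ, {'TOGGLE_FEATURE_ONE': '1'}):
        assert my_function() == 'new behavior'


def test_my_function_with_feature_one_inactive():
    with mock.patch.dict(os.environ, {'TOGGLE_FEATURE_ONE': '0'}):
        assert my_function() == 'old behavior'
```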

If you're using the database to store flags, you can change the value of each flag before running your usual tests, so the flag value matches the behavior you're testing.

Managing code quality and the size of the test suite

When you look at your code with only one feature flag, it's very readable. But imagine when you have dozens of features, each one with its flag, or even worse: nested features (consequently, nested flags). The code is going to look pretty bad.


if is_flag_active('my-first-feature'):
    do_something()
    if is_flag_active('my-second-feature'):
        do_second_something()
    else:
        do_second_something_else()
        if is_flag_active('my-third-feature'):
            do_third_something()
            if is_flag_active('my-fourth-feature'):
                do_fourth_something()
            else:
                do_fourth_something_else()
[Image: "Indentation Hadouken". Caption: Almost something like this]

To avoid this, there's one basic step to follow: remove old flags from the code as soon as the features are validated. Or even better, develop your new features knowing you will have to remove the flag later (leaving only the active state, or going back to the old code in case you want to drop the feature). Here we'll list some tips to make flags easier to read and to remove later.

1. Make your features modular

The goal here is to make your features depend on your current app as little as possible. By doing this, you have few and clear contact points where the old behavior and the new one meet. This way, it'll be easier to understand the logic, since you don't have to know the whole application to understand what each part of the code is for. It'll also make it easier to find which parts of the code you'll have to modify to remove the flag once the feature is validated (or not).

Another reason to write modular features is that you may avoid bugs caused by an accidental partial removal of the flag's code, since there'll be fewer and more isolated lines of code to remove.

A great way to start building a more modular application is by using proper design patterns. Strategy and Chain of Responsibility are good examples of design patterns that can help you keep your code more modular.
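As an illustration of the Strategy pattern in this context, the sketch below keeps the old and new behaviors in separate classes and checks the flag in a single factory function. The checkout example, the class names, and the 10% discount are all invented for the sake of the example.

```python
class LegacyCheckout:
    """Old behavior: the total is the plain sum of the prices (in cents)."""

    def total(self, prices_in_cents):
        return sum(prices_in_cents)


class DiscountedCheckout:
    """New behavior, kept in its own class so it is easy to keep or delete."""

    def total(self, prices_in_cents):
        subtotal = sum(prices_in_cents)
        # Illustrative 10% discount, computed with integers to avoid rounding
        return subtotal - subtotal // 10


def get_checkout_strategy(is_feature_active):
    """The only place where the flag is checked."""
    if is_feature_active('discounted_checkout'):
        return DiscountedCheckout()
    return LegacyCheckout()
```

Removing the flag later means deleting one class and collapsing the factory, without touching the code that calls `total`.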

2. Avoid long logical branches in your code

Use helper functions or classes when you have to branch your code according to the active and inactive state of a feature. If you write both states' logic in the same place, your code may become denser and less readable, so the best option here is not having logic directly in the if/else statements' scope (the ones that check the flag state).


# bad
if is_feature_active('my-feature'):
    do_something()
    do_something_else()
    do_something_more()
    do_log_something()
else:
    do_some_other_thing()
# …

# good
def do_feature_one_stuff():
    do_something()
    do_something_else()
    do_something_more()
    do_log_something()
    # ...

def do_feature_one_inactive_stuff():
    do_some_other_thing()
    # ...

if is_feature_active('my-feature'):
    do_feature_one_stuff()
else:
    do_feature_one_inactive_stuff()


3. Write different unit tests for the active and inactive state of features

If you don't do this, it'll be harder to tell which tests belong to which state when you're removing the flag later. Removing a flag must be as easy as possible, since it's a very uninteresting task that doesn't add any value to the product (it only pays off technical debt).


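from unittest import mock

# Assumes my_function and its helpers live in a hypothetical module named `app`,
# and that set_flag_active/set_flag_inactive come from your flag backend.
@mock.patch('app.do_something')
def test_my_function_do_something_with_flag_one_active(do_something_mock):
    set_flag_active('flag_one')
    my_function()
    do_something_mock.assert_called_once_with()

@mock.patch('app.do_something_else')
def test_my_function_do_something_else_with_flag_one_inactive(do_something_else_mock):
    set_flag_inactive('flag_one')
    my_function()
    do_something_else_mock.assert_called_once_with()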


4. Schedule the flag's removal (and stick to the schedule)

It's important to keep in mind when each feature flag will be ready to be removed. A great way to find out whether a feature is validated without much work is to give it a fixed time frame for validation. This way you can create reminders that tell you when to remove the flags and plan the removals in your sprints.

Remember: removing old flags is really important. Otherwise, your code's readability will quickly degrade, and you’ll have a lot of accumulated work to improve it.
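One lightweight way to keep such a schedule visible is to record a removal date next to each flag and check it automatically. The registry and dates below are hypothetical examples, not a feature of any library mentioned in this article.

```python
import datetime

# Hypothetical registry: flag name -> date by which the flag should be removed
FLAG_REMOVAL_DATES = {
    'feature_one': datetime.date(2018, 7, 1),
    'feature_two': datetime.date(2018, 9, 15),
}


def overdue_flags(today=None):
    """Return the sorted names of flags whose removal date has already passed."""
    today = today or datetime.date.today()
    return sorted(
        name
        for name, remove_by in FLAG_REMOVAL_DATES.items()
        if remove_by < today
    )
```

A scheduled job or a test in your suite could call `overdue_flags()` and warn the team (or fail the build, if you want to be strict) whenever the list is not empty.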

5. Measure your feature’s success

There are a lot of tools and services that store and display analytics of how your application is being used, in more or less detail, like Google Analytics, Facebook Pixel, Mixpanel and Hotjar. Measuring usage is critical when you're using feature flags, so you can properly validate whether your new features are really improving user experience and metrics, and even find bugs and unhandled edge cases.

Don't reinvent the wheel, use the right tools

There are a lot of Python packages that can help you implement feature flags. Each one has its specific implementation strategies, so you’ll have to choose which one is the best fit for your project.

Gutter (formerly Gargoyle)

This is a library maintained by Disqus. It's great for creating really complex context checks to decide whether a flag is active or not.

It is really flexible, but it's also a little bit challenging to use (there are a lot of concepts to understand). If your goal is to have the maximum flexibility, this is a good pick. If you're prioritizing simplicity, this may be too much.

Feature Ramp

This is a simple Python API for creating and fetching flags using a Redis backend. It doesn't support context checking, only a boolean value for each feature.

Framework-specific tools

There are many tools for specific frameworks. The ones we’ve tested here were for Django and Flask, and these are our recommendations.

Django Waffle

Django Waffle is a very flexible, database-backed feature flags API. It has a lot of helper functions to abstract its features, as well as some decorators and mixins to help you build your views (depending on whether you're using function views or class-based views).

Flask Feature Flags

This package has a basic but customizable API. By default, it stores the flags in a config file, but it gives you the option of using custom backends and storing the flags wherever you want.

Conclusion

We've had a great experience with feature flags here at Vinta so far. We took advantage of all the benefits listed in this article, but the path has not always been easy.

We’ve learned and continue to learn a lot about how to have a healthy development flow with feature flags, preserving code quality and safety for the developers.

These were some of the tips we found to be useful in our experience. If you have something else worth mentioning, please leave a comment! We'd love to hear your input!

Special thanks to the people who helped me to write this post: @Luca Bezerra, @Lais Varejão, @Igor Calabria, @Pedro Torres and @Thiago Diniz.