Testing the diff

Good unit tests laser-focus on the smallest possible scope and are crafted to isolate the functionality under test from as much external interference as possible. The way to do this is to write tests that ensure, by construction, that they are testing what we expect them to test.

Let's see this in practice. Consider the following test for an endpoint that filters stores:

def test_filter_by_country():
    client = TestClient()

    Store.create(country="AUS")
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 1

At first sight, the test above seems to correctly check that our endpoint filters by country. But as we will see, it's so fragile that it's almost useless. Here is the code for the store listing endpoint we are testing:

def list_stores(request):
    stores = Store.get_all()

    country_filter = request.querystring.get("country", None)
    if country_filter:
        stores = stores.filter(country=country_filter)

    return stores.to_json()

Notice how the test we wrote will still pass even if we completely delete the filtering part of the endpoint, like this:

def list_stores(request):
    stores = Store.get_all()
    return stores.to_json()
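
To see this concretely, here is a minimal, self-contained sketch of the same situation in plain Python. `STORES`, `create_store`, and the keyword-argument style of passing the query string are all stand-ins for the framework pieces in the pseudocode above, not real APIs:

```python
# In-memory stand-in for the Store model.
STORES = []

def create_store(country):
    STORES.append({"country": country})

def list_stores(country=None):
    # Endpoint WITH the filtering branch.
    stores = list(STORES)
    if country:
        stores = [s for s in stores if s["country"] == country]
    return stores

def list_stores_no_filter(country=None):
    # Same endpoint with the filtering branch deleted.
    return list(STORES)

def fragile_test(endpoint):
    # Mirrors test_filter_by_country: one AUS store, one assertion.
    STORES.clear()
    create_store("AUS")
    return len(endpoint(country="AUS")) == 1

# The fragile test passes against BOTH versions of the endpoint:
print(fragile_test(list_stores))            # True
print(fragile_test(list_stores_no_filter))  # True
```

Both calls print `True`: with only AUS stores in the fixtures, the test cannot tell a working filter from a deleted one.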

This might look like an unreasonable thing to happen, but it is precisely the kind of thing we developers are constantly doing due to changes in business rules, and it can even happen by accident. It's tough to track down which tests need updating when we are making changes (especially in big codebases), and it's easy to remove something we assume is not being used (especially in legacy codebases).

Alright, how do we improve on this then? One easy way would be to add more fixtures to the test:

def test_filter_by_country():
    client = TestClient()

    Store.create(country="AUS")
    Store.create(country="BR")
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 1

Although that seems to solve the issue, we still cannot be sure whether what is causing the BR store not to show up in the results is the action of the country=AUS filter or some other business logic we are not aware of. It could still be the case that we are not capturing whether the filter is actually filtering! The following code would still make the test pass:

def list_stores(request):
    stores = Store.get_all().remove_LATAM()
    return stores.to_json()

Ugh! Again, this looks a bit too specific, but we are constantly making decisions in our day-to-day work that cause fragile tests to break in similar ways.

So what can we do to make tests more reliable? Make them SPECIFIC and ISOLATE the functionality they are covering!

The best technique I've encountered to help with this is to test the diff.

Testing the diff means that our tests will pass or fail precisely because of an intended and explicit change. Here is how this would work in our example:

def test_filter_by_country():
    client = TestClient()

    store = Store.create(country="BR")
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 0

    store.update(country="AUS") 
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 1
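
Using the same plain-Python stand-ins as before (`STORES`, `create_store`, and keyword arguments in place of the query string are illustrative, not real framework APIs), we can check that a diff-style test only passes when the filter actually does the filtering:

```python
STORES = []

def create_store(country):
    store = {"country": country}
    STORES.append(store)
    return store

def list_stores(country=None):
    stores = list(STORES)
    if country:
        stores = [s for s in stores if s["country"] == country]
    return stores

def list_stores_broken(country=None):
    return list(STORES)  # filtering branch deleted

def diff_test(endpoint):
    STORES.clear()
    store = create_store("BR")
    if len(endpoint(country="AUS")) != 0:   # must be excluded before the change
        return False
    store["country"] = "AUS"                # the intended, explicit diff
    return len(endpoint(country="AUS")) == 1  # must be included after it

print(diff_test(list_stores))         # True
print(diff_test(list_stores_broken))  # False: the deleted filter is caught
```

The broken endpoint now fails the very first check, because the BR store leaks into the unfiltered results.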

The new setup gives us confidence in two very important properties of the functionality we are writing:

  • The filter includes the AUS items;
  • The only thing causing the filter to return the item is the fact that the store's country is AUS.

Notice we are still not ensuring that the filter is what is causing the BR store to be left out of the results. So we can improve this even further:

def test_filter_by_country():
    client = TestClient()

    Store.create(country="AUS")
    Store.create(country="BR")
    
    response = client.get("/stores/") # no filter
    assert len(response.data) == 2

    response = client.get("/stores/?country=AUS") # country filter
    assert len(response.data) == 1
    assert response.data[0]["country"] == "AUS"
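
In the same stand-in sketch (again, `STORES` and `create_store` are illustrative substitutes for the real model and test client), this final version asserts the unfiltered baseline first, so a deleted filter fails the filtered assertions while unrelated fixture changes fail the baseline:

```python
STORES = []

def create_store(country):
    STORES.append({"country": country})

def list_stores(country=None):
    stores = list(STORES)
    if country:
        stores = [s for s in stores if s["country"] == country]
    return stores

def test_filter_by_country():
    STORES.clear()
    create_store("AUS")
    create_store("BR")

    assert len(list_stores()) == 2          # baseline: no filter
    filtered = list_stores(country="AUS")   # country filter
    assert len(filtered) == 1
    assert filtered[0]["country"] == "AUS"

test_filter_by_country()
print("ok")
```

If an unrelated rule starts hiding BR stores, the baseline assertion (`== 2`) breaks first and points us straight at the fixtures rather than at the filter.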

We are now sure our tests are testing precisely the filter functionality. Changes in unrelated features can still break this test, but they will break the first assertion, and we will probably be able to fix it by updating the Store object creation lines (which brings us to the importance of having good fixtures, but that's a subject for another post 😉)

So, when can we use the "testing the diff" technique? There are many situations where testing the diff can give us more reliable tests, but to leave you with some food for thought, try to think about how you would write the tests, and what the benefits of testing the diff would be, when testing permissions and when doing performance tests.
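
As one hypothetical take on the permissions case (every name here — `PERMS`, `can_view`, the user and document labels — is invented for illustration, not from the article), the same pattern applies: assert denial first, then make the grant the only diff that flips the outcome:

```python
# Hypothetical in-memory permission table: (user, document) pairs.
PERMS = set()

def can_view(user, doc):
    return (user, doc) in PERMS

def test_view_permission_diff():
    PERMS.clear()
    # Without the grant, access must be denied...
    assert not can_view("alice", "report")
    # ...and the grant is the explicit diff that allows it.
    PERMS.add(("alice", "report"))
    assert can_view("alice", "report")

test_view_permission_diff()
print("ok")
```

The denial check matters: it proves the grant, and not some permissive default, is what produces the passing assertion.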

Filipe Ximenes

Founder & Chief Technology Officer