Testing the diff

Good unit tests laser-focus on the smallest possible scope and are crafted to isolate the functionality under test from as much external interference as possible. The way to do this is to write tests that ensure, by construction, that they are testing what we expect them to test.

Let's see this in practice. Consider the following test for an endpoint that filters stores:

def test_filter_by_country():
    client = TestClient()

    Store.create(country="AUS")
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 1

At first sight, the test above seems to correctly check that our endpoint filters by country. But as we will see, it's so fragile that it's almost useless. Here is the code for the store listing endpoint we are testing:

def list_stores(request):
    stores = Store.get_all()

    country_filter = request.querystring.get("country", None)
    if country_filter:
        stores = stores.filter(country=country_filter)

    return stores.to_json()

Notice how the test we wrote will still pass even if we completely delete the filtering part of the endpoint, like this:

def list_stores(request):
    stores = Store.get_all()
    return stores.to_json()
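
To see this concretely, here is a minimal, self-contained sketch of the same situation in plain Python. `STORES`, `create_store`, and the keyword-argument style of passing the query string are all stand-ins for the framework pieces in the pseudocode above, not real APIs:

```python
# In-memory stand-in for the Store model.
STORES = []

def create_store(country):
    STORES.append({"country": country})

def list_stores(country=None):
    # Endpoint WITH the filtering branch.
    stores = list(STORES)
    if country:
        stores = [s for s in stores if s["country"] == country]
    return stores

def list_stores_no_filter(country=None):
    # Same endpoint with the filtering branch deleted.
    return list(STORES)

def fragile_test(endpoint):
    # Mirrors test_filter_by_country: one AUS store, one assertion.
    STORES.clear()
    create_store("AUS")
    return len(endpoint(country="AUS")) == 1

# The fragile test passes against BOTH versions of the endpoint:
print(fragile_test(list_stores))            # True
print(fragile_test(list_stores_no_filter))  # True
```

Both calls print `True`: with only AUS stores in the fixtures, the test cannot tell a working filter from a deleted one.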

This might look like an unreasonable thing to happen, but it is precisely the kind of thing we developers are constantly doing due to changes in business rules, and it can even happen by accident. It's tough to track down which tests need updating when we are making changes (especially in big codebases), and it's easy to remove something we assume is not being used (especially in legacy codebases).

Alright, how do we improve on this then? One easy way would be to add more fixtures to the test:

def test_filter_by_country():
    client = TestClient()

    Store.create(country="AUS")
    Store.create(country="BR")
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 1

Although that seems to solve the issue, we still cannot be sure whether what is causing the BR store not to show up in the results is the action of the country=AUS filter or some other business logic we are not aware of. It could still be the case that we are not capturing whether the filter is actually filtering! The following code would still make the test pass:

def list_stores(request):
    stores = Store.get_all().remove_LATAM()
    return stores.to_json()

Ugh! Again, this looks a bit too specific, but we are constantly making decisions in our day-to-day work that cause fragile tests to break in similar ways.

So what can we do to make tests more reliable? Make them SPECIFIC and ISOLATE the functionality they are covering!

The best technique I've encountered to help with this is to test the diff.

Testing the diff means that our tests will pass or fail precisely because of an intended and explicit change. Here is how this would work in our example:

def test_filter_by_country():
    client = TestClient()

    store = Store.create(country="BR")
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 0

    store.update(country="AUS") 
    
    response = client.get("/stores/?country=AUS")
    assert len(response.data) == 1
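
Using the same plain-Python stand-ins as before (`STORES`, `create_store`, and keyword arguments in place of the query string are illustrative, not real framework APIs), we can check that a diff-style test only passes when the filter actually does the filtering:

```python
STORES = []

def create_store(country):
    store = {"country": country}
    STORES.append(store)
    return store

def list_stores(country=None):
    stores = list(STORES)
    if country:
        stores = [s for s in stores if s["country"] == country]
    return stores

def list_stores_broken(country=None):
    return list(STORES)  # filtering branch deleted

def diff_test(endpoint):
    STORES.clear()
    store = create_store("BR")
    if len(endpoint(country="AUS")) != 0:   # must be excluded before the change
        return False
    store["country"] = "AUS"                # the intended, explicit diff
    return len(endpoint(country="AUS")) == 1  # must be included after it

print(diff_test(list_stores))         # True
print(diff_test(list_stores_broken))  # False: the deleted filter is caught
```

The broken endpoint now fails the very first check, because the BR store leaks into the unfiltered results.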

The new setup gives us confidence in two very important properties of the functionality we are writing:

  • The filter includes the AUS items;
  • The only thing causing the filter to return the item is the fact that the store's country is AUS.

Notice we are still not ensuring that the filter is what is causing the BR store to be left out of the results. So we can improve this even further:

def test_filter_by_country():
    client = TestClient()

    Store.create(country="AUS")
    Store.create(country="BR")
    
    response = client.get("/stores/") # no filter
    assert len(response.data) == 2

    response = client.get("/stores/?country=AUS") # country filter
    assert len(response.data) == 1
    assert response.data[0]["country"] == "AUS"
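
In the same stand-in sketch (again, `STORES` and `create_store` are illustrative substitutes for the real model and test client), this final version asserts the unfiltered baseline first, so a deleted filter fails the filtered assertions while unrelated fixture changes fail the baseline:

```python
STORES = []

def create_store(country):
    STORES.append({"country": country})

def list_stores(country=None):
    stores = list(STORES)
    if country:
        stores = [s for s in stores if s["country"] == country]
    return stores

def test_filter_by_country():
    STORES.clear()
    create_store("AUS")
    create_store("BR")

    assert len(list_stores()) == 2          # baseline: no filter
    filtered = list_stores(country="AUS")   # country filter
    assert len(filtered) == 1
    assert filtered[0]["country"] == "AUS"

test_filter_by_country()
print("ok")
```

If an unrelated rule starts hiding BR stores, the baseline assertion (`== 2`) breaks first and points us straight at the fixtures rather than at the filter.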

We are now sure our tests are testing precisely the filter functionality. Changes in unrelated features can still break this test, but they will break the first assertion, and we will probably be able to fix it by updating the Store object creation lines (which brings us to the importance of having good fixtures, but that's a subject for another post 😉)

So, when can we use the "testing the diff" technique? There are many situations where testing the diff can give us more reliable tests, but to leave you with some food for thought, try to think about how you would write the tests, and what the benefits of testing the diff would be, when testing permissions and when doing performance tests.
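
As one hypothetical take on the permissions case (every name here — `PERMS`, `can_view`, the user and document labels — is invented for illustration, not from the article), the same pattern applies: assert denial first, then make the grant the only diff that flips the outcome:

```python
# Hypothetical in-memory permission table: (user, document) pairs.
PERMS = set()

def can_view(user, doc):
    return (user, doc) in PERMS

def test_view_permission_diff():
    PERMS.clear()
    # Without the grant, access must be denied...
    assert not can_view("alice", "report")
    # ...and the grant is the explicit diff that allows it.
    PERMS.add(("alice", "report"))
    assert can_view("alice", "report")

test_view_permission_diff()
print("ok")
```

The denial check matters: it proves the grant, and not some permissive default, is what produces the passing assertion.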

Filipe Ximenes

Founder & Chief Technology Officer