Tautological Tests

Original author: Roy Williams
  • Translation


Hello! My name is Artyom, and I spend most of my working time writing complex automated tests with Selenium and Cucumber/Calabash. Quite often I face a difficult choice: write a test that checks a specific implementation of a feature (because it is easier), or a test that checks the feature itself (because it is more correct, but much harder)? Recently I came across a good article arguing that implementation tests are “tautological” tests. After reading it, I have spent almost a week rewriting some of my tests in a different vein. I hope it encourages you to think too.


Everyone knows that tests are crucial for building quality software quickly. But, like everything else in life, they can do more harm than good when used improperly. Consider the following simple function and its test. The author wants to isolate the test from external dependencies, so a mock is used.


import hashlib
from typing import List
from unittest.mock import patch

def get_key(key: str, values: List[str]) -> str:
    md5_hash = hashlib.md5(key)
    for value in values:
        md5_hash.update(value)
    return f'{key}:{md5_hash.hexdigest()}'


@patch('hashlib.md5')
def test_hash_values(mock_md5):
    mock_md5.return_value.hexdigest.return_value = 'world'
    assert get_key('hello', ['world']) == 'hello:world'
    mock_md5.assert_called_once_with('hello')
    mock_md5.return_value.update.assert_called_once_with('world')
    mock_md5.return_value.hexdigest.assert_called()

It looks great! Four statements are fully tested to make sure the code is working as expected. Tests even pass!


$ python3.6 -m pytest test_simple.py
========= test session starts =========
collected 1 item

test_simple.py .
======= 1 passed in 0.03 seconds ======

Of course, the problem is that the code is wrong: md5 accepts only bytes, not str (this post explains how bytes and str changed in Python 3). The test proves very little. Because md5 is mocked out, only the string formatting was actually exercised, which gives us a false sense of security: the code seems correct, and we have even "proved" it with a test!
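The failure is easy to reproduce once the mock is out of the way. The following minimal sketch uses only the standard library and calls the real hashlib.md5 with a str, the way the unpatched get_key would:

```python
import hashlib

# Call the real md5 with a str, as the unpatched get_key would.
try:
    hashlib.md5('hello')
    error = None
except TypeError as exc:
    # Python 3 refuses to hash text that has not been encoded to bytes.
    error = exc

print(type(error).__name__)  # TypeError
```

A test that ran the real function even once would have hit this exception immediately.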


Fortunately, mypy catches these problems:


$ mypy test_simple.py
test_simple.py:6: error: Argument 1 to "md5" has incompatible type "str"; expected "Union[bytes, bytearray, memoryview]"
test_simple.py:8: error: Argument 1 to "update" of "_Hash" has incompatible type "str"; expected "Union[bytes, bytearray, memoryview]"

Great, we fixed our code to first encode strings to bytes:


def get_key(key: str, values: List[str]) -> str:
    md5_hash = hashlib.md5(key.encode())
    for value in values:
        md5_hash.update(value.encode())
    return f'{key}:{md5_hash.hexdigest()}'

Now the code works, but the problems remain. Let's say someone went through our code and simplified it to just a few lines:


def get_key(key: str, values: List[str]) -> str:
    hash_value = hashlib.md5(f"{key}{''.join(values)}".encode()).hexdigest()
    return f'{key}:{hash_value}'

The result is functionally identical to the original code: for the same input it always returns the same output. Yet the test now fails:


E AssertionError: Expected call: md5(b'hello')
E Actual call: md5(b'helloworld')
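That the two versions really are equivalent can be checked directly. The sketch below places both implementations side by side under the hypothetical names get_key_incremental and get_key_joined (the article calls both get_key) and compares their output on a few sample inputs invented for illustration:

```python
import hashlib
from typing import List


def get_key_incremental(key: str, values: List[str]) -> str:
    # The fixed version: feed the key and each value to md5 one at a time.
    md5_hash = hashlib.md5(key.encode())
    for value in values:
        md5_hash.update(value.encode())
    return f'{key}:{md5_hash.hexdigest()}'


def get_key_joined(key: str, values: List[str]) -> str:
    # The simplified version: hash the concatenated string in one call.
    hash_value = hashlib.md5(f"{key}{''.join(values)}".encode()).hexdigest()
    return f'{key}:{hash_value}'


# Hypothetical sample inputs, chosen only for illustration.
samples = [('hello', ['world']), ('a', []), ('key', ['x', 'y', 'z'])]
results_match = all(
    get_key_incremental(k, v) == get_key_joined(k, v) for k, v in samples
)
print(results_match)  # True
```

Feeding data to a hash object in pieces is equivalent to hashing the concatenation, so the refactoring is safe; only the mock-based test disagrees.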

Clearly something is wrong with this simple test. It exhibits both a Type I error (the test fails even though the code is correct) and a Type II error (the test passes even though the code is incorrect). In an ideal world, tests fail if, and only if, the code contains a bug. In an even more ideal world, passing tests would let you be completely sure the code is correct. Both ideals are unattainable, but they are worth striving for.


I call tests like the one above "tautological." They confirm that the code is correct by checking that it runs the way it is written, which of course assumes that it was written correctly.



I believe that tautological tests are a net negative for your code, for several reasons:


  1. Tautological tests give engineers a false sense that their code is correct. They can look at high code coverage and feel good about their project. Other people working in the same code base will confidently push changes as long as the tests pass, even though those tests do not actually verify anything.
  2. Tautological tests effectively “freeze” the implementation instead of verifying that the code behaves as intended. Tests have to be updated whenever any aspect of the implementation changes, rather than only when the expected output changes. This trains engineers to adjust tests when they fail instead of finding out why they fail. Once that happens, the tests become a burden and lose their original purpose as a tool for keeping bugs out of production.
  3. Static analysis tools can catch the egregious errors, such as typos, that tautological tests would otherwise have caught. These tools have improved significantly over the past five years, especially for dynamic languages: for example, mypy for Python, Hack for PHP, and TypeScript for JavaScript. They are often better suited to catching typos, and they are more valuable to engineers because they make the code easier to understand and navigate.

In other words, tautological tests often miss real problems and encourage the bad habit of blindly fixing tests, and their benefits do not repay the effort of maintaining them.


Let's rewrite the test to check the output:


def test_hash_values():
    expected_value = 'hello:fc5e038d38a57032085441e7fe7010b0'
    assert get_key('hello', ['world']) == expected_value

Now the test does not care about the internals of get_key; it fails only if get_key returns the wrong value. I am free to change the internals of get_key without updating the test, as long as the public behavior stays the same. The test is also short and easy to understand.


Although this is a contrived example, it is easy to find places in real code where, for the sake of higher coverage, a test simply assumes that the output of an external service matches what the implementation expects.


How to identify tautological tests


  1. Tests fail far more often because the code was changed than because the code was broken. Every test has a maintenance cost that we pay for code coverage. If that cost exceeds the benefit the tests provide, the tests are probably coupled too tightly to the implementation. A related symptom: a small change in the code under test requires updating a disproportionate number of tests.
  2. The test cannot be edited without consulting the implementation. If that is the case, there is a good chance you have a tautological test. "Testing on the Toilet: Don't Overuse Mocks" gives a very familiar example, in which the implementation can be reconstructed from the test alone:


    public void testCreditCardIsCharged() {
        paymentProcessor = new PaymentProcessor(mockCreditCardServer);
        when(mockCreditCardServer.isServerAvailable()).thenReturn(true);
        when(mockCreditCardServer.beginTransaction()).thenReturn(mockTransactionManager);
        when(mockTransactionManager.getTransaction()).thenReturn(transaction);
        when(mockCreditCardServer.pay(transaction, creditCard, 500)).thenReturn(mockPayment);
        when(mockPayment.isOverMaxBalance()).thenReturn(false);
        paymentProcessor.processPayment(creditCard, Money.dollars(500));
        verify(mockCreditCardServer).pay(transaction, creditCard, 500);
    }
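For contrast, here is a minimal Python sketch of a similar scenario written against an in-memory fake instead of a chain of mocks. Every name in it (FakeCreditCardServer, PaymentProcessor, process_payment, the balance rule) is hypothetical and only loosely modeled on the Java example above; it is not the API from the cited article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FakeCreditCardServer:
    """Hypothetical in-memory stand-in for a real credit card service."""
    balances: Dict[str, int]
    charges: List[Tuple[str, int]] = field(default_factory=list)

    def pay(self, card: str, amount: int) -> bool:
        # Reject the charge if the card is unknown or lacks funds.
        if self.balances.get(card, 0) < amount:
            return False
        self.balances[card] -= amount
        self.charges.append((card, amount))
        return True


class PaymentProcessor:
    """Hypothetical code under test; it only uses the server's pay() method."""

    def __init__(self, server: FakeCreditCardServer) -> None:
        self.server = server

    def process_payment(self, card: str, amount: int) -> bool:
        if amount <= 0:
            return False
        return self.server.pay(card, amount)


def test_credit_card_is_charged() -> None:
    server = FakeCreditCardServer(balances={'card-1': 800})
    processor = PaymentProcessor(server)
    # Assert on the observable outcome, not on which methods were called.
    assert processor.process_payment('card-1', 500)
    assert server.balances['card-1'] == 300
    assert server.charges == [('card-1', 500)]


def test_payment_over_balance_is_rejected() -> None:
    server = FakeCreditCardServer(balances={'card-1': 100})
    processor = PaymentProcessor(server)
    assert not processor.process_payment('card-1', 500)
    assert server.balances['card-1'] == 100


test_credit_card_is_charged()
test_payment_over_balance_is_rejected()
```

Because the assertions look only at the resulting state, the internals of process_payment can be reorganized without touching these tests.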


How to fix tautological tests


  1. Separate I/O from logic. I/O is the most common reason engineers reach for mocks. I/O is of course essential; without it we could only burn CPU cycles and heat the air. But it is better to push I/O to the edges of your code than to interleave it with logic. The Python community's Sans-I/O working group has produced excellent documentation on the subject, and Cory Benfield covered it well in his talk "Building Protocol Libraries The Right Way" at PyCon 2016.
  2. Avoid mocking in-memory objects. You need a very good reason to mock a dependency that lives entirely in memory, for instance because the underlying function is non-deterministic or takes too long to run. Using real objects makes tests more valuable because more interactions are exercised. Even when a mock is justified, there should still be tests that make sure the code uses the dependency correctly (such as a test that checks that the output is in the expected range). In the example below, we check that our code works when randint returns a particular value, and also that we call randint correctly.


    import random
    from unittest.mock import patch


    def get_thing():
        return random.randint(0, 10)


    @patch('random.randint')
    def test_random_mock(mock_randint):
        # With randint pinned, the result is deterministic.
        mock_randint.return_value = 3
        assert get_thing() == 3


    def test_random_real():
        # randint(0, 10) includes both endpoints, so 10 is a valid result.
        assert 0 <= get_thing() <= 10

  3. Use fakes. If the mocked dependency is an external service, create a set of fake data or run a fake server that serves it. Centralizing the fake implementation lets you emulate the behavior of the real one carefully and minimizes how many tests must change when the implementation changes.
  4. Do not be afraid to leave some code uncovered! When the choice is between a good test and no test, the answer is obvious: write the good test. When the choice is between a tautological test and no test, it is less obvious. I hope I have convinced you that tautological tests are evil. Uncovered code is an honest signal of the current state of affairs to other developers, who can then take extra care when modifying it or, preferably, use the techniques above to write proper tests.
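As an illustration of the first point in the list, separating I/O from logic, the sketch below keeps the parsing and arithmetic in a pure function and pushes file reading into a thin wrapper. The names parse_total and read_total and the "name,amount" line format are hypothetical, invented for this example.

```python
import os
import tempfile
from typing import Iterable


def parse_total(lines: Iterable[str]) -> int:
    """Pure logic: sum the amounts in 'name,amount' lines, skipping blanks."""
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _name, amount = line.split(',')
        total += int(amount)
    return total


def read_total(path: str) -> int:
    """Thin I/O shell: open the file and hand its lines to the pure logic."""
    with open(path, encoding='utf-8') as handle:
        return parse_total(handle)


def test_parse_total() -> None:
    # No mocks and no files: the logic is tested with plain values.
    assert parse_total(['apples,3', '', 'pears,4\n']) == 7
    assert parse_total([]) == 0


def test_read_total() -> None:
    # One small test exercises the real I/O path with a temporary file.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'items.csv')
        with open(path, 'w', encoding='utf-8') as handle:
            handle.write('apples,3\npears,4\n')
        assert read_total(path) == 7


test_parse_total()
test_read_total()
```

Most of the behavior lives in parse_total, which needs no mocks at all, while the I/O wrapper is small enough that a single test against a real temporary file covers it.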


It is better to leave the line of code uncovered than to create the illusion that it is well tested.


Also watch for tautological tests when reviewing other people's code. Ask yourself what the test actually verifies, not just whether it covers some lines of code.


Remember, tautological tests are bad because they are not good.

