TDD in GameDev, or Rabbit Hell
- Translation
TDD is rarely used in game development. It is usually easier to hire a tester than to set aside a developer to write tests, which saves both resources and time. That makes each successful use of TDD all the more interesting. Below is a translation of an article in which this technique was used to build creature movement in the game ElemenTerra.
Test-driven development (TDD) is a software development technique in which the whole process is divided into many small cycles: unit tests are written first, then code is written to pass those tests, and then the code is refactored. After that, the cycle repeats.
TDD Basics
Suppose we are writing a function that adds two numbers. In a normal workflow, we would just write it. With TDD, we instead start by creating a placeholder function and its unit tests:
#include <stdexcept>

// Placeholder function that returns wrong results:
int add(int a, int b) {
    return -1;
}

// Unit tests that throw an error if add does not return the correct results:
void runTests() {
    if (add(1, 1) != 2)
        throw std::runtime_error("add(1, 1) should equal 2");
    if (add(2, 2) != 4)
        throw std::runtime_error("add(2, 2) should equal 4");
}
At first, our unit tests will fail, because the placeholder function returns -1 for every input. Now we can implement add correctly so that it returns a + b, and the tests will pass. This may seem like a roundabout way of working, but it has several advantages:
- If we mistakenly implement add as a - b, our tests will fail and we will know immediately that the function needs fixing. Without tests, we might not catch the error, and it would surface later as strange behavior that takes time to debug.
- We can keep the tests and run them at any time while writing code. If another programmer accidentally changes add, they will find out about the error right away, because the tests will fail again.
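To make the regression case concrete, here is a minimal sketch of the same example once the real implementation is in place. The names addBroken and countFailures are illustrative assumptions, not part of the original article; they only exist so the same two checks can be run against both the correct add and the mistaken a - b version.

```cpp
#include <cassert>

// Correct implementation, written after the tests already existed.
int add(int a, int b) {
    return a + b;
}

// Hypothetical buggy variant, as if someone typed '-' instead of '+'.
int addBroken(int a, int b) {
    return a - b;
}

// Illustrative helper: runs the article's two checks against any
// candidate implementation and returns how many of them failed.
int countFailures(int (*candidate)(int, int)) {
    int failures = 0;
    if (candidate(1, 1) != 2) ++failures;
    if (candidate(2, 2) != 4) ++failures;
    return failures;
}

// Same idea as the article's runTests: abort if the real add regresses.
void runTests() {
    assert(countFailures(add) == 0);
}
```

Passing addBroken to countFailures reports both checks as failed, which is exactly the early warning described above.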
TDD in game dev
There are two problems with TDD in game dev. First, many game features have subjective goals that cannot be measured. Second, it is hard to write tests that cover the whole space of possibilities in worlds full of complex interacting objects. Developers who want their characters' movement to “look good” or their physics simulations “not to look jerky” will find it difficult to express these goals as deterministic pass/fail conditions.
Nevertheless, TDD can be applied to complex and subjective features such as character movement, and in ElemenTerra we did exactly that.
Unit tests vs. debug levels
Before getting practical, I want to distinguish between an automated unit test and a traditional “debug level”. Creating hidden locations with artificial conditions is common practice in game dev. It lets programmers and QA observe individual events in isolation.
Secret debugging level in The Legend of Zelda: The Wind Waker
ElemenTerra has many such levels: a level full of problematic geometry for the player character, levels with special user interfaces that trigger particular game states, and others.
Like unit tests, these debug levels can be used to reproduce and diagnose bugs. But they differ in a few ways:
- Unit tests split systems into parts and evaluate each one individually, while debug levels test in a more holistic way. After spotting a bug in a debug level, developers may still need to track down its source manually.
- Unit tests are automated and must give deterministic results every time, while many debug levels are “driven” by a player, so their results vary from session to session.
This does not mean that unit tests are better than debug levels. The latter are often more practical. However, unit testing can also be applied to systems where it has not traditionally been used.
Welcome to rabbit hell
In ElemenTerra, players use the mystical forces of nature to save creatures affected by a space storm. One of these forces is the ability to build paths that lead creatures to food and shelter. Since these paths are dynamic meshes created by players, creature movement has to cope with unusual geometric cases and arbitrarily complex terrain.
Character movement is one of those complex systems where "everything affects everything else." If you have ever worked on one, you know how easy it is to break existing functionality when writing new code. Do you need rabbits to climb small ledges? Fine, but now they jitter when climbing slopes. Do you want the lizards' paths not to cross? That works, but now their normal behavior is broken.
As the person responsible for the AI and most of the gameplay code, I knew I had no time for surprise bugs. I wanted to notice regressions immediately, so developing with TDD seemed like a good option.
The next step was to build a system in which I could easily define each movement case as a simulated pass/fail test.
This "rabbit hell" consists of 18 isolated corridors. Each has a creature and its own route, designed so that the creature can only keep moving if a particular movement feature works. A test is considered successful if the rabbit can keep moving indefinitely without getting stuck; otherwise, it fails. Note that we only test the creature's body (the pawn, in Unreal terms), not its artificial intelligence. In ElemenTerra, creatures can eat, sleep, and react to the world, but in "rabbit hell" their only instruction is to run between two points.
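The pass/fail rule described above (a creature passes as long as it never gets stuck) could be checked with a small watchdog per corridor. The following is only a sketch in plain C++ under assumed names: Vec2, StuckDetector, minProgress and timeoutSeconds are all hypothetical and are not taken from ElemenTerra or from the Unreal API.

```cpp
#include <cassert>
#include <cmath>

// Illustrative 2D position; a real engine would supply its own vector type.
struct Vec2 {
    float x;
    float y;
};

// Hypothetical watchdog for one corridor: the test fails once the creature
// has gone timeoutSeconds without moving at least minProgress units.
class StuckDetector {
public:
    StuckDetector(Vec2 start, float minProgress, float timeoutSeconds)
        : lastProgressPos_(start),
          minProgress_(minProgress),
          timeoutSeconds_(timeoutSeconds) {
        assert(minProgress > 0.0f && timeoutSeconds > 0.0f);
    }

    // Call once per simulation tick with the creature's current position.
    void update(Vec2 current, float deltaSeconds) {
        if (failed_) return;  // a failed test stays failed
        const float dx = current.x - lastProgressPos_.x;
        const float dy = current.y - lastProgressPos_.y;
        if (std::sqrt(dx * dx + dy * dy) >= minProgress_) {
            // Enough movement: remember this spot and reset the timer.
            lastProgressPos_ = current;
            secondsWithoutProgress_ = 0.0f;
        } else {
            secondsWithoutProgress_ += deltaSeconds;
            if (secondsWithoutProgress_ >= timeoutSeconds_) failed_ = true;
        }
    }

    bool passing() const { return !failed_; }

private:
    Vec2 lastProgressPos_;
    float minProgress_;
    float timeoutSeconds_;
    float secondsWithoutProgress_ = 0.0f;
    bool failed_ = false;
};
```

Because the creature shuttles between two points forever, the detector measures progress relative to the last recorded position rather than distance to a goal, so turning around at either end does not count as being stuck.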
Here are some examples of such tests:
- 1, 2, 3: Free movement, static obstacles and dynamic obstacles
- 8 and 9: Uniform slopes and rough terrain
- 10: Disappearing floor
- 13: Reproduction of a bug in which creatures endlessly circled around nearby targets
- 14 and 15: The ability to move along flat and complex ledges
Let's talk about the similarities and differences between my implementation and “pure” TDD.
My system was similar to TDD in this:
- I started work on each feature by creating tests, and then wrote the code needed to pass them.
- I kept running the old tests while adding new features.
- Each test measured exactly one part of the system, which allowed me to quickly find problems.
- Tests were automated and did not require player input.
And differed in this:
- Evaluating the tests involved an element of subjectivity. Outright movement failures (the character did not get from A to B) could be detected programmatically, but issues such as skewed orientation, animation synchronization problems and jittery movement required human assessment.
- The tests were not completely deterministic. Random factors, such as fluctuations in frame rate, caused small deviations. In general, though, creatures followed the same paths and had the same successes and failures from session to session.
Limitations
Using TDD for ElemenTerra's creature movement was a huge plus, but my approach had several limitations:
- Unit tests evaluated each movement feature individually, so bugs caused by combinations of several features were not covered. Sometimes the unit tests had to be supplemented with traditional debug levels.
- ElemenTerra has four kinds of creatures, but the tests contain only rabbits. This is a consequence of our production schedule (the other three species were added much later in development). Fortunately, all four share the same movement capabilities, but the Mossmork's large body caused several problems. Next time, I would have the tests dynamically spawn a selected species instead of using pre-placed rabbits.
This Mossmork requires a bit more space than a rabbit.
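The idea of spawning a selected species rather than relying on pre-placed rabbits amounts to parameterizing each test by species. Here is a rough sketch of what that could look like; Species, bodyRadius, MovementTest and runForAllSpecies are hypothetical names, and any radius values passed in would be made up for illustration rather than taken from the game.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical description of a species used to parameterize a test.
struct Species {
    std::string name;
    float bodyRadius;
};

// A movement test receives the species to spawn and reports pass/fail.
using MovementTest = std::function<bool(const Species&)>;

struct TestResult {
    std::string species;
    int testIndex;
    bool passed;
};

// Runs every test against every species instead of a pre-placed rabbit.
std::vector<TestResult> runForAllSpecies(
        const std::vector<Species>& speciesList,
        const std::vector<MovementTest>& tests) {
    assert(!tests.empty() && "no movement tests registered");
    std::vector<TestResult> results;
    for (const Species& species : speciesList) {
        for (std::size_t i = 0; i < tests.size(); ++i) {
            results.push_back({species.name, static_cast<int>(i),
                               tests[i](species)});
        }
    }
    return results;
}
```

With this shape, a corridor that is too narrow for a large-bodied creature shows up as a failure for that species only, instead of going unnoticed because only rabbits were ever tested.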
Is TDD the right choice for you?
Developers can spend too much effort on unit test levels that the player will never see. I won't deny it: I got a lot of pleasure out of building “rabbit hell” myself. Internal tools like this can be time-consuming and jeopardize more important milestones. To keep that from happening, think carefully about where and when to use unit tests. Below are several criteria that justified TDD for ElemenTerra's creature movement.
1. Will it take a long time to manually complete the test tasks?
Before spending time on automated testing, check whether the feature can be evaluated using the normal game controls. If you want to make sure that keys unlock doors, spawn a key and open a door with it. Writing unit tests for this feature would be a waste of time, since manual testing takes only a few seconds.
2. Is it difficult to create test cases manually?
Automated unit tests are justified when there are known cases that are difficult to reproduce. Test No. 7 of “rabbit hell” checks how creatures walk along ledges - something the AI usually tries hard to avoid. Such a situation may be difficult or impossible to reproduce using game controls, whereas with a test it is easy.
3. Do you know that the desired results will not change?
Game design is built on iteration, so the goals of your features can change as the game changes. Even small changes can invalidate the metrics by which you evaluate a feature, and with them any unit tests. While the creatures' behavior around eating, sleeping and interacting with the player changed several times, getting from point A to point B stayed the same. As a result, the movement code and its unit tests remained relevant throughout development.
4. Are regressions likely to go unnoticed?
Have you ever been finishing one of the last tasks before shipping a game, only to suddenly find a game-breaking bug, and in a feature you completed ages ago? Games are gigantic interconnected systems, so it is natural that adding a new feature B can break an old feature A.
This is not so bad when the broken feature is used everywhere (jumping, for example), because you will notice the breakage right away. Bugs discovered late in development, however, can disrupt the schedule, and bugs discovered after launch can harm the gameplay.
5. What is the worst that can happen with tests and without them?
Creating tests is a form of risk management. Imagine you are deciding whether to buy vehicle insurance. You need to answer three questions:
- How much are monthly insurance premiums?
- How likely is the car to be damaged?
- How expensive would the worst case scenario be if you were not insured?
For TDD, the monthly premiums correspond to the production cost of maintaining our unit tests, the likelihood of damage to the car corresponds to the likelihood of a bug, and the cost of replacing the car outright corresponds to the worst-case scenario of a regression.
If a test takes a long time to create, and the feature is simple and unlikely to change (or its breaking late in development would be manageable), then unit tests may do more harm than good. If the tests are easy to make, and the feature is unstable and interconnected (or its bugs would cost a lot of time), then tests will help.
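The insurance analogy can be turned into a rough back-of-the-envelope comparison. The sketch below is a deliberate simplification with hypothetical names (TestRiskEstimate, expectedLossWithoutTests, testsWorthIt): it assumes all three quantities can be estimated in the same unit, such as hours, and that tests would catch the regression early at negligible extra cost.

```cpp
#include <cassert>

// All inputs are hypothetical estimates in the same unit (e.g. hours).
struct TestRiskEstimate {
    double upkeepCost;      // the "premiums": writing and maintaining tests
    double bugProbability;  // chance the feature regresses, 0.0 to 1.0
    double worstCaseCost;   // cost of a late, unnoticed regression
};

// Expected loss from regressions if no tests exist.
double expectedLossWithoutTests(const TestRiskEstimate& e) {
    assert(e.bugProbability >= 0.0 && e.bugProbability <= 1.0);
    return e.bugProbability * e.worstCaseCost;
}

// Simplified rule: tests pay off when upkeep is below the expected loss.
bool testsWorthIt(const TestRiskEstimate& e) {
    return e.upkeepCost < expectedLossWithoutTests(e);
}
```

For example, with made-up numbers, a fragile feature estimated at upkeep 20, probability 0.5 and worst case 200 comes out in favor of tests (20 < 100), while a trivial feature at upkeep 10, probability 0.25 and worst case 8 does not (10 > 2).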
The limits of automation
Unit tests can be a great aid in finding and fixing bugs, but they do not replace the need for professional quality assurance in large-scale games. QA is an art that requires creativity, subjective judgment and excellent technical communication.