How much do unit tests cost?

Now, at the peak of the economic cycle, in an industry as hot as software development, counting money is not customary. Development is often treated as a creative activity in which nothing needs to be justified and the artist knows best what to write and how. In particular, there is plenty of controversy around unit tests and TDD, but unfortunately it tends to slide either into unproven claims and emotional attacks, backed by conveniently chosen articles and books by methodologists who earn money from consulting and selling training courses (and which themselves contain no statistics or calculations whatsoever), or, on the contrary, into sweeping accusations of hipster "smoothie-sipping" and other sins of youth.
Unlike such empty arguments, this article will give you not only food for thought but also a methodology for assessing whether introducing unit tests on a specific project makes economic sense. Let me stress right away that, like any valuation, ours will rest on assumptions about the future of the product, the team and various indicators that can only be estimated subjectively. Nevertheless, having a programmer give an expert estimate of indicators that are at least somewhat related to his field is much better than asking him directly whether unit tests are profitable for the company. After all, programmers are usually not inclined to think about even basic financial indicators, but to estimate the time,
What is value
First of all, if we are going to calculate the value of anything, we need at least a rough understanding of what value is. When we buy a stick of sausage, the question does not arise, because it has a price tag. With a project, however, we are dealing not with a one-time payment but with a long-running cash flow: first we invest in it, and then it pays us back. To evaluate such a flow correctly, we have to account for the fact that money received tomorrow is worth less than money today. If sausage were sold by subscription, this would be easy: the value of the subscription could be calculated by discounting its future payments at the rate of inflation. With a project, though, we must also take into account that the company investing in it expects not only to recover its costs but also to earn a profit and cover its risks. There are many risks, but they all come down to the fact that the return on money invested in a project is neither guaranteed nor easy to predict. Money can be deposited in a bank or lent to a reliable borrower, earning guaranteed interest on a regular schedule, but you cannot invest in a project and end up with a predictable stream of payouts on schedule. Therefore, from the point of view of the investor (in this case, the company), the project's cash flow discounted at the required rate of return, also called net present value (NPV), must be positive:
$$0 < NPV = \frac{FCFF_1}{1 + WACC} + \frac{FCFF_2}{(1 + WACC)^2} + \frac{FCFF_3}{(1 + WACC)^3} + \dots$$
To calculate it, we need to know how much money the project will bring in (or consume) in the first, second, third year and so on, as well as the discount rate, which must not only exceed what a bank pays but also cover the risks.
Strictly speaking, this formula is somewhat simplified. The rate changes from year to year, so the denominators should be running products of the form $(1 + WACC_1)(1 + WACC_2)(1 + WACC_3)\cdots$. In practice, however, even professional appraisers neglect this, because the WACC methodology already builds the market's expectations about future rates into today's rates, and making further assumptions about them is unproductive.
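As a minimal sketch, assuming year-end cash flows indexed from year 1, the NPV formula can be written in Python as follows. The function names `npv` and `npv_varying_rates` and the sample cash flows are hypothetical; the second function shows the stricter variant with a separate rate for each year, where each denominator is a running product of (1 + WACC_t).

```python
from itertools import accumulate
import operator


def npv(cash_flows, rate):
    """Discount year-end cash flows (year 1, 2, 3, ...) at a constant rate."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))


def npv_varying_rates(cash_flows, rates):
    """Stricter variant: year t is discounted by (1 + r_1)(1 + r_2)...(1 + r_t)."""
    # The running product of the (1 + r_t) factors gives each year's denominator.
    factors = accumulate((1 + r for r in rates), operator.mul)
    return sum(cf / f for cf, f in zip(cash_flows, factors))


if __name__ == "__main__":
    # Hypothetical project: spend 100 in year 1, receive 60 in years 2 and 3.
    flows = [-100.0, 60.0, 60.0]
    print(f"NPV at a flat 10%: {npv(flows, 0.10):.2f}")
    print(f"NPV at 10%, 11%, 12%: {npv_varying_rates(flows, [0.10, 0.11, 0.12]):.2f}")
```

With equal rates in every year, the two functions agree, which is why the simplified formula is usually good enough.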
There are different kinds of cash flow, but for our purposes the most convenient is free cash flow to the firm (FCFF), which accounts for the money due not only to the owners but also to the creditors. Of course, most IT companies have no significant debt, simply because nobody lends to them without collateral and they have nothing to pledge, but there are exceptions: for example, this approach is convenient when evaluating an in-house development project at a manufacturing company that carries debt. The second reason FCFF is of particular interest to us is that it is simple to calculate: FCFF is just operating profit less taxes, net capital expenditures and the change in working capital.
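The FCFF definition above translates directly into a one-line function. This is a sketch under the usual convention that net capital expenditures mean capex minus depreciation; the function name and the sample figures are hypothetical.

```python
def fcff(operating_profit, tax_rate, net_capex, change_in_working_capital):
    """Free cash flow to the firm for one period.

    net_capex is capital expenditure minus depreciation; a positive
    change_in_working_capital ties up cash and therefore reduces FCFF.
    """
    after_tax_operating_profit = operating_profit * (1 - tax_rate)
    return after_tax_operating_profit - net_capex - change_in_working_capital


if __name__ == "__main__":
    # Hypothetical year: operating profit 200, 20% tax, net capex 50,
    # working capital grows by 10.
    print(fcff(200.0, 0.20, 50.0, 10.0))
```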
Since FCFF is the cash flow to owners and lenders at the same time, it is discounted at the weighted average cost of capital (WACC), covering both equity and debt.
In large companies, the cost of capital is tracked by the finance department, so you can simply ask them, but for the general case we still need a formula for WACC:
$$WACC = R_e \cdot \frac{P}{EV} + R_d \cdot \left(1 - \frac{P}{EV}\right)$$
Here $R_e$ is the cost of equity, $R_d$ is the cost of debt (that is, the effective interest rate on the company's debts), $P$ is the market value of equity, and $EV$ is enterprise value ($EV = P + D$, where $D$ is debt).
Next we need to determine $R_e$. There are several models for this, but the simplest is CAPM, in which $R_e = R_b + \beta \cdot Premium$, where $R_b$ is the risk-free rate, $Premium$ is the equity risk premium (the extra return expected for investing in equity rather than debt), and $\beta$ is a risk coefficient that shows how much riskier our project is than the business of an average company.
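The CAPM and WACC formulas above can be sketched as two small functions. The names and the example company are hypothetical. Note that, like the formula in the text, this version applies no tax shield to the cost of debt; many textbook versions use $R_d(1 - t)$ instead.

```python
def cost_of_equity_capm(risk_free_rate, beta, equity_premium):
    """CAPM: R_e = R_b + beta * Premium."""
    return risk_free_rate + beta * equity_premium


def wacc(cost_of_equity, cost_of_debt, equity_value, debt_value):
    """WACC = R_e * P/EV + R_d * (1 - P/EV), with EV = P + D.

    Like the formula in the text, no tax shield is applied to R_d.
    """
    enterprise_value = equity_value + debt_value
    equity_share = equity_value / enterprise_value
    return cost_of_equity * equity_share + cost_of_debt * (1 - equity_share)


if __name__ == "__main__":
    # Hypothetical company: 8% risk-free rate, beta 1.2, 5% premium,
    # equity worth 80, debt of 20 at 10%.
    r_e = cost_of_equity_capm(0.08, 1.2, 0.05)
    print(f"R_e = {r_e:.2%}, WACC = {wacc(r_e, 0.10, 80.0, 20.0):.2%}")
```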
How quality is ensured and what unit tests are
Now we need to decide what unit tests are. Oddly enough, many people, even those close to development, call any automated test a unit test, but this is, of course, not the case.
Testing is divided into functional and non-functional. Non-functional testing covers things not directly related to the software's functionality, such as load testing or security testing. Functional testing checks compliance with requirements and the absence of errors in their implementation, and it is what we will discuss here.
The first step toward ensuring quality is to take the control function away from the developers and hire someone responsible for it. So a tester who does manual testing joins the team. No serious project is conceivable without manual testing: it is a vital foundation, and most of the problems that are discovered and fixed in time will be thanks to the testers. At this stage everything looks simple: if you want quality, hire a quality specialist.
As the project grows, there will be less and less time for manual testing, so testers will be increasingly busy with new features and will check less and less often the parts of the system that were not supposed to change. However, as the system grows more complex, explicit and implicit dependencies are likely to appear between its components, and developers can in principle lose sight of them, so it makes sense to check some things before every release. This problem is especially acute in agile methodologies, with their short iterations and frequent releases. Hence the need to automate the testers' work, for example, by writing a script,
These measures can provide a decent level of quality, but there is no limit to perfection. What testers do is called black-box testing: they are not expected to know every detail of the implementation, so testing usually focuses on typical scenarios and does not set out to break the system or to test its behavior under atypical conditions. In addition, some things are hard to verify simply because there is no user interface: for example, if the goal of an iteration is to develop a data-access library or some specific API, testing it requires writing an application, or at least something that uses this component. In such cases, part of the quality control function has to go back to the developers, who are asked to write integration tests. This is the second type of automated test used on the project. Its purpose is to check that system components written by different people interact correctly, to test how these components behave under boundary conditions, and to check that they react correctly to failures in the environment.
So we have testers who check the whole project against the requirements, tests that automate their work, and tests that check parts of the project written by different developers. What else can be done? Unit tests claim to be the fourth level of quality control. They check code written by one programmer and, as a rule, test the smallest piece of code that can sensibly be tested on its own, such as a single class. In practice, the developer usually writes unit tests for his own code, and how many are written, and whether they are needed, is poorly controlled. In my observation, a typical amount of developer time spent on unit tests is about 40% of the time spent developing the feature itself, although this ratio can vary greatly. A well-known open-source case is SQLite, where a surplus of low-skilled free labor, supplied by the many people who want to work on a famous project, is put to use army-style, that is, on busywork: writing useless unit tests, whose volume at one point was 100 times that of the DBMS code itself.
The opposite cases, where unit tests are not written at all or only to a minimum, are no surprise either. After all, almost all software developed before the end of the 2000s, that is, before the era of outsourcing and Agile, was created without unit tests.
Costs, complexity adjustment, and the mythical man-month
Naturally, if you need to write unit tests (or anything else), you will either have to devote more time to the project or hire more developers. The main question is whether development time and cost grow linearly with the amount of code or follow some other law.
Once upon a time I had a free SVN repository on the well-known Assembla service, which offered source hosting and collaboration tools: a tracker, statistics and other extras. Later the free plan ended, but they never stopped sending newsletters and notifications. In 2015, one of their employees published a short post titled “How many people should discuss a task?”, which now survives only in the Web Archive. The gist of the post was this: the employee gathered statistics on customers and plotted the duration of a task against the number of people who discussed it, with the following result:

The dependence is clearly non-linear. A task that involves two people usually lasts two days, one with three people lasts four days, and one with four people already takes six days. What are they doing there? One possible explanation is that the task requires several stages of work: with two people, Vasya does his part and then hands it over to Petya, so it takes two days. Three people can already quarrel, re-divide responsibilities and work out who is to blame and what to do, while a group of seven will spend six extra days discussing, agreeing and passing the buck to one another.
Of course, we could also assume that a friendly team of seven simply gets much more complex tasks, and that the more people work on a task, the grander it is, because friendship is magic! Such considerations may therefore seem far-fetched, and I will not include them in the calculations that follow; but if you want a more conservative estimate, it would not hurt to add a correction for costs growing non-linearly with the project's code base (which, of course, includes unit tests), or to build a safety margin into the required NPV.
If we attribute the non-linearity of this graph entirely to team growth, the associated costs can be estimated from the following table, which shows how the share of time lost to communication depends on the size of the group:

For example, if a team has five developers and you think you need to hire two more so that everyone can spend an additional 40% of their time on unit tests, be prepared for development costs to grow by more than 40%. The team will become larger and less efficient: instead of 5 × 0.625 = 3.125 conventional units of productivity, it will deliver 7 × 0.539 ≈ 3.77 units, while the amount of work grows from 1 to 1.4 conventional units, so development time will increase by about 16%.
An interesting conclusion from the graph is that once a group exceeds ten people, each new participant adds less value than the extra communication overhead costs, and Brooks's law kicks in. All that remains is to try to split tasks into smaller ones or to assign more experienced and efficient employees to them.
Of course, it is hard to claim that the non-linearity of the Assembla graph is due solely to the loss of efficiency as the team grows, but it agrees well with an intuitive understanding of complexity and with Brooks's law, so if you do not want to take risks and need a conservative estimate, these data can be a good guide.
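The team-size arithmetic above can be checked with a short sketch. The efficiency factors 0.625 and 0.539 come from the text's table; the assumption that cost is proportional to headcount × duration is ours, and under it the cost grows by roughly 62%, which is where "more than 40%" comes from. The function name is hypothetical.

```python
def relative_time_and_cost(base_size, base_eff, new_size, new_eff, work_growth):
    """Compare duration and cost after growing the team and the workload.

    Effective output = headcount * efficiency factor (from the table);
    duration = work / effective output; cost is assumed proportional to
    headcount * duration.
    """
    base_output = base_size * base_eff
    new_output = new_size * new_eff
    time_ratio = (work_growth / new_output) / (1.0 / base_output)
    cost_ratio = time_ratio * new_size / base_size
    return time_ratio, cost_ratio


if __name__ == "__main__":
    # Figures from the text: 5 people at 0.625 vs 7 people at 0.539, work x1.4.
    t, c = relative_time_and_cost(5, 0.625, 7, 0.539, 1.4)
    print(f"duration x{t:.2f}, cost x{c:.2f}")
```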
The benefits of unit tests
Besides costs, unit tests also bring benefits. Of course, in the vast majority of cases a bug that unit tests could catch will be caught at other levels of quality control, but there is always a chance of a slip-up, and in theory unit tests can reduce it. Personally, I do not know of such cases (fortunately, all the testers I have worked with were very responsible people), but when it comes to such low probabilities, personal experience may be unrepresentative. Failures can have various consequences: for example, a company may have an SLA whose violation entails financial losses, say, having to give customers a month of free service as compensation and thereby losing 1/12 of its annual revenue. In that case, tighter quality control that reduces the probability of an SLA violation during the year from 10% to 8% will cut average annual losses by (10% − 8%) × 1/12 ≈ 0.17% of revenue. This money is the positive component of the cash flow that must be added to the model.
Note that this simple calculation only applies when the probability of losses is small. If the probability is above 15-20% and the losses could lead to bankruptcy or liquidation of the company, it is better to use option-based valuation models, such as a decision tree. Fortunately, in most cases a silly bug cannot bankrupt a company, and we do not need to plunge into the horrors of valuing options.
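The SLA example reduces to an expected-value difference. A minimal sketch, with a hypothetical function name, valid only under the small-probability assumption discussed in the next paragraph:

```python
def annual_expected_loss(violation_probability, loss_share_of_revenue):
    """Expected yearly loss as a share of revenue (small probabilities only)."""
    return violation_probability * loss_share_of_revenue


if __name__ == "__main__":
    one_free_month = 1 / 12  # compensation: one month of revenue
    saving = (annual_expected_loss(0.10, one_free_month)
              - annual_expected_loss(0.08, one_free_month))
    print(f"expected saving: {saving:.3%} of revenue")
```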
Example One: Bison
Bison is a large online store that calls itself Russia's No. 1 online retailer. The company is not public, but in a recent recapitalization its total capitalization was valued at 50 billion rubles, twice its annual revenue. The extra capital was needed to cover operating losses, but shareholders hope the company will reach a 10% operating margin once it captures a larger market share and doubles its revenue within a year; after that it will have to start earning, and revenue growth will slow to 30% in the second year, 20% in the third, and finally settle at 10% in the fourth and subsequent years. Banks are less confident about this and treat Bison with caution: the company's total debt is only 10 billion rubles, at 11%. Bison is a rather clumsy and poorly managed company at the operational level; uncontrolled hiring has already left it with 600 programmers, whose total payroll is 1.5 billion rubles a year and who spend about 30% of their working time on unit tests. The company has no contractual obligations to customers, so a technical malfunction can only cause a temporary halt in sales, and after a failure, rolling back to the previous version of the site takes about an hour.
What is the NPV of using unit tests at Bison?
Bison's revenue should be 50, 65, 78 and 86 billion rubles in the first, second, third and fourth years, respectively. We take the probability of a failure to be 33%, that is, an incident that can take the site down for a long time happens roughly once every three years, which is not so bad. Suppose unit tests can reduce this to 25%, but no further, because besides developer errors there are also hardware failures, DDoS attacks and other troubles. If the online store's website is unavailable for an hour, the retailer loses no more than 0.023% of its annual revenue (one hour out of 365 × 12 active hours), since customers are active on average only 12 hours a day. An hour of downtime therefore costs about 11.5 million rubles in the first year, 14.8 in the second, 17.8 in the third and 19.6 million in the fourth. Let us generously count this full amount as the yearly saving from unit tests, even though an 8-percentage-point drop in failure probability implies an expected saving more than ten times smaller.
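As a cross-check, the one-hour loss figures can be reproduced from the stated revenue growth and the 12-active-hours assumption; the sketch also prints the saving weighted by the 8-point drop in failure probability, for comparison with the generous full-hour figure. Function and constant names are hypothetical.

```python
ACTIVE_HOURS_PER_YEAR = 365 * 12  # customers are active ~12 hours a day


def revenue_path(first_year, growth_rates):
    """Revenue for year 1, then each following year grown by the given rates."""
    path = [first_year]
    for g in growth_rates:
        path.append(path[-1] * (1 + g))
    return path


def hourly_outage_loss(annual_revenue):
    """Revenue lost in one hour of downtime, spread over active hours only."""
    return annual_revenue / ACTIVE_HOURS_PER_YEAR


if __name__ == "__main__":
    revenue_bln = revenue_path(50.0, [0.30, 0.20, 0.10])  # roughly 50, 65, 78, 85.8
    drop_in_failure_probability = 0.33 - 0.25
    for year, rev in enumerate(revenue_bln, start=1):
        hour_loss_mln = hourly_outage_loss(rev) * 1000  # billions -> millions
        expected_mln = drop_in_failure_probability * hour_loss_mln
        print(f"year {year}: one-hour loss {hour_loss_mln:.1f} mln, "
              f"probability-weighted saving {expected_mln:.2f} mln")
```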
Even without accounting for headcount growth and rising developer salaries, unit tests will cost 450 million rubles a year (30% of the 1.5 billion payroll).
I think you already see at this stage that unit tests do enormous damage to Bison's finances, even before adjusting for growing complexity and the problems associated with loss of manageability. And this while shareholders are forced to put in money to finance a loss-making company! No further calculations can rehabilitate unit testing in this case, but we will carry on anyway to show how the cash flow is discounted.
Let's return to the developers. Suppose the payroll grows by 10% per year; then the total effect of using unit tests is an operating loss of 438, 480, 527 and 579 million rubles in the first, second, third and fourth years, respectively, after which the loss grows by 10% annually. Here unit tests do not affect net capital expenditures or working capital, but the loss reduces taxes by 20% of its amount, so it must be multiplied by 0.8: -351, -384, -421 and -463 million rubles.
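The yearly after-tax figures follow from the stated inputs: cost of 450 million growing 10% a year, the one-hour outage savings credited as benefits, and a 20% tax. A minimal sketch with a hypothetical function name:

```python
def after_tax_effect(base_cost, cost_growth, benefits, tax_rate):
    """Yearly after-tax effect of unit tests (negative = loss).

    Cost grows at cost_growth per year; benefits are given per year;
    the pre-tax loss is reduced by the tax saving (multiplied by 1 - tax_rate).
    """
    effects = []
    for year, benefit in enumerate(benefits):
        cost = base_cost * (1 + cost_growth) ** year
        effects.append((benefit - cost) * (1 - tax_rate))
    return effects


if __name__ == "__main__":
    # Figures from the text, millions of rubles: 450 mln cost growing 10%/yr,
    # benefits of 11.5, 14.8, 17.8, 19.6 mln, 20% profit tax.
    flows = after_tax_effect(450.0, 0.10, [11.5, 14.8, 17.8, 19.6], 0.20)
    print([round(f) for f in flows])
```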
The company's EV is 50 + 10 = 60 billion rubles, so P accounts for 83% of capital and D for 17%. We know that the cost of debt is 11% per annum, so to calculate WACC it only remains to find the cost of equity. Bison operates in Russia, so the risk-free rate should be the effective yield of the longest-duration government bonds, currently 7.6%. The equity risk premium varies from year to year but is usually around 4-6% per annum; we will take 5%. To determine β, we look up the unlevered beta for the online retail industry in a reference table and find 1.3. However, Bison does carry some debt, albeit small, so the beta has to be adjusted for leverage to obtain the levered beta:
Thus, the WACC discount rate will be
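The levered-beta and WACC figures are not reproduced in the text. A common way to relever beta is the Hamada formula, $\beta_L = \beta_U \cdot (1 + (1 - t) \cdot D/E)$; the sketch below assumes that formula and the 20% tax rate mentioned earlier, so whatever it prints is our own recomputation under those assumptions, not the author's figure. Function names are hypothetical.

```python
def levered_beta(unlevered_beta, tax_rate, debt, equity):
    """Hamada relevering (assumed): beta_L = beta_U * (1 + (1 - t) * D/E)."""
    return unlevered_beta * (1 + (1 - tax_rate) * debt / equity)


def bison_wacc():
    # Inputs from the text: billions of rubles and annual rates.
    equity, debt = 50.0, 10.0
    risk_free, premium, cost_of_debt = 0.076, 0.05, 0.11
    # Tax rate of 20% taken from the text's profit-tax assumption.
    beta_l = levered_beta(1.3, 0.20, debt, equity)
    cost_of_equity = risk_free + beta_l * premium  # CAPM
    equity_share = equity / (equity + debt)
    # Same WACC formula as in the text (no tax shield on debt).
    rate = cost_of_equity * equity_share + cost_of_debt * (1 - equity_share)
    return beta_l, cost_of_equity, rate


if __name__ == "__main__":
    beta_l, r_e, rate = bison_wacc()
    print(f"levered beta {beta_l:.3f}, R_e {r_e:.2%}, WACC {rate:.2%}")
```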
Finally, let us calculate how much unit tests cost Bison. To do this, we discount the first years of uneven growth separately and, for the subsequent years with 10% annual growth, use the Gordon growth model.
The present value of the first year's loss will be … million rubles, of the second year's … million, and of the third year's … million.
Starting from the fourth year, losses grow steadily by 10% per year, so all losses after the third year can be valued with the Gordon formula $TV_3 = \frac{FCFF_4}{WACC - g}$, with $g = 10\%$, which gives … million rubles; discounted back to the present, this is … million rubles.
In total, the damage from using unit tests comes to … billion rubles.
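The two-stage discounting described above can be sketched as follows. The cash flows are the text's after-tax figures; the discount rate of 14.45% is an assumption (it is what the text's inputs give with a Hamada leverage adjustment and the text's WACC formula), so the printed total is illustrative, not the author's result. Because the Gordon terminal value dominates, the answer is very sensitive to the gap between the discount rate and the 10% growth rate. Names are hypothetical.

```python
# Assumed rate: Hamada-levered beta 1.508, CAPM R_e 15.14%,
# weighted 5/6 equity and 1/6 debt at 11%.
ASSUMED_WACC = 0.1445
TERMINAL_GROWTH = 0.10


def two_stage_present_value(explicit_flows, next_flow, rate, growth):
    """PV of explicit year-end flows (years 1..n) plus a Gordon terminal value.

    The terminal value next_flow / (rate - growth) is the value at the end of
    year n of all flows from year n + 1 onward, growing at `growth` per year.
    """
    if rate <= growth:
        raise ValueError("the Gordon model needs rate > growth")
    n = len(explicit_flows)
    pv_explicit = sum(cf / (1 + rate) ** t for t, cf in enumerate(explicit_flows, start=1))
    terminal_value = next_flow / (rate - growth)
    return pv_explicit + terminal_value / (1 + rate) ** n


if __name__ == "__main__":
    # After-tax effects from the text, millions of rubles: years 1-3, then year 4.
    total = two_stage_present_value([-351.0, -384.0, -421.0], -463.0,
                                    ASSUMED_WACC, TERMINAL_GROWTH)
    print(f"present value of unit tests: {total / 1000:.1f} bln rubles")
```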
Example Two: Hyperstal
Vasily is a promising graduate of the Chelyabinsk College of Innovative Technologies. There are no Amazons or Googles in Chelyabinsk, but there are plenty of steel companies, and he was lucky enough to get a job at one of them. As it later turned out, budgets there are modest and money is chronically short, so the company could only afford a programmer costing less than 50 thousand rubles a month, including all taxes and mandatory contributions. Vasily's first task was software for controlling a blast furnace. The project should take no more than two months and is unlikely to be maintained or developed further.
When Vasily visited the workshop, the specialist in charge of production told him, in general terms, the following: “Dear colleague! Please take a look at this giant bucket of molten metal. If something goes wrong, we will not only be extremely upset but will also face technological difficulties. The thing is, if the blast furnace stops, the metal inside it solidifies, and eliminating the consequences of the accident will take three months. It will not be easy to deal with a giant lump of metal in the workshop and replace it with a new blast furnace.” Vasily later found out that an emergency stop of the blast furnace could cost the company 8 billion rubles.
Question: should Vasily bother writing unit tests?
Since I no longer have the strength or patience to calculate the obvious, I will give the answer right away: of course he should. Vasily has no experience, so he is quite likely to make a mistake (I estimate a 50 percent chance that, working alone, without help from colleagues and without adequate quality control, his program will fail somewhere, and a 10 percent chance that this will lead to an accident); his time costs next to nothing; and the price of an error is extremely high. Since this is a short project that will be written and forgotten, there is no need to discount anything: it is enough to compare Vasily's salary for two months, 100 thousand rubles, with the expected loss of about 10% × 8 billion = 800 million rubles.
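The same comparison can be framed as a break-even: even if Vasily spent his entire two-month salary on tests, they would pay off if they cut the accident probability by just 100,000 / 8,000,000,000, that is, 0.00125 percentage points. A minimal sketch with a hypothetical function name; the 10% figure is the author's estimate.

```python
def breakeven_risk_reduction(extra_cost, accident_cost):
    """Smallest drop in accident probability that pays for extra_cost."""
    return extra_cost / accident_cost


if __name__ == "__main__":
    salary_two_months = 2 * 50_000        # rubles, from the text
    accident_cost = 8_000_000_000         # rubles, blast furnace emergency stop
    expected_loss = 0.10 * accident_cost  # the author's 10% accident estimate
    print(f"expected loss: {expected_loss:,.0f} rubles")
    drop = breakeven_risk_reduction(salary_two_months, accident_cost)
    print(f"break-even probability drop: {drop:.5%}")
```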
Example Three: XSoft
XSoft is a successful outsourcing company that has just signed a contract with yet another Western customer. The customer plans to hire 7 programmers with a budget of 15 million rubles a year for this, of which XSoft will keep 3 million. The customer is a pushover who understands nothing about development. From XSoft's point of view, should the developers write unit tests?
Of course! In this case the cost of writing and maintaining unit tests is borne by the customer, and for the contractor the extra work simply means a longer project and additional profit, which is at least proportional to the man-hours spent on unit tests and at best grows faster than that, thanks to the growth of the code base and the complexity of the project. With your permission, I will not develop this idea further, delve into the intimate details of the outsourcer's relationship with the customer, or discount its cash flows, since the conclusion is already obvious.
Afterword
The article turned out long, and I hope the effort was not in vain. Like any project in any business, the decision to use unit tests on your project should be economically justified. When companies in other industries plan to buy a machine tool or open a factory or store, they will certainly calculate NPV and/or IRR. It is sad to see IT remain so careless in this respect. But knowing the basics of finance and having the habit of opening Excel at the right time can give you a noticeable advantage over your competitors.