About testing metrics: code coverage for testers

As you know from "The Hitchhiker's Guide to the Galaxy," the answer to the ultimate question of life, the universe, and everything is 42. Line coverage on one of my projects is 81%. Does that figure answer the main question of testing: how many tests are enough to judge product quality?

In my years of working in IT and testing, I have seen few teams and projects where testers actually use code coverage in their work. In my opinion, this comes down to two things:

1. We test requirements first and foremost;
2. Not everyone understands how to measure and use coverage.

For those interested, here is my take on both points.

Requirements vs Code


The tester tests the requirements. Even if no formal requirements exist, there is still an idea of how the system should behave, and in the long run that is all that matters.
But.
There is no such thing as clear, exhaustive, complete requirements such that, having checked every one of them, we could safely say the system works as it should and there are no bugs.

Example 1

The application tries to save data to a database located on another server. There is a description of how it should do this, including the requirement that if the operation cannot be performed (for example, the database is unreachable), it should keep trying until a certain timeout expires and then return an error to the client.

But what exactly does "the operation cannot be performed" mean?

Suppose the tester checks a scenario where the connection to the database is lost mid-operation. Everything works fine, but does that mean there are no bugs?
In the application in question, we looked at the code coverage of the corresponding classes, and it turned out the developer had handled about five exceptional situations in the code.

At a minimum, this meant the following cases:
1. A connection to the database server cannot be established;
2. A connection to the database server is established, but the query fails with an error;
3. A connection to the database server is established, the query starts running and hangs. Here was the bug: the application waited for a response for about five minutes, then the exception went to the logs and it never tried to save that data again.

The remaining couple of cases were not worth attention, for various reasons.
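To make the distinction concrete, here is a minimal Java/JDBC sketch of how such failure modes end up as separate catch branches (the class, table, and timeout values are hypothetical, not taken from the project in question). A test that only drops the connection exercises just the first branch:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLTimeoutException;
import java.sql.SQLTransientConnectionException;

public class OrderWriter {

    /** Tries to save the payload once; the caller retries until its deadline expires. */
    public boolean trySave(String url, String payload) {
        try (Connection conn = DriverManager.getConnection(url);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO orders(payload) VALUES (?)")) {
            ps.setQueryTimeout(30);   // guards against case 3: a query that hangs
            ps.setString(1, payload);
            ps.executeUpdate();
            return true;
        } catch (SQLTransientConnectionException e) {
            return false;             // case 1: no connection, worth retrying
        } catch (SQLTimeoutException e) {
            return false;             // case 3: the query hung and was cancelled
        } catch (SQLException e) {
            return false;             // case 2: the query itself failed
        }
    }
}
```

Without the query timeout, case 3 behaves exactly like the bug above: the thread blocks until the driver gives up, and the caller's retry logic never runs.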

Formally, the first case was checked against the requirement as well, yet the bug was found only after analyzing code coverage. One could argue that this example is not about the benefits of code coverage but about the benefits of communication within the team (you could have learned the implementation details in advance, or given the developer your test cases for review). In fact, I always do that, but you cannot guess everything worth asking about; uncovered blocks of code often draw attention to exactly such things.

Example 2

In another system I tested, upon loss of data consistency the application was supposed to throw the appropriate exception, send a notification to monitoring, and wait for people to come and rescue it. The tests covered various cases of such situations; everything was handled properly.
Looking at the code, we saw that the piece in question was well covered, but in another class I noticed an uncovered area of code that raised the same consistency-loss event. Under what conditions, nobody knew, so the developers quickly cut it out. It turned out it had been carried over from an old project, and no one remembered it. Where it might have fired is unknown, but without code analysis we would never have found it.

So let the tester test the requirements; but if he also looks at the code, he can catch what the requirements do not describe and what even cunning test design techniques will not always find.

Coverage = 80%. What about quality?


Quantity does not mean quality. Code coverage is not directly related to product quality; the relation is indirect.
At one status meeting I reported that our code coverage had grown to 82% by lines and 51% by conditions, after which management asked me: "What does this mean? Is it good or bad?" A fair question, really: how much is enough to be good?
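The two numbers measure different things, which is why they diverge so sharply. A toy example, invented for illustration: a single test calling percent(true, 200) executes every line of the method below, giving 100% line coverage, yet it exercises only one outcome of each condition, so condition coverage stays at 50%:

```java
public class Discount {

    // percent(true, 200) touches every line, but never evaluates
    // `premium` as false or `orderTotal > 100` as false.
    public int percent(boolean premium, int orderTotal) {
        int result = 0;
        if (premium && orderTotal > 100) {
            result = 10;
        }
        return result;
    }
}
```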

Some developers cover their code up to 100%. For a tester, chasing 100% is pointless: beyond a certain point you physically cannot reach some of the code with integration tests.
For example, developers consider it good practice to check a method's input parameters for null, even though in a really running system such cases may never occur (this was partly why our condition coverage at the time was only around 50%). And that is normal: null could only arrive from outside up to the first check, which is exactly what handles the situation.
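A minimal sketch of the pattern (the class and method are invented): internal callers never pass null, so no integration test can reach the guard branch, and only a unit test, or nothing at all, ever will:

```java
import java.util.List;

public final class PriceCalculator {

    /** Sums item prices in cents; callers inside the system never pass null. */
    public long totalCents(List<Long> itemCents) {
        if (itemCents == null) {                 // defensive check, unreachable
            throw new IllegalArgumentException(  // from integration tests
                    "itemCents must not be null");
        }
        return itemCents.stream().mapToLong(Long::longValue).sum();
    }
}
```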

Back to "that is normal": in my understanding, it is the qualitative assessment of uncovered code that leads to adequate use of code coverage. What matters is looking at what is not covered, not how much. If it is Java code and the uncovered parts are toString() or equals() methods, or exception branches that are hard to reproduce in integration tests, fine, let it be 80% coverage of the real business logic. Many tools can filter out such code and leave it uncounted.
If doubts about the blind spots remain, you can also compute the combined coverage of integration and unit tests; the developers have most likely already accounted for much of what integration tests cannot reach.

However, there is one "but." What if code coverage is low? 20%, 30%? Somewhere I read the amusing claim that coverage of 50% or less (by lines and conditions, as I recall) corresponds to a level of testing whose outcome is the same as not testing at all: there may be bugs, there may not be, and you would have gotten the same result without running a single test. The other explanation would be a lot of dead code, which is unlikely.

And we don't have automated tests


You don't need them. Even if you have been assured otherwise: some developers are unaware that coverage can be measured not only by unit tests. There are tools that record coverage at runtime: you deploy a specially instrumented build, run your tests against it (even manual ones), and it writes out the coverage data.
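On the JVM this is typically done with the JaCoCo agent. A sketch of what such a session might look like (all file names and paths here are placeholders, not taken from a real project):

```bash
# start the application with the coverage agent attached
java -javaagent:jacocoagent.jar=destfile=manual-session.exec -jar app.jar

# after the test session, render an HTML report from the recorded data
java -jar jacococli.jar report manual-session.exec \
     --classfiles build/classes \
     --sourcefiles src/main/java \
     --html coverage-report
```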

What's the point?


My friend, a wonderful test lead, asked me: "When test cases don't yet cover everything and automation is in its infancy, does it make sense to spend resources on evaluating code coverage?" Introducing new pieces into the process always causes management a certain pain: time, resources, and other frailties of existence, with no room left for the flights of a tester-dreamer.

Let's go through, point by point, exactly where resources will have to be spent if you decide to try measuring code coverage:

  1. Choosing a tool suitable for your application
  2. Instrumenting builds (including configuring the coverage tool and filtering out code that is "unneeded" for the evaluation)
  3. Generating the coverage report after a test run
  4. Analyzing the coverage


Items 1 and 2 can be handed over to the developers: many of them know the well-known tools, have heard of or worked with them, and, moreover, can instrument their own build. Reporting is usually a single command on the command line, or happens automatically if you use CI (Jenkins did this for me and published the report as well).
The most expensive item is the fourth. The main difficulty is that an adequate assessment requires being able to read the code, or sitting next to a developer so that he can explain what a given piece means and how to reproduce it. This demands a certain qualification from the test engineer, plus the working time of one or two people.

Whether it is worth it is for the team and its leads to decide. In projects where requirements are poorly formalized, or where bugs appear in ways inexplicable to testers, this can at least suggest a direction in which to dig.
Another category is projects with very high-level black-box testing. This is primarily testing through the UI or the external API of systems that contain a pile of logic living by its own laws, i.e., from the outside you can neither touch nor control it, which means you cannot test it properly. Coverage analysis in such projects builds a reasoned case for moving to lower test levels: unit, component, testing with stubs, and so on.
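As a sketch of what "moving lower" can look like, here is a hand-written stub in a JUnit 4 test (the retry logic and all names are invented for illustration); branches that no UI scenario can trigger become trivially reachable:

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class RetrySaverTest {

    /** The collaborator the real code talks to, e.g. a database layer. */
    interface Saver { boolean trySave(String payload); }

    /** Code under test: retries until the saver succeeds or attempts run out. */
    static int attemptsUntilSaved(Saver saver, String payload, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (saver.trySave(payload)) {
                return attempt;
            }
        }
        return -1; // gave up; the caller reports an error to the client
    }

    @Test
    public void retriesUntilTheStubFinallySucceeds() {
        // The stub fails twice, then succeeds; no environment is needed at all.
        Saver failsTwice = new Saver() {
            int calls = 0;
            @Override public boolean trySave(String p) { return ++calls > 2; }
        };
        assertEquals(3, attemptsUntilSaved(failsTwice, "payload", 5));
    }
}
```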
Code coverage accumulated over time also works well as a number: on the graphs you can see the moments when new code pours in and the tests have not caught up yet; or when coverage was high, began to decline, and never returned to its previous level, meaning somewhere there may be a sizable blind spot of requirements that never reached testing, and so on.

Perhaps this is all that I wanted to say for today.

Finally: limitations and out of scope


  1. I have tried to describe the general approach to this question without going into many technical details. When I speak of "coverage" of 80%, I mean some overall or average coverage, without reference to specific metrics such as line or condition coverage. The choice of specific metrics is a separate, interesting question.
  2. My experience is mostly with Java code and the tools around it; I have not worked this way with other technologies. I know such tools exist for C++, but I have not yet had a chance to try them.
  3. Serious coverage analysis should be carried out on stable builds with stable tests; otherwise it is hard to say what caused the gaps: failing tests, critical bugs, or something genuinely missed.
