
Useful metrics for evaluating projects
In October I already talked about ways to evaluate testing; anyone curious (or commiserating) can watch the recording here. Today I want to cover metrics for a project as a whole: not metrics for the sake of having metrics, but metrics that are convenient to use and that actually improve the project. So instead of dry formulas and a list of metrics, I will tell three stories from experience about introducing and using very specific metrics under very specific conditions, and about the results they helped us achieve.
Why measure anything?
You have a project. A beloved, dear project that you want to see grow and prosper.
But how do you judge its prosperity if you have no criteria for that prosperity?
How can you react to problems quickly, before they become irreparable, if you have no "sensor of impending trouble"?
How do you know what to improve if you don't know where the problem comes from?
In short, metrics are needed to manage a project effectively: to diagnose problems, localize them, fix them, and check whether the methods you chose to solve a problem actually help.
I will share several types of metrics, each of which has been tried in practice and has brought real benefit. Introducing them always feels lazy and uncomfortable for a team: you have to record extra information, measure things, breed bureaucracy. But as soon as a metric yields its first benefit, laziness gives way to discipline and a solid understanding of why that particular metric matters.
And if the benefits never come, the metric can safely be thrown away ;)
Story 1: Who let that in here?!
At one large company, management complained about the "low quality of the product" and blamed it on testing. My task was to analyze the causes of this unfortunate situation and somehow fix them, preferably by yesterday, of course.
Task #1 was obvious to me: estimate the percentage of missed bugs. Is it true that the testers miss things? To do this we added a "reported by the client" field to the bug tracker, flagged the existing bugs accordingly and counted. The figure came out at just over 5%, and far from all of those bugs were critical.
Is that a lot or a little? In my experience it is a pretty good figure. So where does the impression come from that the testers miss a lot?
We introduced one more field: "reproduced on the release version". Every time a new bug was registered from the test environment, the testers checked whether it was also present in the latest version in users' hands: perhaps users simply were not reporting specific bugs? The result for the first month: about 40% of the bugs registered in the bug tracker were reproducible on the release version.
So we really were missing a lot; users just were not reporting specific bugs, yet the opinion "your software sucks!" was clearly taking shape. That gave us our sensor metrics, the ones that tell us something is wrong:
- % of bugs missed into the release version
- % of bugs reported by users
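For illustration, here is a minimal sketch of how these two sensor metrics could be computed from a bug-tracker export. The file name and the column names (reported_by_client, reproduced_on_release) are assumptions made up for the example, not fields of any particular tracker.

```python
# Sketch: compute the two sensor metrics from a CSV export of the bug tracker.
# Column names and file name are hypothetical.
import csv

def sensor_metrics(path="bugs_export.csv"):
    total = client_reported = on_release = 0
    with open(path, newline="", encoding="utf-8") as f:
        for bug in csv.DictReader(f):
            total += 1
            if bug.get("reported_by_client", "").lower() == "yes":
                client_reported += 1
            if bug.get("reproduced_on_release", "").lower() == "yes":
                on_release += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * client_reported / total, 100.0 * on_release / total

if __name__ == "__main__":
    reported, missed = sensor_metrics()
    print(f"% of bugs reported by users:       {reported:.1f}")
    print(f"% of bugs reproduced on release:   {missed:.1f}")
```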
We set a goal (otherwise, why measure anything at all?): no more than 10% of bugs should make it into the release version. But how do we get there? Throw more people at it? Stretch the timelines?
To answer that, we had to dig further and look for new metrics that could point to an answer.
In our case, for every missed bug we added yet another field, "reason for missing it", where we record why we had not caught the bug earlier:
- unknown requirement (we did not know, or did not understand, that it had to work that way)
- test not accounted for (it did not occur to us to test it THAT way)
- not re-tested (the test existed and had been run, but the functionality broke later and that area was not checked again)
I have since used this approach to study the causes of missed bugs in many companies, and the results are always different. In this case, more than 60% of the bugs were missed because the testers had not accounted for a test, that is, it had not even occurred to them that something needed to be tested. Of course we had to work on all fronts, but we started with that 60%, following the Pareto principle.
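As a rough illustration, here is a small sketch of such a Pareto-style tally: count the "reason for missing" values and sort them by share. Again, the export file and the skip_reason column are hypothetical names used only for the example.

```python
# Sketch: tally the "reason for missing" field and print reasons by share,
# so the dominant cause (the Pareto candidate) is at the top.
import csv
from collections import Counter

def skip_reasons(path="missed_bugs_export.csv"):
    reasons = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for bug in csv.DictReader(f):
            reasons[bug.get("skip_reason", "unspecified")] += 1
    total = sum(reasons.values()) or 1
    for reason, count in reasons.most_common():
        print(f"{reason:<30} {count:>4}  {100.0 * count / total:.0f}%")

if __name__ == "__main__":
    skip_reasons()
```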
A brainstorm on how to crack this produced various solutions: weekly reviews of missed defects within the testing group, agreeing tests with analysts and developers, talking directly to users to learn their environments and conditions, and so on. Introducing these new practices gradually, in just two months we brought the percentage of missed bugs down to 20%. WITHOUT expanding the team and WITHOUT stretching the timelines.
We have not reached 10% yet, but in July it was 14%: very close to the goal, and judging by what the deployment engineers tell us, customers have already noticed the change in quality. Not bad, huh?
Story 2: Where did all this come from?
This story is about one of my own projects. We are developing a terribly necessary and useful service, and the pace of development did not exactly warm my soul. Testing on my project is, naturally, in great shape, so why is development barely crawling along?
Naturally, I started by trying to turn my subjective feeling of "slow" into a measurement. How do you pin that down? What do you compare? KLOC per month? Features per iteration? Average slippage against the plan? The first two metrics clearly would not tell me anything useful, so I started looking at the percentage of schedule slippage per feature (iterations do not have a fixed feature set, so they cannot really be late: whatever we manage to build and test in two weeks is what we ship). But features can slip!
It turned out that on average we overrun feature estimates by a factor of 1.5 to 2! I will not tell you what it cost me to drag this data out of Redmine, but there it was. And I wanted to dig further, following the "five whys" principle. Why is this happening? Are we bad at planning? Do I want results too fast? Is it a matter of skill? Where does the time actually go?
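For the curious, here is a sketch of how such a slippage factor could be computed once planned and actual dates are exported per feature. The column names below are invented for the example and do not correspond to Redmine's actual export format.

```python
# Sketch: average schedule slippage per feature = actual duration / planned duration.
# Column names (planned_start, planned_due, actual_start, actual_done) are hypothetical.
import csv
from datetime import date

def _days(start, end):
    return (date.fromisoformat(end) - date.fromisoformat(start)).days or 1

def average_slippage(path="features_export.csv"):
    ratios = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            planned = _days(row["planned_start"], row["planned_due"])
            actual = _days(row["actual_start"], row["actual_done"])
            ratios.append(actual / planned)
    return sum(ratios) / len(ratios) if ratios else 0.0

if __name__ == "__main__":
    print(f"Average slippage factor: {average_slippage():.1f}x")
```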
I started analyzing: on average, one small feature accumulates 15 to 40 bugs, and fixing them takes more time than developing the feature itself. Why? Is that a lot or a little? The developers complained that there were many requests to change functionality that had already been built. Was that true, or a subjective misjudgment?
Digging further. Into the poor bug tracker, already swollen with extra fields, I added one more: "reason for the bug". Not the reason it was missed, as in Story 1, but the reason it APPEARED. The developer fills in this field at commit time, when they already know exactly what they fixed and how. The options are:
- code (we simply messed it up right here)
- misunderstood requirements ("oops, I didn't understand what exactly was needed!")
- changed requirements (the product owner looked at the result and said "hmm, actually it needs to work differently, not the way I originally asked")
Plain coding errors accounted for about 30% of our bugs. Changed requirements, less than 5% (the developers were surprised, but conceded the point: that is exactly why they record the reason!). And almost 70% of the bugs were caused by misunderstood requirements. In our case, where fixing bugs takes longer than development, that is HALF THE TIME SPENT ON DEVELOPING A FEATURE.
What to do?
We came up with plenty of possible solutions, from hiring a technical writer who would elicit the product owner's requirements and document in detail everything we currently describe in a couple of lines, all the way to demoting the product owner to secretary and having him document new features around the clock. We did not like any of these options; they are too bureaucratic for a team of four developers sitting in the same office. So we did the following:
- the product owner describes a new feature briefly, as always
- when the feature's turn comes, the developer thinks carefully about how to implement it, how it will look, and what exactly needs to be done
- after that, the developer and the product owner sit down together, and the developer lays out in detail his thoughts on the bright future of the feature being developed
- under no circumstances does the developer start work on a new feature without going through the steps above and agreeing his vision with the product owner
- the tester usually takes part in this conversation too, pointing out in advance the tricky cases that will be tested
We now have roughly 3 to 7 such hour-long "talk sessions" a week, each pulling 2 or 3 people away from their work. The number of bugs filed has gone down, and more than 50% of them are now plain coding errors, so our next task will be to introduce code review, since we now have a new "main problem".
Meanwhile, going back from the analyzer metric to the sensor metric, we saw that since spring we have not once overrun a feature estimate by more than 50%, whereas before that the average overrun was 50% to 100%, and sometimes even more.
And this is just the beginning! ;-)
Story 3: Who is slowing the developers down?
The last story is from my very recent experience with an outside company. True, honest-to-goodness Agile, weekly iterations... and deadlines slipping just as weekly!
The reason, according to the company's management: "the developers produce too many bugs."
I started analyzing how this happens: I simply took part in the process and watched from the sidelines, much as described so nicely in Imai's book "Gemba Kaizen". And here is what I saw. Releases happen on Thursdays; Friday is the preparation day for the new iteration. On Tuesday or Wednesday a build goes to testing. On Wednesday and Thursday the defects start pouring in. On Friday, instead of preparing for the new iteration, the developers are urgently fixing bugs. And so it goes every week.
In the task tracker, where the features from the board are duplicated, I asked the team to record statuses for each feature: accepted for development, given for testing, tested and sent back for rework, tested and accepted for release.
And what do you think the average time was between "given for testing" and "tested and sent back for rework"? A day and a half!
And sometimes with only a SINGLE blocking defect found.
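If you want to track the same lag automatically, here is a minimal sketch that averages the time between the two statuses from a status-change log. The log format (feature id, status, ISO timestamp per row) is an assumption made for the example, not the format of any particular tracker.

```python
# Sketch: average lag between "given for testing" and "sent back for rework",
# computed from a CSV log of status changes (feature_id, status, timestamp).
import csv
from datetime import datetime

def average_rework_lag(path="status_log.csv"):
    given, lags = {}, []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            ts = datetime.fromisoformat(row["timestamp"])
            if row["status"] == "given for testing":
                given[row["feature_id"]] = ts
            elif row["status"] == "sent back for rework" and row["feature_id"] in given:
                lags.append((ts - given.pop(row["feature_id"])).total_seconds() / 86400)
    return sum(lags) / len(lags) if lags else 0.0

if __name__ == "__main__":
    print(f"Average time to rework: {average_rework_lag():.1f} days")
```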
The developers at this company complained that the testers were slowing them down, while the testers and management blamed the developers: "you should test it yourselves and not hand over a raw product." But render unto Caesar what is Caesar's!
So we had a metric: 1.5 days is unacceptably long, and we want to cut it at least threefold, which should speed up releases by about a day. How? Another brainstorm, another pile of ideas, with 90% of the participants insisting that "developers should test their own work."
In the end we decided to try something different: as soon as the developer considers a feature ready, a tester sits down with them at the same computer, takes a notebook and a pen, and starts checking, commenting, and jotting down every flaw they notice, without wasting time on the bug tracker. In this mode developers fix more than half of the bugs on the fly! After all, the feature has only just been written and is still fresh in their head.
We brought the lag down from 1.5 to 0.5 days very quickly, but in practice we achieved another, more important change: the percentage of features moved to the "sent back for rework" status dropped from almost 80 to almost 20! Four times lower! In other words, 80% of features were now accepted immediately after moving to the "testing" status, because shortly before that transition they had already been tested on the fly, which sharply reduces both the time spent registering bugs and the cost of fixing them.
Incidentally, story 3 is the only one where we hit the goal right away. Iterations still slip sometimes, but now that is the exception; almost every Thursday the development team goes home on time, and on Friday preparation for the next iteration really does begin.
Bingo!
Conclusions
I deliberately avoided dry formulas, philosophy, and theorizing, and instead told specific stories from fresh (2012!) experience: stories in which we shortened timelines and improved quality without changing the budget.
Still think you can't put metrics to good use?
Then we're coming to you! :)