marshinov March 4, 2019 at 01:16

On Estimating and Managing Software Product Development

Universities teach algorithms, data structures, and OOP. At best, they also cover design patterns or multithreaded programming. But I have never heard of one that teaches how to estimate effort correctly.

Meanwhile, every programmer needs this skill every day. There is always more work than can be done. Estimation helps to prioritize correctly and to abandon some tasks altogether, not to mention such mundane matters as budgeting and planning. Incorrect estimates, on the contrary, create a host of problems: underestimates lead to conflicts, overtime, and holes in budgets; overestimates lead to canceled projects or a search for other contractors.

In fairness, underestimation is much more common in software development. Why? Some think that programmers are too optimistic by nature. I will add one more reason: being a good programmer and being a good estimator are not the same thing. Desire alone is not enough to become a good programmer; you need knowledge and practice. Why would estimation be any different?

In this article I will talk about how my attitude to estimating tasks has changed and how projects are estimated in our company. I'll start with how not to estimate. If you already know the “don'ts”, go directly to the second part of the article.

Estimation Anti-Patterns

Most estimates are made at the beginning of a project's life cycle. This does not bother us until we realize that the estimates were produced before the requirements were defined and, accordingly, before the task itself was studied. In other words, the estimate is usually made at the wrong time. Strictly speaking, this process is not estimation but guessing or prediction, because every blank spot in the requirements is a guessing game. How much does this uncertainty affect the final estimate and its quality?

A trifle, but unpleasant

Suppose you are developing an order entry system but have not yet defined the requirements for entering phone numbers. Among the uncertainties that can affect the estimate, the following stand out immediately:

Does the client need the entered phone number to be checked for validity?
If the client needs a phone number verification system, which version will he prefer - cheap or expensive?
If you implement a cheap version of checking phone numbers, will the client want to switch to the expensive one later?
Can a ready-made system for checking phone numbers be used, or do design constraints require you to develop your own?
How will the verification system be designed?
How long does it take to program a phone number verification system?

And these are just a few of the questions that come to an experienced project manager's mind. As even this example shows, potential differences in the definition, design, and implementation of the same feature can accumulate and increase the implementation time a hundredfold or more. Multiply that across the hundreds or thousands of features of a large project, and the uncertainty in the estimate of the project itself becomes enormous.

Another excellent example of how a seemingly elementary requirement “swells” can be found in the article “How are two weeks?!”

Cone of Uncertainty

Software development, like many other kinds of projects, consists of thousands of decisions. Researchers have found that project estimates at different stages carry predictable levels of uncertainty. The cone of uncertainty shows that estimates become more accurate as the project progresses. Note that at the initial concept stage (where estimates are often made and commitments are given), the error can reach 400% (four hundred percent, Karl!). The best time to make commitments is after detailed design is complete.
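To make the cone tangible, here is a minimal Python sketch. The multipliers are the ones commonly quoted from Steve McConnell's version of the cone; treat them as illustrative assumptions, not as measurements for your project. The names `CONE` and `estimate_range` are invented for this example.

```python
# Commonly quoted cone-of-uncertainty multipliers (low, high) per stage.
# Assumption: values as usually cited from McConnell; real projects vary.
CONE = {
    "initial concept": (0.25, 4.0),
    "approved product definition": (0.5, 2.0),
    "requirements complete": (0.67, 1.5),
    "user interface design complete": (0.8, 1.25),
    "detailed design complete": (0.9, 1.1),
}


def estimate_range(point_estimate_hours, stage):
    """Return the (low, high) range the real effort may fall into."""
    low, high = CONE[stage]
    return point_estimate_hours * low, point_estimate_hours * high


# A 1000-hour point estimate, viewed from each stage of the project.
for stage in CONE:
    low, high = estimate_range(1000, stage)
    print(f"{stage:32s} {low:7.0f} .. {high:7.0f} hours")
```

The same 1000-hour guess means anything from 250 to 4000 hours at the concept stage, and only after detailed design does the range narrow to something you can safely promise.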

Mythical man-month

There are still executives who believe that if the functionality is rigidly fixed, the schedule can be shortened at any time by adding staff who would do more work in less time. The flaw in this reasoning lies in the very unit used for estimation and planning: the man-month. Cost really is measured as the product of the number of people and the number of months spent. Progress is not. Therefore, using man-months as a unit for measuring the amount of work is a dangerous misconception. Researchers agree that compressing the nominal schedule increases the total amount of work. If the nominal schedule for a team of 7 people is 12 months, simply increasing the team to 12 people will not shorten it to 7 months.

In large groups, the costs of coordination and management increase, and the number of communication paths grows. If every part of the task must be coordinated separately with every other part, the cost of communication grows quadratically while the team's capacity grows only linearly. Three workers require three times as much pairwise communication as two; four require six times as much.
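The quadratic growth is easy to check. A small Python sketch, assuming every pair of team members needs its own communication channel (the function name is made up for this example):

```python
def communication_paths(team_size):
    """Number of distinct pairs in a team: n * (n - 1) / 2."""
    return team_size * (team_size - 1) // 2


for n in (2, 3, 4, 8, 80):
    print(f"{n:3d} people -> {communication_paths(n):5d} pairwise channels")
```

Going from 8 to 80 people multiplies the headcount by 10 but the number of channels from 28 to 3160, which is more than a hundredfold.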

The project team is trying to cope with the rumps // Ivan Aivazovsky, 1850

If 8 people can write a program in 10 months, can 80 people write the same program in one month? The futility of extreme schedule compression becomes especially obvious in extreme cases, such as 1,600 people who need to write a program in one day. Read more about this in the book of the same name by Frederick Brooks.

Estimation Patterns

So, the problems are clear. What can be done?

Decomposition

Instead of estimating one large task, it is better to split it into many small ones, estimate them, and take the sum of the individual estimates as the final estimate. This way we kill as many as four birds with one stone:

We understand the scope of work better. To decompose a task, you need to read the requirements. Unclear places surface immediately. The risk of misinterpreting the requirements is reduced.
A more detailed analysis of the requirements automatically triggers the process of systematizing knowledge. This reduces the risk of forgetting some part of the work, such as refactoring, test automation, or the extra effort of rollout and deployment.
The decomposition result can be used for project management, provided that the same tool is used for both processes (this is discussed in more detail later in the text).
If we measure the average error of the estimates for the individual tasks obtained during decomposition and compare it with the error of the total estimate, it turns out that the total error is smaller than the average. In other words, such an estimate is more accurate (closer to the real effort). At first glance this statement is counterintuitive. How can the total estimate be more accurate if we make a mistake in estimating every decomposed task? Consider an example. To create a new form you need to: a) write the backend code, b) build the layout and write the frontend code, c) test and deploy. Task A was estimated at 5 hours, tasks B and C at 3 hours each. The total estimate was 11 hours. In reality, the backend took 2 hours, the form took 4, and testing and bug fixing took another 5. The total effort was 11 hours: a perfect estimate. Yet the error in estimating task A is 3 hours, task B is 1 hour, and task C is 2 hours. The average error is 2 hours. The point is that underestimates and overestimates cancel each other out. The 3 hours saved on the backend compensated for the overruns of 1 and 2 hours at the frontend and testing stages. Actual effort is a random variable that depends on many factors. If you get sick, it will be hard to concentrate, and three hours may turn into six. Or some unexpected bug will come up that takes a whole day to find and fix. Or, conversely, it may turn out that instead of writing your own component you can use an existing one, and so on. Positive and negative deviations cancel each other out, so the total error decreases.
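The arithmetic of the form example can be checked with a few lines of Python. This is only a sketch: the numbers are the ones from the example, the variable names are mine. With errors of 3, 1, and 2 hours, the mean per-task error comes out at 2 hours, while the error of the total is zero.

```python
# Numbers from the form example: task -> (estimated hours, actual hours).
tasks = {
    "A: backend": (5, 2),
    "B: layout and frontend": (3, 4),
    "C: testing and deployment": (3, 5),
}

# Signed error per task: positive means the task was underestimated.
signed_errors = {name: actual - est for name, (est, actual) in tasks.items()}

mean_abs_error = sum(abs(e) for e in signed_errors.values()) / len(tasks)
total_estimate = sum(est for est, _ in tasks.values())
total_actual = sum(actual for _, actual in tasks.values())
total_error = total_actual - total_estimate

print(signed_errors)
print("mean absolute error per task:", mean_abs_error)  # 2.0
print("error of the total estimate:", total_error)      # 0
```

The signed errors are -3, +1, and +2: they sum to zero even though none of the individual estimates was right.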

Features and Tasks

At the heart of our decomposition is the Feature. A feature is a unit of delivered functionality that can be released to production independently of others. Sometimes this level is called a User Story, but we concluded that a User Story is not always well suited for setting tasks, so we decided to use a more general term.

One team member is responsible for one feature. Others can help with the implementation, but one person hands it over for testing, and it is returned to the same person for rework. Depending on how the team is organized, this may be the team lead or the developer directly.

Unfortunately, some features are big. Working alone on such a volume would take a very long time, and testing and acceptance would take a long time too. In that case we change the task type to Epic. An epic is just a very fat feature. We do not create anything larger than an epic; that is, epics can be simply big, huge, or gigantic. In any case, an epic is delivered for acceptance in parts (features).

To estimate more accurately, features are decomposed into separate subtasks (Tasks). For example, a feature could be the development of a new CRUD interface. Its task structure might look like this: “display a table with data”, “add filtering and search”, “develop a new component”, “add new tables to the database”. The task structure is usually of no interest to the business, but it is extremely important for the developer.
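The Epic / Feature / Task hierarchy can be sketched as a small data structure in which estimates are entered only at the task level and rolled up from there. This is an illustration, not how any particular tracker stores it: the class layout, the owner name, and the hour values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    estimate_hours: float


@dataclass
class Feature:
    title: str
    owner: str  # exactly one person is responsible for a feature
    tasks: list = field(default_factory=list)

    @property
    def estimate_hours(self):
        # A feature's estimate is the sum of its task estimates.
        return sum(task.estimate_hours for task in self.tasks)


@dataclass
class Epic:
    title: str
    features: list = field(default_factory=list)

    @property
    def estimate_hours(self):
        # An epic is delivered feature by feature, so it sums its features.
        return sum(feature.estimate_hours for feature in self.features)


# Hypothetical hours for the CRUD example.
crud = Feature(
    title="New CRUD interface",
    owner="developer-1",
    tasks=[
        Task("display a table with data", 4),
        Task("add filtering and search", 6),
        Task("develop a new component", 8),
        Task("add new tables to the database", 3),
    ],
)
epic = Epic("Hypothetical epic", features=[crud])
print(crud.estimate_hours, epic.estimate_hours)
```

The business sees the feature and its total; the developer sees the four tasks behind it.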

Group estimation, planning poker

Programmers are too optimistic about the amount of work. According to various sources, underestimation most often lies in the range of 20-30%. In groups, however, the error is smaller. This is thanks to better analysis that comes from different points of view and different estimating temperaments.
With the growing popularity of Agile, the most common practice is “planning poker”. However, group estimation comes with two problems:

Social pressure
Time costs

Social pressure

In almost any group, the experience and personal effectiveness of the participants vary. If the team has a strong team lead, tech lead, or lead programmer, other members may feel uncomfortable and deliberately lower their estimates: “If Vasya can do it, am I any worse? I can do that too!” The reasons may differ: the desire to seem better than one really is, competition, or plain conformism. The result is the same: the group estimate loses all its advantages and turns into an individual one. The team lead gives the estimates, and the rest simply nod along.

For a long time I put pressure on the team to get estimates closer to my expectations. This invariably led to lower quality and missed deadlines. Eventually I changed my attitude, and now my estimate is often the highest. During the discussion I point out potential problems that come to mind: “refactoring would not hurt here, our database structure is changing here, a regression test would be needed.”

There are several main recommendations:

Most estimates are too low. Can't choose between two estimates? Take the bigger one.
Not sure about an estimate? Play the “?” card or a large estimate. Hoping it will somehow work out almost never pays off.
Always compare plan and actuals. If you know that you overrun by a factor of two, give an estimate twice as high as the one you have in mind. Started to overestimate? Multiply in your head by one and a half instead. After a few iterations, the quality of your estimates should improve significantly.
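The last recommendation boils down to a correction factor derived from your own history. A minimal Python sketch, assuming you keep (estimated, actual) pairs for finished tasks; the function names and the sample history are hypothetical:

```python
def calibration_factor(history):
    """Ratio of total actual hours to total estimated hours.

    `history` is a list of (estimated_hours, actual_hours) pairs
    for tasks that are already finished.
    """
    estimated = sum(est for est, _ in history)
    actual = sum(act for _, act in history)
    if estimated <= 0:
        raise ValueError("history must contain positive estimates")
    return actual / estimated


def calibrated(raw_estimate_hours, history):
    """Scale a gut-feeling estimate by the observed plan/actual ratio."""
    return raw_estimate_hours * calibration_factor(history)


# Hypothetical history: every task took twice as long as planned.
history = [(4, 8), (10, 20), (6, 12)]
print(calibration_factor(history))  # 2.0
print(calibrated(5, history))       # 10.0
```

Recomputing the factor after each iteration gives the feedback loop described above: as the estimates improve, the factor drifts toward 1.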

Time costs

You know the phrase “Don't want to work? Call a meeting!” Now it is not just one programmer trying to predict the future instead of writing code; the whole group does it. In addition, reaching a decision in a group takes much longer than making individual decisions. So group estimation is an extremely costly process. But it is worth looking at these costs from the other side. First, during estimation the group is forced to discuss the requirements, which means they have to be read. That alone is not bad. Second, let's compare these costs with those the company incurs when a project is underestimated.

Many years ago, one November day, I moved to a job at a large company. It was immediately clear that work was in full swing. Half of the company was working to release a product before the end of the year. But after about a week it seemed to me that they would not make it by the end of the year. With each passing day the chances of success looked more and more illusory... The project was indeed delivered in December, only of the following year. I learned about this much later, because in the summer problems began with paying salaries and I quit along with about half the staff. You can say, “well, of course, the managers were fools, they should have played it safe.” They did play it safe: for six months there were no problems with paying salaries. Keeping a working-capital reserve for half a year of funding is no easy task, I think.

If we treat the investment in estimation as an investment in sound management decisions, it no longer seems so expensive. Group size is another matter. Of course, there is no need to make the whole team estimate the entire scope of work. It is much more reasonable to split the work into ~~modules~~, ahem, microservices and give the teams autonomy. At a higher level, the estimates produced by each team are used to draw up the project plan. Which smoothly brings us to the topic of the next section.

Dependency Layout, Gantt Charts

While estimates usually come from programmers, drawing up the project plan falls to middle managers. Remember I wrote that these people can be helped if the same tool is used for decomposition and project management? Effort estimates and calendar time are not the same thing. For example, to display a simple data table, you would need:

DB table
Backend code
Frontend Code

Performing the tasks in this order is easiest from a technical point of view. In reality, however, people have different specializations. A frontend specialist may become free earlier. Instead of sitting idle, it is more logical to start developing the UI, replacing the server request with mock or hardcoded data. Then, by the time the API is ready, all that remains is to replace that code with a call to the real method... in theory. In practice, maximum parallelism can be achieved as follows:

First, quickly agree on the API specification in Swagger
Then hardcode the data on the backend or the frontend, depending on who is available
At the same time, we work on the database, backend, and frontend. The database and the backend partially block each other, but most often these skills are combined in one person and the work actually goes sequentially: first the database, then the backend
We put everything together and test
We fix bugs and test again

It is important that steps 1, 4, and 5 are completed as quickly as possible to reduce the amount of blocking. Besides technological constraints and the limited availability of specialists with the necessary skills, there are also business priorities! This means that a demonstration for an important client is scheduled three weeks from now, and the client could not care less about the first half of your project plan. The client wants to see the final result, which will be available no earlier than two months from now. Well, then you have to prepare a separate plan for this demonstration. We add tasks to seed the database with the necessary data, insert new navigation links into the UI, and so on. It is also desirable that in the end only about 20% of the code has to be thrown away, not the whole demonstration.

Artfully slicing such a plan is not an easy task. Building dependencies greatly simplifies the process. Before starting on the reporting module, you need to build the data entry module. Logical? Add a dependency. Repeat for all related tasks. Believe me, many of the dependencies will come as a surprise to you.
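Once the dependencies are written down, a machine can tell you what may run in parallel. Below is a minimal sketch using `graphlib.TopologicalSorter` from the Python standard library (3.9+). The task names loosely follow the data-table example; the dependency map and the `parallel_waves` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# task -> set of tasks that must be finished first (hypothetical plan)
depends_on = {
    "agree on API spec": set(),
    "database table": {"agree on API spec"},
    "backend code": {"database table"},
    "frontend code": {"agree on API spec"},
    "integration and testing": {"backend code", "frontend code"},
}


def parallel_waves(graph):
    """Group tasks into waves; tasks in one wave do not block each other."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(parallel_waves(depends_on), start=1):
    print(number, wave)
```

Here the frontend lands in the same wave as the database table, which is exactly the parallelism gained by agreeing on the API first. A circular dependency, which usually signals a mistake in the plan, is reported as an error instead of silently producing a schedule.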

In business process automation projects, you usually end up with several long “snakes” of dependent tasks with a few large blocking nodes. Most often, the initial plan uses resources inefficiently and/or takes too long in calendar terms. Revising the effort estimates on the grounds that “it will go faster” is not an option: the estimates are most likely optimistic as it is. We have to go back to the decomposition, look for chains that are too long, and add “forks” to increase the degree of parallelism. Thus, at the cost of higher total effort (more people work on one project simultaneously), the calendar duration of the project is reduced. Remember the “mythical man-month”? A plan is unlikely to shrink by more than 30%. The plan may be revised several times before the budget and the deadline are agreed.

Blocking tasks

The first reason for blocking, dependencies, has already been covered. In addition, requirements may simply be unclear or imprecise. A tool is needed to block tasks and ask questions. Once the requirements are clarified, tasks can be unblocked and estimates adjusted. This process, by the way, almost always goes on during the project, not before it.

Critical path: risks first

The critical path method is based on determining the longest sequence of tasks from the start of the project to its completion, taking their dependencies into account. Tasks on the critical path (critical tasks) have zero slack, and if their duration changes, the schedule of the entire project changes. For this reason, critical tasks require closer monitoring during the project, in particular the timely identification of problems and risks that affect their deadlines and, therefore, the deadline of the project as a whole. The critical path may change during the project, because when task durations change, other tasks may end up on the critical path.
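For a small plan the critical path can be computed directly as the longest chain in the dependency graph. The sketch below is a simplified illustration, not a full CPM implementation (it ignores resources and calendars); the durations, the dependency map, and the function name are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(durations, depends_on):
    """Return (project_length, critical_tasks_in_order).

    durations: task -> duration in hours
    depends_on: task -> set of prerequisite tasks
    """
    order = list(TopologicalSorter(depends_on).static_order())
    finish = {}    # earliest finish time of each task
    previous = {}  # predecessor on the longest chain leading to the task
    for task in order:
        preds = depends_on.get(task, set())
        start = max((finish[p] for p in preds), default=0)
        finish[task] = start + durations[task]
        previous[task] = max(preds, key=lambda p: finish[p], default=None)
    # Walk back from the task that finishes last.
    task = max(finish, key=finish.get)
    length = finish[task]
    path = []
    while task is not None:
        path.append(task)
        task = previous[task]
    return length, path[::-1]


# Hypothetical plan with durations in hours.
durations = {
    "agree on API spec": 2,
    "database table": 4,
    "backend code": 16,
    "frontend code": 12,
    "integration and testing": 8,
}
depends_on = {
    "agree on API spec": set(),
    "database table": {"agree on API spec"},
    "backend code": {"database table"},
    "frontend code": {"agree on API spec"},
    "integration and testing": {"backend code", "frontend code"},
}

length, path = critical_path(durations, depends_on)
print(length, path)
```

In this plan the chain through the database and the backend takes 30 hours and is critical, while the frontend finishes at hour 14 and has 8 hours of slack. If the frontend estimate grew beyond 20 hours, the critical path would switch to it, which is the effect described above.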

In short: if you get the database structure wrong, you will have to rewrite the backend; if you miscalculate the load, you may have to change the technology altogether. I wrote in detail about the risks of project work in the article “Cost Effective Code”. The sooner the risks on the critical path materialize, the better: there is still time and something can be done. Even better if they do not materialize at all, but let's be realistic.

Therefore, you need to start with the murkiest, most complex, and most unpleasant tasks: put them in the “blocked” status, clarify them, re-estimate them, and remove dependencies wherever possible.

Acceptance criteria, test cases

Natural language, whether Russian, English, or Chinese, can be both redundant and imprecise. Test cases overcome these limitations. They are also a good communication tool between developers, business users, and the quality department.

Project management

Want to make God laugh? Tell him about your plans. Even if a miracle happened and you collected and clarified all the requirements before starting work, you have enough competent people, and the plan lets you do most of the work in parallel, you are still not immune to employee illness, estimation errors, and other risks materializing. Therefore, the plan must be regularly updated and compared with the actuals. And for this, time tracking is essential.

Working time accounting, aka time tracking

Time tracking has long been the de facto standard in the industry. It is highly desirable to do it in the same tool as the estimation. This lets you track the deviation of actual time spent from the estimate. It is good if the project manager uses the same tool too. Then any delay on the critical path is immediately visible. A setup with different tools is also possible, but it requires significantly more effort to maintain the process, which means there will be a temptation to cut corners. We already know how this ends. We use YouTrack. Everything I have described in the article is currently available out of the box, although it requires a little tweaking.

Conclusion

Estimation is hard
Decomposition lets you find gaps in the requirements and improve the quality of the estimate
Group estimates are more accurate than individual ones; use planning poker
Blockers, test cases, and formal acceptance criteria improve communication, which in turn increases the project's chances of success
You need to start with the riskiest tasks on the critical path of the project
Estimation is not a one-time action but a process inseparable from project management
Without time tracking, it is impossible to keep the project status up to date and adjust your estimates

Want to know more about project estimation?

Read Steve McConnell's book “How much does a software project cost” and other articles on this subject on Habr.

How are estimates made in your company?

29.3% (22 votes): I can name that tune in 3 notes
16% (12 votes): No estimation, deadlines are handed down
20% (15 votes): Estimated by the team lead or manager
34.6% (26 votes): Estimated by the team