Development processes through the eyes of operations: a look from the other side of the barricade
Hi, Habr! Alexey Pristavko here again.
Unlike my previous articles, today we will talk about people. The heroes of the day are the operations team, whose interests I represent, and the development team.
The coordinated work of these teams is the key to a successful launch and a smooth “flight” of the service being built. But, as my experience (and not only mine) shows, practically no project gets by without conflicts and disagreements, and their victim is the innocent service.
In this article I will try to answer the following questions:
- How do development methods and processes affect operations?
- What drives each side of the conflict?
- What is the root cause of disagreement?
Read on below the cut!
What tasks the teams face
Let's get to know the departments better, starting with the operations team (also known as the support service). Why do we need this terrible beast, what does it do, and whom does it work for?
The main task of operations is service level management, that is, keeping the IT system running within its SLA.
Operations must keep the service continuously available and functioning correctly within the agreement with the customer, which as a rule is the business.
The following tasks follow from this:
- Incident management: restoring the business function after an outage;
- Problem management: eliminating the likely causes of outages and incidents;
- Configuration management: collecting and analyzing information about software and infrastructure parameters;
- Change management: minimizing the risk of problems and outages caused by changes.
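To make “working within the SLA” more tangible, here is a minimal Python sketch that turns an availability target into a downtime budget. The function names, the 30-day period, and the 99.9% figure are illustrative assumptions of mine, not values taken from any particular contract.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: int = 30) -> float:
    """Return the downtime budget, in minutes, for an availability target.

    sla_percent is the agreed availability (for example 99.9);
    period_days is the reporting period (assumed to be 30 days here).
    """
    if not 0 < sla_percent <= 100:
        raise ValueError("sla_percent must be in the range (0, 100]")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


def sla_met(downtime_minutes: float, sla_percent: float, period_days: int = 30) -> bool:
    """Check whether the observed downtime fits into the budget."""
    return downtime_minutes <= allowed_downtime_minutes(sla_percent, period_days)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.9)
    print(f"Downtime budget for 99.9% over 30 days: {budget:.1f} min")
    print("SLA met after 40 min of outages:", sla_met(40, 99.9))
```

With a 99.9% target over 30 days the budget comes to about 43 minutes, so a single long incident can use up the whole month.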
The role of the development team is also clear: building the IT service in the first place and adding new functionality to it at the customer's request.
You probably already suspect where developers and support rub against each other, because where tasks intersect, conflicts arise. But do not jump to conclusions. The eternal disputes between developers and admins are far from the epicenter of the “battles”.
Where the disagreements come from
The “war” between programmers and admins is not a clash of personal interests; it is a clash of departments.
The problem lies in the priorities and motivation of those departments. In the most general form, it can be described like this:
- The development team wants to use the freshest technologies (professional growth, interest) and to finish the work as quickly as possible (external motivation).
- Operations wants the most stable technologies (their problems are known and have generally accepted solutions) and detailed instructions on what to do when the delivered software misbehaves (external motivation: speed of troubleshooting).
At the same time, it is important to understand that it is not only developers who build new functionality, and it is not always administrators who bring fallen systems back up.
- Admins are actively involved in the development process: in building the infrastructure, designing fault tolerance and scalability schemes, preparing initial configurations, and ideally in preparing software requirements. All of this is called solution development (not to be confused with writing code itself!).
- Programmers are actively involved in operations. They fix software bugs and make optimizations and technical improvements so that the system meets its SLA, that is, they work on the availability of the final IT service.
The conflict between programmers and admins grows out of a substitution of concepts:
- development → writing code;
- support → administration of application services.
Structuring the reporting lines by profession rather than by function always leads to conflicts. As a rule, administrators and programmers work in completely different teams, sometimes in different departments, and are motivated as operations and development respectively.
As a result, programmers from the development department who are “forced” to fix old bugs or, God forbid, write documentation are unhappy. Likewise, admins from operations are indignant when they are asked to drop everything and bring up a new server for the developers, or to help them with advice.
Each side sees this not as solving a problem together but as a distraction from its own tasks, which already have no end in sight.
But we must not forget the customer, who is also a party to the conflict, albeit indirectly. As a rule, the customer needs a working, stable service. “Bare” code, even the best, is completely useless to the customer outside of a system. The customer has a fundamentally different picture of the world:
The typical customer has no goal of creating something, writing it or, God forgive me, deploying it. The customer wants to solve a business problem with the magic button “Make it all good”, while IT
Let's take a look at all parties to the conflict:
Of course, this is only the most general part of the functional and operational requirements.
So, we have found that the conflict actually has three participants: development, operations, and the customer. Moreover, it is the customer who largely acts as the “provocateur” by splitting responsibility between the teams. That in itself is not bad, and were it not for the conventional division of teams and duties between them, the conflict would be a manageable one.
But we already know that neither team is self-sufficient. They are separated by the departmental barricade, yet they must not only hand responsibility over to each other at the stages of the product life cycle but also actively take part in solving each other's problems.
Another aspect of the customer's influence is the customer's business development strategy and methodology. Here I mean a lack of understanding of what developing a specific IT solution actually involves. Without that understanding, the customer often demands that a cat be turned into a whale and then have wings attached. And everything is always urgent. Sometimes this is justified by the situation and the novelty of the idea; sometimes it is the result of chasing the market leader and copying in a hurry. The reasons vary widely, but the result is the same.
The trouble is that this strategy results in constant experiments and the need for results in the shortest possible time. This approach throws us into the abyss of continuous development instead of work aimed at a specific result. In principle, this last problem is what enterprise architects solve, but such people are almost impossible to find.
Finally, we have almost reached the point of all this. What matters most for maintaining a service?
- Transparency of the changes made. To manage change, you need to understand what was done, how, and why.
- Documentation of the operating logic and of maintenance. Ideally, in the form of instructions. The key to SLA compliance is not only reliability but also a clear understanding, shared by everyone on duty, of what to do and how. “Oral tradition” does not work here: operations staff very often work in shifts, and it is simply impossible to gather everyone and explain everything. Besides, memory can fail under stress (and an outage is always stressful).
- A thorough procedure for handing a new version over to support, with testing of its performance and of the correctness of its functionality. Put simply, regression and load testing. Whether the new functionality itself works is the least of operations' concerns: the development team will fix a failed release itself under warranty. But a bug introduced into old functionality is something operations must, if not fix itself, then at least process, and sometimes it also has to prove that development is at fault.
- The ability to feed its requirements into the work on a new project. It is development that provides the most important operational characteristics. For example, if the software cannot run in a cluster, operations cannot make it truly reliable on its own.
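The handover requirements above can be sketched as a simple acceptance gate. This is only an illustration: the `ReleaseReport` fields, the 10% latency tolerance, and the function name are hypothetical, and a real handover procedure would check far more than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReleaseReport:
    """Hypothetical summary that development hands to operations."""
    regression_failures: int   # failed regression tests on OLD functionality
    baseline_p95_ms: float     # p95 latency of the current production version
    candidate_p95_ms: float    # p95 latency of the new version under load
    runbook_updated: bool      # were the operating instructions updated?


def acceptance_problems(report: ReleaseReport, max_slowdown: float = 0.10) -> List[str]:
    """Return the reasons to reject the handover (an empty list means accept)."""
    problems = []
    if report.regression_failures > 0:
        problems.append(f"{report.regression_failures} regression test(s) failed")
    # Allow the new version to be at most max_slowdown slower than the baseline.
    if report.candidate_p95_ms > report.baseline_p95_ms * (1 + max_slowdown):
        problems.append("load test shows a latency regression")
    if not report.runbook_updated:
        problems.append("operating instructions were not updated")
    return problems


if __name__ == "__main__":
    report = ReleaseReport(regression_failures=0, baseline_p95_ms=200.0,
                           candidate_p95_ms=250.0, runbook_updated=True)
    print(acceptance_problems(report))
```

The point of returning a list of reasons rather than a bare yes/no is that operations needs something concrete to send back to development.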
We have figured out what the maintenance team is, why anyone needs it, and how the development team works. Now for the most interesting part: let us look at various development methodologies and how they affect operations.
Let's start with the classic: the Waterfall model.
Waterfall is focused on delivering complete, fully worked-out functionality. The release model is cyclic. A cycle takes from several weeks (extremely rare) to quarters or half-years. There is almost always a sequential process: requirements gathering, analysis, solution architecture design, effort estimation, planning, and full regression testing at the end.
How well the interests of operations are respected depends on the specific implementation. Since the necessary stages are usually set out explicitly, the process includes consideration of all requirements and formal procedures for handover to operations, including documentation.
The main problems with Waterfall for the customer are the length of the iterations and the long stabilization period after a release. Sometimes the customer has to wait several months for a feature to reach production that could be built in a week.
If the result is far from what was expected, the customer will have to suffer until the end of the next cycle, or even two. The operations and maintenance team regularly finds itself in the same position: technical functionality is often last in the queue.
Every big release is accompanied by a pile of bugs that are fixed during the stabilization period. Usually this turns into hell for everyone: development is forced to do operations work, operations receives the incidents and pushes them all to development for warranty fixes, and the customer, looking at the mess and the lost money, tears his hair out.
Despite all this, from the operations point of view Waterfall is the most even and predictable process to integrate into. By and large, operations is not particularly bothered by either the cycle length or the stabilization period. The more time between releases, the longer it can work in peace, and that is always a plus. In addition, when you are confident that nothing will change for another six months, it is much easier to fit your own work into the process.
Unfortunately, customers are very often set against Waterfall and want project development to go faster. More flexible methodologies are born to satisfy this desire.
As you understand, Waterfall is very long, formal, and entails tons of documentation,
● People and interaction are more important than processes and tools.
It's hard to argue with that. Of course, people are more important. But processes should not be forgotten either. It is processes that standardize and regulate interaction. Besides, you will not get far on people alone. At a minimum, people tend to get sick, take vacations, and sometimes quit.
● A working product is more important than comprehensive documentation.
This is also true. But there is another question: how well and for how long will the product work without documentation? Most likely, not very long. And whether it works well or not will be impossible to determine without a primary source. Then you will have to follow the tactic so many have grown fond of: “couldn't figure it out, so rewrite it from scratch.” But that is always slow and expensive, and there is no guarantee that the result will be better.
● Cooperation with the customer is more important than negotiating the terms of the contract.
And again, yes, it is more important. But without agreeing on the terms of the contract, how will we know what to do and how? How else can mutual misunderstanding be overcome, other than by communicating and negotiating through a clearly written document? Of course, a contract is no panacea either, but it is much safer than the method from the nineties:
- Vasya, do you understand me?
- Well, like yes, brother.
● Readiness for change is more important than following the original plan.
But this is true in only one case: when the original plan was completely wrong and what was going to be built turned out to be unnecessary. An enterprise architect for everyone!
The result of blindly following this manifesto is that the business customer has to turn into an enterprise architect (but most often turns into a pumpkin). Not only is this role foreign to the customer, it also requires understanding IT.
Scrum is one of the first attempts to tailor Waterfall to the ideology of the Agile Manifesto.
The main features of Scrum:
- Work in short sprints. The contents of a sprint are not changed after it starts;
- Planning by placing individual user stories into the sprint backlog. The product owner selects them from the product backlog;
- The customer's interests are represented by the Product Owner;
- The development team consists of specialists of different profiles: programmers, developers, architects, analysts. The team is responsible for the result as a whole;
- Documentation and correspondence are replaced with daily discussions of the project by the whole team.
In theory, this is a fairly sound approach, and in many respects it resembles a miniature Waterfall. Problems arise at the implementation stage. Unfortunately, because of the shorter cycle, Scrum encourages breaking a coherent feature into separate pieces and delivering it in parts, already at the sprint planning stage. I will not even mention that, for the same reason, everything can go awry during the “race”. Short sprints leave no time for management reserves, and it becomes extremely difficult to recover.
As a result, with constant re-prioritization, a feature may take a very long time to reach production in its finished form. In addition, when a user story is written it is extremely hard to assess the impact of the planned functionality on the final system, because the state the system will be in when this functionality arrives is not known in advance.
How does this threaten operations? It is not always clear what should work, what should not, and, if something should work, when. Accordingly, testing the system gets replaced by testing the feature, since a proper regression run would take too long. This adds bugs to the product and delays their resolution.
To work normally, operations should:
- Participate in Scrum meetings;
- Constantly keep up with the working situation in development;
- Know the development plans and release statuses.
Otherwise, it will be impossible to keep up with acceptance or to raise objections in time. Of course, operations rarely manages to follow all these points, and when it does, the conflict escalates. Even the product owner may short-sightedly ignore the interests of support during a Scrum meeting.
The next step in the evolution of development methodologies is Kanban. Even the already short Scrum cycles seemed too long to the business. The customer can be understood: you always want to be the first to get the coolest feature. Besides, changing plans is inefficient: it creates too much “overhead”.
So, as builders say, it is easier to “sculpt it on site.”
Kanban is an IT adaptation of the Japanese lean manufacturing methodology. But there is a nuance: at Toyota, Kanban is used to assemble cars from pre-engineered parts, whereas development is first and foremost design. In programming, parts (functions) are simply copied; they do not need to be “manufactured” over and over.
Kanban usually goes hand in hand with CI/CD processes. There are no sprints here; tasks are delivered continuously as soon as they are ready, and there are strict limits on task size. Because of this, complex functionality is almost never delivered as a whole, since it does not fit into the task size.
Under such conditions, system documentation really does become outdated before it is written and loses all meaning, because there is no state of the system being produced (produced, not developed) that could be fixed and for which the documentation would be correct.
For operations, this means that the SLA cannot be met: there is no documentation, and no one knows how the system works or how it should work.
Predictability and a guarantee of continuous operation are the basis for meeting an SLA.
However, with this approach there is usually no operations team at all, only a development team that is periodically distracted by repairs and (sometimes) by “technical debt”. But nobody worries about this, since no plans are disrupted. It is hard to disrupt what does not exist.
As its original authors intended, Kanban is ideal for operations, where planning is replaced by reaction to stimuli. Incident handling, for example, looks exactly like this: what is broken is what we fix, in most cases by a known procedure. However, Kanban is completely unsuitable for development, because it does not really imply an end result. Choosing this methodology dooms you to an endless process: “we built, built, and are still building”. Do not do this!
Of course, DevOps is not a development methodology as such, but I cannot help adding my 5 kopecks about the most common attempt to resolve the conflict between operations and development.
In theory, DevOps is a set of practices aimed at active interaction between development specialists and IT operations specialists and at integrating their work processes with each other.
DevOps is based on the idea of the close interdependence of software development and operations, and it aims to help organizations create and update software products and services quickly. Once again, it is about rapid creation and updates. But operations does not have these problems at all.
As a result, DevOps becomes a way to make the development team independent of the maintenance team by adding the missing link to development. Obviously, this solution is one-sided and solves the problem for only one party, development. Often it only aggravates the conflict by allowing development to ignore the maintenance team completely.
In practice, I most often meet two implementations:
- The team of admins is kept and handles production.
Automation-focused administrators appear in the development team and solve the developers' problems in place of the operations team. I think it is obvious that operations does not want to let their solutions, hastily applied in test environments, into production, and the conflict only grows.
- The same thing, but the operations team is abolished in the spirit of the principle “no person, no problem”.
Managing service availability fades into the background. Nobody is motivated to do it at all. A team of DevOps engineers appears from time to time, but this changes nothing, since their management's priority is release dates. All other tasks can be pushed aside as long as nobody is looking.
Even if we return to the original idea that DevOps is mostly about testing and must guarantee the prompt delivery of changes of guaranteed quality, it still turns out that operations cannot be excluded from the process or replaced. The focus of the idea is precisely on ensuring the quality of the change process and of the changes themselves. Note that it says nothing about SLA compliance or availability management. And high quality of changes (of a component) does not guarantee high performance of the system as a whole.
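A bit of arithmetic shows why component quality does not add up to system quality. The sketch below assumes, purely for illustration, that the service depends on several components in series and that they fail independently; the availability figures are made up.

```python
from typing import Iterable


def serial_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a chain in which EVERY component must be up.

    Assumes independent failures, so the availabilities multiply.
    Each value is a fraction between 0 and 1 (0.999 means 99.9%).
    """
    result = 1.0
    for availability in component_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("availability must be between 0 and 1")
        result *= availability
    return result


if __name__ == "__main__":
    # Five well-tested components, each available 99.9% of the time.
    chain = [0.999] * 5
    print(f"System availability: {serial_availability(chain):.4%}")
```

Five components at 99.9% each give roughly 99.5% for the whole chain, so the system is noticeably worse than any of its parts, and someone still has to manage that gap.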
It is impossible to foresee and test everything: testing does not guarantee the absence of outages. Equipment wears out, and the system may fail in unexpected conditions. Finally, there is the human factor.
Therefore, even with a DevOps team, the operations process is indispensable, and, whatever one may say, documentation will be needed. When something breaks, you have to figure out how it was supposed to work in the first place.
In place of a conclusion
So what is to be done?
Business: develop a coherent strategy for your product and manage market conditions rather than rushing to react to every change.
Development: do not turn the customer into an enterprise architect. Building engineering or IT systems is not the customer's field; the customer is an expert in a specific area of the business.
All together: attempts to get rid of the operations team may well succeed, but you are unlikely to be able to abandon the operations process itself: availability must remain under control. Either dedicated professionals will manage it, or it will be an extra burden on developers. You decide.
DevOps is an excellent and useful thing for improving quality, but for the reasons described above it guarantees nothing and will not replace a full-fledged operations team. This approach will not resolve the conflict between development, operations, and business, but it will help establish continuous acceptance and reduce overall stress.
Surely all of you have taken part in the situations and processes described above in one way or another. Share your experience in the comments!