Inside Veeam's kitchen: how the R&D process works

Evening. Another R&D interview is wrapping up, and our interviewers are braced for unexpected questions from a future colleague. No surprises, though: Vilfredo Pareto's ratio holds here too. In 80% of cases we hear the same four questions, roughly 20% of all the questions we get. What is your process like? What will I be doing? How do I become a senior developer or team lead within a year? What about relocating to Europe?

In this post we answer the first question and describe the development process at Veeam, since that answer varies the least from team to team.

So, the process. It is a repeatable sequence of actions that leads to success day after day, or at least some of the time. You have learned to cook borscht and it turns out equally tasty every time: that is a process. You can park without bumping into anything: you have mastered a process. A process spares the brain from thinking through the routine every time by turning it into mechanical work, which frees the brain for creativity.

The development process is the sequence of actions that turns users' wishes into tangible products. These wishes are formulated by analysts and product managers, implemented by developers, critically evaluated by testers, and described by technical writers.

At Veeam we build mass-market products for backing up and replicating data centers, so that nothing gets lost. It is a classic boxed product with no single specific customer. At first glance it seems simple, but there are nuances, which is why we are now well into our second decade of doing it.

Characters

Each release is the result of the work of several groups:

Product managers, or analysts. They prioritize the work by communicating with the outside world: customers and partners. Partnerships can be technological. For example, a distributor can tell us what is missing to increase sales, and a hypervisor vendor can tell us about its future plans. For this team, communication skills are essential: the ability to catch and prioritize trends in a turbulent stream of opinions, and then to defend the chosen position, say "no", and explain why something is done one way and not another, whether in press releases, at conferences or in private conversations. These people also educate the sales side of the business.
Technical support. Our customers' helpline. The team's key metrics are response time and time to resolution (SLA). It handles several thousand cases a month. The team is multi-tiered: it includes groups that interact with customers as well as groups that analyze requests, develop workarounds, and so on. Based on what support learns, a list of improvements is drawn up together with their urgency: whether to implement each one in a private fix or in the next update, or to postpone it to a major release.
R&D developers. The people who turn requests into code.
Testers, or QA. Pioneers, proving ground and shaker table all at once. They not only check what has already been implemented but also get involved as early as the idea stage. By repeating administrators' tasks in an infrastructure close to production, they can more easily judge how convenient the interface is and how efficient the chosen algorithms are. When technical support concludes that the product has a defect, QA reproduces the problem scenarios and verifies the fixes.
Technical writers. They create end-user documentation as well as specialized documents such as “How it works” and “Deployment guide”. They get the material for their work from developers, testers and analysts.

Some teams prefer open space, but more often teams work in separate offices, connected through the requirements tracking system. We built it on Microsoft Team Foundation Server, since historically we used it for source version control. The system stores requirements, defects (bugs) and support cases (issues) that need QA and developer involvement. Each release involves hundreds of participants working on thousands of tasks, requirements and defects. The system helps keep track of all this and, more importantly, distribute the load evenly and set priorities for developers.

Growth rings: development cycles

Our product development is cyclical: each cycle ends with the release of the next version. Releases reflect the following:

Long-term market trends. For example, virtualization and the rise of cloud infrastructure. Changing the IT work paradigm takes years: during that time users move from suspicion and denial ("what the hell is this?") to mass acceptance ("yes, everyone does that"). Data center virtualization gave birth to Veeam in its day, because legacy backup products were ineffective for virtualized machines.
Support for new platforms. Veeam was originally built only for virtualized data centers on the VMware platform. As the number and size of our customers grew, the need arose to support new platforms. Other hypervisors (Hyper-V), physical servers, cloud platforms (AWS, Azure) and so on were added alongside VMware.
Tactical market changes. New versions of operating systems and hypervisors come out, and experience with previous versions of our product accumulates. This is how, for example, we got item-level recovery: selective restore from popular application servers such as Microsoft Exchange, Microsoft SQL Server, Oracle Database and so on.
Defects. Despite all our efforts, life throws us surprises every now and then. Of course, we try to keep them to a minimum.

Every quarter we ship updates, and about once a year a major release. Major releases are good because they minimize the overhead of building large-scale functionality tied to new platforms and paradigm shifts. Because of how budgeting works, customers' IT departments are most active at the end of the year, so that is when we roll out our major releases.

Quarterly updates have two main goals: supporting new versions of protected servers and eliminating defects. An update collects all the fixes for defects that customers have found since the major release. Updates also let us respond promptly to changes in supported platforms. Under our SLA, Veeam must add support for a new hypervisor version within three months.
And what if the product does not work for a specific customer? In that case we issue a private fix, in other words a workaround. Such fixes go out every week and are not always tied to defects in the product. For example, a customer's security settings may be incompatible with the product. When we release a private fix, we analyze how often the problem occurs and decide whether to include the fix in a subsequent quarterly update.
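
The frequency check can be pictured as a small triage step. This is a hypothetical sketch, not Veeam's actual tooling: the (problem, customer) record format, the `triage` function and the threshold of three distinct customers are all assumptions made for illustration.

```python
# Hypothetical sketch: the record format and the threshold are assumptions,
# not Veeam's actual private-fix policy.
PROMOTE_THRESHOLD = 3  # distinct customers hitting the same problem


def triage(private_fixes, threshold=PROMOTE_THRESHOLD):
    """Split problems that received private fixes into two groups.

    private_fixes: iterable of (problem_id, customer_id) pairs, one per fix issued.
    Returns (promote, keep_private): problems to fold into the next quarterly
    update, and problems to keep handling with one-off private fixes.
    """
    customers_by_problem = {}
    for problem, customer in private_fixes:
        # Count distinct customers so repeat reports from one site
        # do not inflate the frequency.
        customers_by_problem.setdefault(problem, set()).add(customer)

    promote = sorted(p for p, c in customers_by_problem.items() if len(c) >= threshold)
    keep_private = sorted(p for p, c in customers_by_problem.items() if len(c) < threshold)
    return promote, keep_private


if __name__ == "__main__":
    fixes = [
        ("SEC-SETTINGS", "acme"),
        ("VSS-TIMEOUT", "acme"),
        ("VSS-TIMEOUT", "globex"),
        ("VSS-TIMEOUT", "initech"),
        ("VSS-TIMEOUT", "acme"),
    ]
    print(triage(fixes))  # (['VSS-TIMEOUT'], ['SEC-SETTINGS'])
```

Counting distinct customers rather than raw fix count is one possible reading of "frequency"; a real decision would also weigh severity and risk.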

From dawn to dusk: the release chronicle

Each release cycle begins with planning, both at the level of the product as a whole and at the level of individual requirements. At the product level, the questions are business priorities and how requirements are distributed among teams. The most urgent requirements, or epics, are discussed first. Ideally, epics should take no more than 60% of the total work on a release, leaving a time buffer. Product planning is done by department heads: product management, testing and development.
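
The 60% guideline boils down to simple arithmetic. A minimal sketch, assuming effort is estimated in person-weeks; the function name and the units are illustrative, not part of any Veeam tool.

```python
def check_epic_share(epic_estimates, capacity, max_share_pct=60):
    """Check a release plan against the "epics take at most 60%" guideline.

    epic_estimates: effort estimates for each epic (e.g. person-weeks).
    capacity: total effort available for the release, in the same units.
    Returns (within_limit, buffer), where buffer is the effort left for
    everything that is not an epic.
    """
    epic_total = sum(epic_estimates)
    # Integer arithmetic avoids floating-point surprises at the boundary.
    within_limit = epic_total * 100 <= max_share_pct * capacity
    buffer = capacity - epic_total
    return within_limit, buffer


print(check_epic_share([50, 40, 30], capacity=200))  # (True, 80)
print(check_epic_share([70, 50, 30], capacity=200))  # (False, 50)
```

The second plan fills 75% of capacity with epics, leaving too thin a cushion for defects, support requests and estimation errors.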

Developers and testers are divided into teams. The optimal team size is five people. Teams are either functional or cross-functional. A cross-functional team is self-sufficient and includes developers with expertise in several areas. Functional teams are more focused: they work on the user interface, system components and so on. People from different functional teams form virtual teams that implement the requirements. A virtual team includes, at a minimum, representatives of the PM group and the development and testing teams. Responsibility for a requirement is assigned to the team lead of one of the functional teams.

Work on the requirement begins. The virtual team meets weekly: participants report on the past week's progress and plan the work for the next one.
The responsible team lead moderates the meetings and records the outcomes. He also resolves issues that cannot be settled within the virtual team, for example when deadlines need to move or some of the tasks need to be postponed.

Inside functional development or testing teams, checkpoints are set more tightly. As a rule, the weekly plan is broken into tasks no longer than two or three days each. Some teams have adopted Scrum practices with daily stand-ups; in others, one-on-one interaction between the team lead and each team member works better.

A typical meeting to discuss the current project status

All development is divided into one- or two-week iterations. In the first iterations you need to build a minimal working skeleton of the functionality, which later gets fleshed out with, for example, an extended user interface, a client API and so on. Most importantly, having a skeleton already lets testers get their hands on the feature.

The release cycle includes alpha and beta releases. They let us show new features to the outside world and collect feedback early. If necessary, we change architectural or functional decisions. Scenarios for the alpha and beta are not brought to readiness all at once, but in batches. As a rule, a release cycle includes about two betas.

After the beta stage, the teams switch to regression testing mode. At this point the product basically works, and the user interface and usage scenarios have settled and change less intensively. The technical writers come into play, and at the same time the technical support teams are trained.

Regression testing runs in two-week cycles. The cycle length is determined by the time needed to go through all product scenarios. The larger the product, the more scenarios and, accordingly, the more cycles. Before the last cycle, a code lock is declared. This means that only critical changes go into the product, and only after multiple code reviews. Such draconian measures are needed to avoid accidentally introducing new defects.
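
The code lock rule can be read as a merge gate. A minimal sketch under stated assumptions: the `Change` record, the `may_merge` function and the minimum of three reviews are hypothetical, since the article only says "multiple" reviews.

```python
from dataclasses import dataclass

# Hypothetical value: the article only says "multiple" code reviews.
MIN_REVIEWS_AFTER_CODELOCK = 3


@dataclass
class Change:
    title: str
    critical: bool
    reviews: int  # number of completed code reviews


def may_merge(change, codelock_declared, min_reviews=MIN_REVIEWS_AFTER_CODELOCK):
    """Return True if the change may enter the release branch."""
    if not codelock_declared:
        # Before code lock, regular review rules apply (not modeled here).
        return True
    # After code lock: critical changes only, and only after several reviews.
    return change.critical and change.reviews >= min_reviews


crash_fix = Change("Fix crash during restore", critical=True, reviews=3)
label_tweak = Change("Rename a button label", critical=False, reviews=5)
print(may_merge(crash_fix, codelock_declared=True))     # True
print(may_merge(label_tweak, codelock_declared=True))   # False
print(may_merge(label_tweak, codelock_declared=False))  # True
```

Note that review count alone does not get a non-critical change past the lock: criticality is the first filter.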

The closer the release, the more free time developers have and the less everyone else has. Product managers need to prepare press releases and coordinate with marketing. Testers verify fixes and run the final regression. Technical writers finish the user documentation. During this blessed time, developers ramp up research on the requirements for the next version.

And then comes the exciting moment called RTM (release to manufacturing). It means the product is ready to be handed over to customers. For two weeks it will be put through its paces by journalists and service providers and shown in presentations. After those two weeks, the product becomes available to everyone. Internal changes happen at the same time: new development branches are prepared and the source code is deposited for safekeeping. And, of course, the build infrastructure is brought up for the next version. After general availability (GA), it is the hot season for technical support, while everyone else is already working on the next version.

About priorities

And finally, a bit of theory. As you know, of "fast, good, cheap" you can pick at most two. Quality, timing and functionality are constantly competing with one another. For our boxed product, quality comes first. Hm, but is there any field where quality doesn't matter? Of course not. The whole question is how you define quality.
For us, quality means:

Reliability and performance across a zoo of configurations. One customer has a modest data center of two servers dating back to the Battle of Borodino, and another has a high-end infrastructure in a hangar next door to Amazon's. The product should work properly in both cases.
Ease of use. Users should need minimal effort and cope without any instructions at all. But outward simplicity does not always hide simple code: try seamlessly crossing a grass snake with a hedgehog.
Continuity. Enterprise investments span many years, and CFOs will not spend money on IT without good reason. So we need to maintain compatibility with previous versions and with related products. Often, when data centers are rebuilt, mail servers from the '80s get bricked into the wall, and they keep humming along with no intention of dying.

With this set of priorities, preserving quality always takes juggling, for developers and testers alike. Small changes in a function's behavior can force integration retesting of the lion's share of the product. Try adding support for Asian locales to a product and you will see what we mean. That is why priorities are a matter of joint discussion among PMs, testers and developers.

The second, nearly immovable priority is timing. For updates, release dates are set by the SLA; for major releases, by the business calendar. Statistically, in each release cycle about 50% of the time goes to development and 50% to polishing the product (the bugfix stage).

What can change is the feature set of the next release. This is where a prioritized list of requirements, the backlog, helps. In theory, everything is simple: take the next highest-priority feature from the backlog and check how much time is left. When time is nearly up, stop and release the next version of the product. The devil is in the details:

Uncertain requirements. For example, the requirement "support backup of physical Linux machines" can later be refined considerably. Which kernels must be supported? Which distros? Which file systems? The same high-level requirement can be implemented in a month or in a year. It all comes down to how complete the implementation has to be.
Teams have specializations. Not every team can take on every requirement. A C# developer will not write drivers, and a system components developer will not always cope with web development.
Requirements depend on each other. This is not always visible at the level of user scenarios, but internally such dependencies exist. From the outside, backup support for the NTFS and ExtFS file systems may be requirements with different priorities, but internally you first need to write a common engine.
Requirements are either deferrable or non-deferrable. If the market expects a feature in the next version and it has been announced, it cannot be postponed.
Some requirements involve research. Without the results of that research it is impossible to estimate the complexity of the task (it may turn out to be impossible altogether), and those results are hard to predict.
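
Several of these nuances (dependencies, team specialization, non-deferrable features) can be folded into the "take the next priority item while time remains" idea. The following is a hypothetical greedy sketch, not Veeam's planning tool: the `Requirement` fields, capacity per specialization and the rule of planning a requirement together with its unplanned dependencies are assumptions. Uncertainty and research work are deliberately left out, because they are exactly what expert judgment has to cover.

```python
from dataclasses import dataclass, field


@dataclass
class Requirement:
    name: str
    priority: int                 # lower number = more important
    estimate: int                 # expert estimate, e.g. person-weeks
    skill: str                    # team specialization the work needs
    deferrable: bool = True       # announced features are not deferrable
    depends_on: list = field(default_factory=list)


def _with_dependencies(req, by_name, planned):
    """Return req plus its not-yet-planned transitive dependencies,
    dependencies first (a depth-first post-order walk)."""
    bundle, seen = [], set()

    def visit(r):
        if r.name in seen or r.name in planned:
            return
        seen.add(r.name)
        for dep in r.depends_on:
            visit(by_name[dep])
        bundle.append(r)

    visit(req)
    return bundle


def plan_release(backlog, capacity_by_skill):
    """Walk the backlog in priority order (non-deferrable first) and plan
    each requirement together with its dependencies if every team
    involved still has enough capacity."""
    by_name = {r.name: r for r in backlog}
    remaining = dict(capacity_by_skill)
    planned, planned_set = [], set()

    for req in sorted(backlog, key=lambda r: (r.deferrable, r.priority)):
        if req.name in planned_set:
            continue
        bundle = _with_dependencies(req, by_name, planned_set)
        need = {}
        for r in bundle:
            need[r.skill] = need.get(r.skill, 0) + r.estimate
        if all(remaining.get(skill, 0) >= amount for skill, amount in need.items()):
            for skill, amount in need.items():
                remaining[skill] -= amount
            for r in bundle:
                planned.append(r.name)
                planned_set.add(r.name)
        elif not req.deferrable:
            # An announced feature cannot slip: this is a signal to re-plan.
            raise ValueError(f"non-deferrable requirement does not fit: {req.name}")
    return planned, remaining


backlog = [
    Requirement("NTFS backup", priority=1, estimate=4, skill="core", depends_on=["FS engine"]),
    Requirement("New wizard UI", priority=2, estimate=5, skill="ui", deferrable=False),
    Requirement("ExtFS backup", priority=3, estimate=4, skill="core", depends_on=["FS engine"]),
    Requirement("Web console", priority=4, estimate=8, skill="ui"),
    Requirement("FS engine", priority=5, estimate=6, skill="core"),
]
print(plan_release(backlog, {"core": 12, "ui": 10}))
# (['New wizard UI', 'FS engine', 'NTFS backup'], {'core': 2, 'ui': 5})
```

In this example the low-priority "FS engine" is planned early because the high-priority NTFS feature depends on it, while ExtFS support and the web console do not fit and stay in the backlog.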

This is where agile development comes in. For us, agile development means periodic re-planning. New circumstances come to light: we change the plan. New high-priority requirements land in the backlog: we change the plan. We are running out of time on non-deferrable requirements: we cut some of the tasks or change the requirement. In control theory this is called a feedback system. Think of how an autopilot works.
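
One turn of that feedback loop can be sketched as follows. This is a hypothetical illustration: the `Item` record, the `replan` function and the "drop the least important deferrable item first" rule are assumptions, and a positive shortfall stands for the case the text describes, where tasks inside a requirement must be cut or the requirement itself changed.

```python
from typing import NamedTuple


class Item(NamedTuple):
    name: str
    priority: int        # lower number = more important
    remaining_work: int  # current expert estimate, same units as weekly_velocity
    deferrable: bool


def replan(items, weeks_left, weekly_velocity):
    """One step of the feedback loop: compare remaining work with remaining
    capacity and drop the least important deferrable items until the plan
    fits. Returns (kept, dropped, shortfall)."""
    capacity = weeks_left * weekly_velocity
    kept = sorted(items, key=lambda it: it.priority)
    dropped = []
    while sum(it.remaining_work for it in kept) > capacity:
        candidates = [it for it in kept if it.deferrable]
        if not candidates:
            break  # only non-deferrable work is left and it still does not fit
        victim = max(candidates, key=lambda it: it.priority)
        kept.remove(victim)
        dropped.append(victim.name)
    shortfall = max(0, sum(it.remaining_work for it in kept) - capacity)
    return [it.name for it in kept], dropped, shortfall


# Week N: estimates as planned.
week_n = [Item("Announced feature", 1, 6, False),
          Item("Nice-to-have A", 2, 3, True),
          Item("Nice-to-have B", 3, 4, True)]
print(replan(week_n, weeks_left=4, weekly_velocity=3))
# (['Announced feature', 'Nice-to-have A'], ['Nice-to-have B'], 0)

# Week N+1: the announced feature turned out bigger and the team slowed down.
week_n1 = [Item("Announced feature", 1, 9, False),
           Item("Nice-to-have A", 2, 3, True)]
print(replan(week_n1, weeks_left=3, weekly_velocity=2))
# (['Announced feature'], ['Nice-to-have A'], 3)
```

Running this check every week with fresh estimates is what turns a static plan into a feedback system: the measured state feeds back into the next decision.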

Any planning under these conditions rests on expert judgment. The expert estimate of the team lead responsible for a requirement is the element that makes the whole subsequent process clear and structured. Another name for expert judgment is the "Lenin squint method", as Alexander Orlov of Stratoplan likes to say. This is when you make a decision based on past experience and intuition. Despite possible criticism, it works, just like all the processes described above. If you have any questions about them, you are welcome to ask in the comments.

What's next?

The current process is as comfortable and cozy as a pair of slippers. The only problem is that at Veeam some invisible itch always keeps pushing us: faster, bigger, more, more.
Recently we set up pilot offices in Finland and the Czech Republic, and this year we are preparing for the grand opening of the Prague R&D center for several hundred people.

Lobby of our Prague office

A development office recently opened in Israel, and teams are growing in Canada and Germany. The number of joint development projects with HP, NetApp, Nutanix and EMC is increasing.
The manufactory is turning into a geographically distributed assembly line, and new processes are crystallizing along the way. But that is a topic for a separate article.
Stay tuned.
