Transforming the development and delivery process for a legacy application

Our team is responsible for the operation and development of a large corporate product.
In early 2017, after recovering from a major rollout and re-reading our "lessons learned", we firmly decided to revise how we develop and deliver our application. We were concerned about the low speed and quality of delivery, which did not allow us to provide the level of service our customers expect from us.

It was time to move from words to deeds and change our processes.

This article briefly explains where we started, what we did, where things stand now, what difficulties we encountered, what we had to leave out of scope, and what we plan to do next.

Start

A little about the system

The application is a classic example of a monolithic enterprise application "of the 2000s architectural vintage":

It has been operated and developed for over 15 years.
It consists of about fifteen WinForms, Windows service and ASP.NET applications tied to a single MS SQL Server database.
Codebase size: ~1 MLOC of C#, ~9,000 database objects. Much of the business logic runs on the database side.
The client code is split into 250+ solutions used to build the Windows/web client (one solution per group of related forms). This is a legacy of the previous development process and client architecture.
The application supports several process types (clients) through internal configuration: processes, permissions, flexible fields, etc. are set up in the configuration tables of the system database. The codebase itself is the same for all clients.
The application is deployed and supported at 25+ sites (each site is an independent instance of the system) and serves several thousand end users in total across different time zones.

Delivery process before the transformation

Development and building of the finished application and its components were carried out by the contractor.
The code was stored on the contractor's side (a local MS TFS instance). It was handed over to the customer monthly as an archive of the current version of the main repository branch.
Delivery was done through "delta updates": for the application (a set of DLLs, EXEs, etc.) and for the database components (a set of CREATE/ALTER SQL scripts). The contractor built the application and prepared the delta packages.
Deployment was supported by a transport system, and the changes were applied automatically.

Delivery takes place within monthly releases (I described how this is organized for us in an earlier article).

Existing problems

Lack of control

Despite formally owning the code, the customer could not actually build the application.
As a result, there was no way to verify that the code handed over to the customer actually worked.
Code changes were not transparent to the customer: it was impossible to compare the requested changes with the actual changes in the product.
Code analysis was difficult for SQL and impossible for the C# components.

Complexity and errors

Preparing delta packages is a labor-intensive procedure for the developers, a source of errors and a noticeable project cost.
Deploying the application from a set of delta packages requires tracking the order of the packages. Applying packages out of order is a major deployment problem and the source of a significant share of incidents.
Regressions occurred regularly: errors that had supposedly been fixed, with the fixes already rolled out to production, appeared again.
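To make the ordering problem concrete, here is a minimal sketch of the bookkeeping that delta delivery depends on. It assumes, hypothetically, that every delta package carries a strictly increasing sequence number and that each site knows the number of the last package applied to it; the article does not describe the real transport system's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeltaPackage:
    seq: int   # hypothetical: strictly increasing package number
    name: str  # e.g. the archive file name


def apply_order(last_applied: int, pending: list[DeltaPackage]) -> list[DeltaPackage]:
    """Return the pending packages in the order they must be applied.

    Raises ValueError if a package is missing, duplicated or already applied,
    because applying the rest would leave the site in an unknown state.
    """
    ordered = sorted(pending, key=lambda p: p.seq)
    expected = last_applied + 1
    for pkg in ordered:
        if pkg.seq < expected:
            raise ValueError(f"package {pkg.seq} is a duplicate or was already applied")
        if pkg.seq > expected:
            raise ValueError(f"package {expected} is missing; cannot apply {pkg.seq}")
        expected += 1
    return ordered
```

Every site has to get this check right for every release; a full-state deployment removes the need for it altogether.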

Limitations

There was virtually no way to restore the system to a past state (roll back changes).
There was virtually no way to scale development resources effectively or to enable early testing by involving the customer's staff.

Expected results

At the start of the project, we set clear goals to address the problems outlined above:

Move the code repository under the customer's control
Move the application build process to the customer's side
Change the way changes are delivered, abandoning "deltas" in favor of full updates

Additionally, building on the solutions for the first two goals, we expected to:

Improve the technical quality of our solutions through control over the code
Increase engagement in testing and make it more convenient by providing self-service deployment

Stages of the long journey

Analysis of the current state of development processes

The first step was to analyze the contractor's existing development process. This helped us plan the changes so that, where possible, they would not interrupt ongoing work.

Unfortunately, examining the development process showed that, by the standards of today's IT industry, there was no process at all.

The database code and its business logic were not kept up to date in the repository. The main reason was the lack of tooling to build the code from the repository and deploy the result. So the code in the repository was merely documentation.
The "real" version of the database code lived in a shared "development database" used by dozens of developers.
The client application code (C#, ASP.NET) was kept in the repository, but the quality and timeliness of commits were not guaranteed.
Components (not the whole application) were built on developer workstations. It was not entirely clear how the code was updated before a build. The built component was placed in a shared network folder, and the "delta package" for the customer was assembled from there.
There was no practice of branch-based development at all. We had long suspected this from indirect evidence, but once we dug into the process everything became clear.

Switch to a new repository and version control system

Our dependence on Microsoft platforms and corporate standards predetermined the choice of development environment: Team Foundation Server.
However, by the time we actually started the project (April 2017), a new version of Visual Studio Team Services had just been released. The product looked very interesting: Microsoft had designated it a strategic direction, and it offered Git repositories as well as build and deployment for both on-premises and cloud environments.

The corporate on-premises TFS lagged behind VSTS in version and functionality, and migration to a newer version was still only under discussion. We did not want to wait. We decided to go straight to VSTS, since this reduced our platform support overhead and gave us full control over what we were doing and how.

At the time of the switch, the development team had experience with TFVC, and the application code was stored in a TFVC repository. On the other hand, Git had long since become the de facto standard in the IT community, and both the customer and third-party consultants recommended switching to it.
We wanted the development team to take part in choosing the new version control system and to make an informed choice.

We set up two projects in VSTS with different repository types, TFVC and Git, and defined a set of scenarios for testing and evaluating the usability of each system.

The scenarios we evaluated included:

Merging branches
Collaboration (on the same branch or on different branches)
Operations on chains of changes (commit, revert)
Integrating third-party changes
Continuing to work while the server is unavailable

As expected, Git was chosen, and so far no one has regretted it.

We adopted GitFlow as our process. It provided enough control over changes and let us deliver releases the way we were used to.

We protected the develop branch with a policy that requires all changes to go through pull requests.
We try to follow the "one ticket, one pull request" practice. Changes from different tickets are never combined in a single change. We do as much testing as possible on the feature branch to avoid follow-up fixes in subsequent pull requests.
When merging into develop, all changes are squashed into a single commit.
Release branches are created from develop.
If necessary, the latest changes can be brought into the release branch selectively (cherry-pick) or all at once (rebase). We never commit fixes directly to the release branch.
After the latest release is deployed to production, it goes to master via a force push (only a few people have this right).

Product build automation

The application consisted of a large number of assemblies and hundreds of solutions. As the audit revealed, all of this was built separately and "by hand".
As a first step, we decided not to redo everything from scratch (so as not to halt ongoing delivery), but to "wrap" the build in a set of MSBuild scripts, one script per component.
This way we quickly obtained scripts that produced all the necessary intermediate artifacts and, in the end, the finished product.

The database project is a separate story. Unfortunately, the system contains several poorly structured CLR components whose dependencies prevent a simple database deployment. For now, this is solved with a pre-deployment script.
In addition, because of the non-uniform system landscape (SQL Server 2008 and 2014 were installed at different sites), we had to build the database project for both .NET 2.0 and .NET 4.0.

Once all the scripts were ready and tested, we used them in VSTS build definitions.

Immediately before the build starts, the versions of all products are set to a common standard number that includes the sequential build number. The same number is written into the post-deployment script. As a result, all components (the database and all client applications) come out consistent and identically numbered.
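The version stamping step can be sketched as follows. This is a minimal illustration, not the team's actual script: it assumes the C# projects carry their version in AssemblyInfo.cs attributes, that the CI system supplies a sequential build number, and that the database stores its version in a table; the table name dbo.SystemVersion and the major.minor.build.0 layout are hypothetical.

```python
import re

# Matches [assembly: AssemblyVersion("...")] and [assembly: AssemblyFileVersion("...")]
VERSION_ATTR = re.compile(r'(\[assembly:\s*Assembly(?:File)?Version\(")[^"]*("\)\])')


def make_version(major: int, minor: int, build_number: int) -> str:
    """Common version for every component; the build number comes from the CI run."""
    return f"{major}.{minor}.{build_number}.0"


def stamp_assembly_info(source: str, version: str) -> str:
    """Rewrite the version attributes in the text of an AssemblyInfo.cs file."""
    # A function replacement avoids "\1" followed by digits being read as a group reference
    return VERSION_ATTR.sub(lambda m: m.group(1) + version + m.group(2), source)


def post_deployment_sql(version: str) -> str:
    """Line for the post-deployment script; dbo.SystemVersion is a hypothetical table."""
    return f"UPDATE dbo.SystemVersion SET Version = N'{version}';"
```

Running the same make_version result through both the client stamping and the database script is what keeps every component of one build identically numbered.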

Deploying to a test bench

Once the first version of the build process was complete, we moved on to preparing the deployment scenario.

As expected, the database caused the most trouble.

Deploying on top of a copy of a production database revealed many conflicts between the build and the actual state of the systems:

Object versions in Git did not match those in the production system
Database schemas were owned by users that were scheduled for deletion

Stabilization of the development process

It is strange to talk about this, let alone write about it here, but the most serious change for developers was the principle "if it's not in Git, it doesn't exist". Previously, code was committed "for reporting to the customer". Now nothing can be delivered without it.

Database code was the hardest part. Once we switched to deploying the database from the repository (building it and deploying it with sqlpackage), the "delta" approach gave way to the "desired state" approach. Packages became a thing of the past: everything had to be deployed automatically.
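The difference between the two approaches can be shown with a toy model. In the "desired state" approach, nobody writes the change scripts by hand. The tool compares the target schema built from the repository with the current schema of a database and derives the changes itself. The sketch below is a deliberate simplification and an assumption: it treats a schema as a mapping from object name to definition text, whereas real tools such as sqlpackage compare full schema models and also handle dependencies and data-loss checks.

```python
def plan_changes(desired: dict[str, str], current: dict[str, str]) -> list[tuple[str, str]]:
    """Derive the actions that bring `current` to `desired` (toy schema diff)."""
    plan: list[tuple[str, str]] = []
    # Objects that exist only in the repository must be created
    for name in sorted(desired.keys() - current.keys()):
        plan.append(("CREATE", name))
    # Objects whose definitions differ must be altered
    for name in sorted(desired.keys() & current.keys()):
        if desired[name] != current[name]:
            plan.append(("ALTER", name))
    # Objects that exist only in the target database are candidates for removal
    for name in sorted(current.keys() - desired.keys()):
        plan.append(("DROP", name))
    return plan
```

Because the plan is computed against whatever the target currently contains, the same build can be deployed to any site, and the package order no longer matters.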

But until the full transition to the new deployment process, we still had to deliver changes, and we had to do it the old way, with "delta updates".

We faced the task of keeping the state of the systems receiving delta packages fully and continuously consistent with the contents of the repository.

To achieve this, we set up the following process:

The code from the repository was regularly built and deployed to an empty "model" database.
A special autotest was prepared from the "model" database. A checksum was calculated for every object in the "model" database. The autotest contains all these checksums and, when run, calculates the checksums of the corresponding objects in the database being checked. Any discrepancy in the set of objects or in their checksums makes the test fail.
A failing test automatically blocked the transport of packages from the test environment further along the landscape. This integration had already been implemented in the existing transport system.
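The checksum autotest described above can be sketched like this. It is a minimal sketch under assumptions. Reading object definitions from the database catalog is left out: both databases are represented as a mapping from object name to definition text. The whitespace normalization is our own choice, made so that cosmetic differences do not fail the test. The article does not say how the real test normalized definitions.

```python
import hashlib


def checksum(definition: str) -> str:
    """SHA-256 of an object definition, ignoring line endings and trailing spaces."""
    lines = definition.replace("\r\n", "\n").split("\n")
    normalized = "\n".join(line.rstrip() for line in lines).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def snapshot(objects: dict[str, str]) -> dict[str, str]:
    """Expected checksums, taken once from the freshly deployed "model" database."""
    return {name: checksum(text) for name, text in objects.items()}


def compare(expected: dict[str, str], checked_objects: dict[str, str]) -> list[str]:
    """List every discrepancy between the model snapshot and the checked database."""
    actual = snapshot(checked_objects)
    problems = [f"missing: {n}" for n in sorted(expected.keys() - actual.keys())]
    problems += [f"unexpected: {n}" for n in sorted(actual.keys() - expected.keys())]
    problems += [
        f"changed: {n}"
        for n in sorted(expected.keys() & actual.keys())
        if expected[n] != actual[n]
    ]
    return problems
```

The autotest itself then reduces to asserting that compare(...) returns an empty list. A non-empty result is the "failing test" that stops packages from moving further along the landscape.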

Thanks to this automatic control, we managed to bring the production database code into Git in its current state relatively quickly and keep it there without extra effort from the project team. At the same time, developers got used to committing code to the repository correctly and promptly.

Product deployment to integration test environments

Once the previous stage was complete, we moved on to deploying the application to a test environment. We completely stopped applying delta packages to test systems and switched to automatic deployment with VSTS.

From then on, the whole team began to reap the first benefits of the earlier effort: deployment happened without any additional effort. The custom code is built, deployed and tested automatically.

Unfortunately, as we realized later, the "repository alignment" we had carried out gave us a stably maintained "develop" version, but still no "production" version. So we could not go beyond the test environment: there was nothing to take to QAS and PRD.

The database-side code could be compared with production to identify the differences. But there was nothing to compare the client applications against: all we had was the actual production version as a set of executables, and it was impossible to say reliably what source they had been built from.

Testing the product produced by the automated build

After changing the build approach, the product had to undergo a large regression test to make sure the application worked and nothing had been lost.
Testing was simpler for the functionality implemented on the database side. Fortunately, we had a significant set of autotests covering the critical areas.

But there were no tests for C#, so everything was checked by hand. It was a considerable amount of work, and the check took some time.

"Leap of faith": pilot deployment to production

Despite the testing, deploying to production for the first time was scary.

We were lucky: the next deployment of the system at a new site had just been scheduled, and we decided to use this opportunity for a pilot deployment.
Users would not notice anything, possible build errors would be easy to fix, and real production work had not yet begun.

We deployed the system, and for several weeks it ran in pre-production mode (low load and a specific usage pattern, so some production scenarios might not be exercised). During this time, testing revealed several defects. They were fixed as they were found, and the new version was immediately rolled out for verification.

After the official launch and a week of post-launch support, we announced that this was the first instance built and delivered "the new way".

This build became the first stable version of the master branch, and we marked it with the celebratory tag "first_deployment" (we didn't order badges with the commit hash, though).

Scaling deployment across the entire production landscape

As James Bond used to say: "the second time is much easier." After the successful pilot, we quickly connected the remaining system instances of the same type.

But the system has several usage types: some functionality is used by one type and not by the others. Accordingly, functionality tested on the first type of deployment did not necessarily guarantee success for the others.

To test the functionality for the remaining usage types, we took advantage of active projects under development. The idea was similar to the first deployment: we began using automated builds, "slipping" them to users along with the project functionality. So users working with the "project" version of the product were testing the existing functionality at the same time.

Scaling itself revealed unexpected technical problems:

Non-uniform system landscape
Besides deploying the application itself, we first had to make sure everything was the same everywhere: .NET versions, PowerShell and its modules. All of this took a fair amount of time.

Network connection
At some sites, the network connection simply could not handle downloading all the build components. There were timeouts and corruption during transfer. We checked and tried many things, without much success.

We settled on the following solution: we modified the build script to pack all outputs into one large archive, which was then split into small fragments (2 MB each). We also modified the deployment scenario to download artifacts without concurrency, fetch all the 2 MB fragments, and reassemble them into something that could actually be deployed.
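The split-and-reassemble workaround can be sketched as follows. The 2 MB fragment size comes from the text. The file naming and the SHA-256 integrity check are our own assumptions. The check is added because the original problem included corruption in transit: computing the digest at build time and verifying it after reassembly catches a damaged fragment before deployment starts.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 2 * 1024 * 1024  # 2 MB fragments, as described above


def split_archive(archive: Path, out_dir: Path, chunk_size: int = CHUNK_SIZE) -> tuple[list[Path], str]:
    """Cut the build archive into fragments; return them with the archive's SHA-256."""
    out_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    parts: list[Path] = []
    with archive.open("rb") as src:
        while chunk := src.read(chunk_size):
            digest.update(chunk)
            # Zero-padded index so that sorting the names restores the original order
            part = out_dir / f"{archive.name}.{len(parts):05d}"
            part.write_bytes(chunk)
            parts.append(part)
    return parts, digest.hexdigest()


def join_parts(parts: list[Path], target: Path, expected_sha256: str) -> None:
    """Reassemble downloaded fragments one by one and verify the result."""
    digest = hashlib.sha256()
    with target.open("wb") as dst:
        for part in sorted(parts):
            data = part.read_bytes()
            digest.update(data)
            dst.write(data)
    if digest.hexdigest() != expected_sha256:
        target.unlink()  # never deploy a damaged archive
        raise ValueError("reassembled archive does not match the build checksum")
```

Small fragments mean that a dropped connection costs at most one 2 MB retry instead of the whole archive.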

Conflict with antivirus
Another strange problem was a conflict between antivirus software and one of the deployment steps. When "suspicious" files such as .js or .dll files are extracted from artifact archives, the antivirus starts scanning them. Strangest of all, the antivirus grabs a file even before unpacking finishes, and the unpacking fails with the message "the file is being used by another process". For now, we work around this by excluding the local artifacts folder from scanning. It is not a great solution, but we have not come up with anything better.

Process improvement

Once the build and deployment processes had stabilized, we turned to "making shoes for the shoemaker": improving our own internal processes.

We built a basic integration between the corporate IT support system (service-now.com) and VSTS to transfer tickets as Work Items. We adjusted the merge policy for develop: a linked ticket is now mandatory.
CI is connected to all feature branches. As soon as a code change is pushed, a build is produced automatically and is ready for testing.
On the contractor's side, a large set of test benches was deployed, and "self-service" deployment of builds for internal testing was organized.
On the customer's side, several test benches were also deployed, one for each system type. Any developer or department employee can initiate a deployment. Permission is granted by the employee who is responsible for that system type and coordinates its testing.
We set up infrastructure for project work: special naming of project branches, and CI/CD for dedicated system instances used by developers, analysts and end business users.
Some checks were consolidated as pull request policies (more on this in a separate article).
Autotests were moved into the common repository (previously they were stored separately) and integrated into the build and deployment processes. The product build now contains both code and tests, for the development branches and for the product as a whole.
As an experiment, VSTS agents were installed on the local computers of analysts who do testing, so that the client part of a build under test can be deployed there.
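The merge rules for develop amount to a small check per pull request. The sketch below is only an illustration with a hypothetical, simplified pull request record, not the actual policy configuration. Azure DevOps does provide a built-in branch policy that requires linked work items. The stricter "exactly one ticket" rule below is our assumption of how the "one ticket, one pull request" convention could be enforced by a custom check.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    # Hypothetical, simplified view of pull request metadata
    source_branch: str
    target_branch: str
    linked_work_items: list[int] = field(default_factory=list)


def policy_violations(pr: PullRequest) -> list[str]:
    """Return the reasons a pull request may not be merged (empty list means OK)."""
    problems: list[str] = []
    if pr.target_branch == "develop":
        tickets = set(pr.linked_work_items)
        if not tickets:
            problems.append("no linked ticket")
        elif len(tickets) > 1:
            problems.append("more than one ticket: split the pull request")
    return problems
```

Keeping the check limited to develop mirrors the GitFlow setup described earlier, where develop is the branch protected by pull request policies.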

Results

Current situation

All application code is stored and maintained in MS Visual Studio Team Services (recently renamed Azure DevOps), managed by the customer. The version control system is Git.
All changes are fully transparent and linked to service tickets (incidents/changes).
The move to Git/GitFlow has greatly simplified parallel development.
A code review procedure involving a customer representative has been introduced.
The application is built by the CI system. Full builds run for both the main and the feature branches, which enables early testing.
Deployment of the application to all target nodes is script-based. Configuration steps for the base components and applications are gradually being added to the deployment steps.
Employees of the customer's IT department can initiate deployment of builds (final or intermediate) to test environments, for testing or for an early preview. Business users also have access to the test environments.
We still follow the main monthly release cycle, although over the last month we have been rolling out something of practical value every week.
A process for rolling out "experimental" versions at some sites has emerged.

Duration by stage

No.	Stage description	Duration
1	From the start of the project to full control over the code, the build process and delivery to the test environment	6 months
2	From the first deployment to the test environment to the first pilot release in production	3 months
3	From the pilot deployment to production to the first release for all instances	5 months

Total duration: 14 months

The duration, especially of the final stage, was largely determined by coordination and by the agreed system maintenance schedule.

Labor costs

The total effort of the customer's and the contractor's staff for all work related to the change was approximately 250 person-days.
