The problem with Windows is not the frequency of updates but the development process

Original author: Peter Bright
The glitchy updates point to the deeper problem of

Windows 10 at the presentation in Tokyo, July 2015.

Clearly, the Windows 10 October 2018 Update was not Microsoft's most successful. Reports of files disappearing from users' computers appeared quickly, and Microsoft suspended distribution of the update. The bug has since been fixed, and a new build is now being tested before it is released again.

This is not the first Windows update to have problems - previous updates brought things like significant hardware incompatibilities - but it was clearly the worst. Most of us know we should keep backups, but in reality a lot of data, especially on home computers, has no backup, and losing it is painful.

Windows as a service

Windows 10 was meant to radically change the development process. Microsoft wanted to respond better to the needs of customers and the market, and to release new features more often. Windows 10 was billed as the “last” version of Windows: all new development would arrive as updates to Windows 10, delivered through the update system several times a year. This new development model is called “Windows as a service.” After some initial confusion, Microsoft settled on two feature updates per year, in April and in October.

These efforts have paid off. Under the new model Microsoft ships useful new features without forcing users to wait three years for a major version upgrade. For example, there is a handy feature that transparently runs Edge in a virtual machine, providing strong protection against malicious websites. The Windows Subsystem for Linux (WSL), which runs Linux programs natively on a Windows system, has been helpful for developers and administrators. The benefits for average users are a bit harder to spot, but they include VR features compatible with SteamVR, improved gaming performance and a dark theme. Although the individual improvements are not dramatic, the current Windows 10 is certainly better overall than the one released three years ago.

Back when Windows was updated only every three years, it was hard to imagine that WSL would ever become a useful tool.

All this is good, and some of it could hardly have been done (at least not as successfully) without the “Windows as a service” model. The development of WSL, for example, was driven by user feedback: users reported incompatibilities and helped prioritize new features. I do not think WSL would have gained such momentum without steady updates every six months, because no one wants to wait three years for a minor fix, such as one that makes a package they depend on work correctly. Regular updates reward people for reporting bugs, because everyone can see that bugs get fixed in a timely manner.

The problem with the “Windows as a service” model is quality. Earlier problems with this model and with security updates have already undermined confidence in the Windows 10 update policy. Among experienced users, the prevailing view is that the quality of monthly security updates has declined with Windows 10, and that installing the semi-annual feature updates as soon as they come out is madness. These complaints also have a long history: unreliable updates were a cause for concern soon after Windows 10 was released.

The latest problem with deleted files has led pundits to argue that two feature updates per year is too many, that Redmond should cut back to one, and that Microsoft should suspend development of new features and simply fix bugs. Some worry that the company is dangerously close to a serious loss of confidence in its updates, and some Windows users have already lost that trust.

These are not the first calls for Microsoft to slow down its feature updates - there have long been fears that the pace is hard to digest for both IT departments and ordinary users - but given the obvious problems with the latest update, the calls have become especially pressing.

It's not about frequency, but about HOW updates are being made.

But those who insist on one update per year instead of two miss the point. The problem is not release frequency. The problem is Microsoft's development process.

Why is timing not the problem? We can look at the update schedules of other operating systems.

On the one hand, macOS, iOS and Android are updated less frequently, so it may appear that Microsoft is overreaching. On the other hand, there are precedents for successful updates at this pace: Ubuntu has two releases a year, and Google's ChromeOS, like its Chrome browser, gets updates every six weeks. Beyond operating systems, the Office Insider program has a monthly channel that delivers new features to Office users every month, and its developers manage this without provoking too many complaints while still providing a steady stream of new features and fixes. The Visual Studio team also releases frequent updates for its development environment and online services. Clearly, Microsoft has teams that have adapted well to a world in which their applications are updated regularly.

Let's look beyond locally installed software to network and cloud services. There, Microsoft and other companies are increasingly adopting continuous delivery: every update is automatically deployed to production servers once it has passed a sufficient number of automated tests.
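The idea behind continuous delivery can be sketched in a few lines. This is a minimal illustration, not a description of any company's actual pipeline: the names `Change`, `run_tests` and `deliver` are hypothetical, and "production" is just a list. The sketch also assumes that a change with no tests at all is treated as a failure.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Change:
    """A hypothetical unit of work waiting to be deployed."""
    name: str
    tests: List[Callable[[], bool]]


def run_tests(change: Change) -> bool:
    """Return True only if the change has tests and every one of them passes."""
    return bool(change.tests) and all(test() for test in change.tests)


def deliver(change: Change, production: List[str]) -> bool:
    """Deploy the change if its tests pass; otherwise leave production untouched."""
    if not run_tests(change):
        return False
    production.append(change.name)
    return True


if __name__ == "__main__":
    production: List[str] = []
    deliver(Change("fix-typo", [lambda: True]), production)
    deliver(Change("risky-refactor", [lambda: True, lambda: False]), production)
    print(production)  # ['fix-typo']
```

The point of the gate is that nothing reaches production on the promise of a later fix: a failing or untested change simply never gets deployed.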

Of course, none of these projects compares to Windows in complexity. Ubuntu may include a more diverse array of packages, but many of them are developed independently. The fact remains that the scale of Windows is unusually large, and its components are integrated into a single code base to an unusual degree. Much of the Windows code is also extremely old.

Of course, these factors complicate Windows development, but do they really make two updates a year impossible? No. It just takes the right development process.

Windows 10 around the time of its release in 2015 (Where are all my icons and Start menu?)

Process rooted in history

Microsoft does not disclose specific details of the Windows 10 development process, but some characteristics of the process are observable (how new features are delivered to insiders, the kinds of bugs found in insider builds), and some information has come from inside the company. All this circumstantial evidence points to a flawed development process. It retains some features of the process the company used when Windows was released every three years. The timescales have shrunk, but the basic approach has not changed.

In the old days, when two to three years passed between major releases, Microsoft came up with a process divided into several stages:

  1. design and planning;
  2. component development;
  3. integration;
  4. stabilization.

This meant approximately 4-6 months of planning and design, 6-8 weeks of intensive coding, and then 4 months of integration (each feature is usually developed in its own branch, so they all have to be brought together and merged) and stabilization (testing and bug fixing). Over the course of a product's development, this cycle is repeated two or three times; for Windows there are three iterations, the first of which is a prototype and the next two are real. The duration of the phases may vary, but the basic structure is widely used in the company.

Some things are obvious from this process. Probably the most striking is how little time is spent actually writing new code: for a Windows release, that is two stretches of 6-8 weeks over the entire three-year period. A lot of time passes between the planning and design stage and the actual product. This is the main reason the process cannot be described as agile development: new features are locked in long before the final product ships, so it is hard to change them in response to feedback.
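To put a rough number on how small the coding share is, here is a back-of-the-envelope calculation using only the figures above (two coding stretches of 6-8 weeks in a three-year release). The 52-week year is a rounding assumption, and the constant names are made up for illustration.

```python
WEEKS_PER_YEAR = 52          # assumption: a rounded calendar year
RELEASE_YEARS = 3            # from the text: three-year release period
CODING_PASSES = 2            # from the text: two "real" coding stretches
CODING_WEEKS_RANGE = (6, 8)  # from the text: 6-8 weeks each


def coding_share(weeks_per_pass: int) -> float:
    """Fraction of the whole release period spent on intensive coding."""
    total_weeks = RELEASE_YEARS * WEEKS_PER_YEAR
    return CODING_PASSES * weeks_per_pass / total_weeks


low, high = (coding_share(weeks) for weeks in CODING_WEEKS_RANGE)
print(f"{low:.1%} to {high:.1%}")  # 7.7% to 10.3%
```

On these assumptions, intensive coding takes up roughly a tenth of the release period at most; the rest goes to planning, integration and stabilization.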

Separating development from bug fixing is also a problem: during the development and integration stages, the reliability and stability of the software are very poor. The features being integrated have not been tested in any fundamental way (because testing happens later) and have never been used together (because they were all developed separately in their own branches before the integration phase). This software mess is then brought up to an acceptable level through testing, bug reports and bug fixes during the long stabilization phase. Over the course of this process, the product's reliability has to be rebuilt again and again.

Nadella presents Windows 10 to the world in 2015

New world is not so new

In the new world, the company seems to need seven or eight months to complete a cycle. Although releases are only six months apart, the next cycle begins before the previous one is finished; for insiders this is obvious from the opening of the Skip Ahead group.

As a rule, each update begins with a fairly quiet period with few visible changes, followed by a few months in which major changes are introduced along with a huge number of bugs. About a month before the update is released, the rate of change slows sharply and the focus shifts to bug fixes rather than new features.

As Microsoft employees themselves describe it, the last few months of development consist of a “tell” phase followed by one month of an “ask for permission” phase. During the “tell” phase, Windows executives are informed of the changes being made, and those changes are accepted by default. During the “ask for permission” phase, only truly essential changes are allowed, typically just a few per day.

For example, the first build of the October update (codenamed RS5) was released to insiders on February 14, and the stable build of the April update (RS4) followed two months later, on April 16. RS5 did not receive any significant new features until March 7. Many features were added during May, June and July, and then only minor changes were made in August and September. Several small features were even removed in August because they could not be made ready for the October release.

Of course, the process has changed somewhat. For example, new features now appear in preview builds over the course of many months. This suggests that new features are integrated much earlier, as they are developed, rather than in one big merge at the end.

Quality decline

But there are key similarities. The most important is that knowingly buggy code is integrated into the common code base, and the testing and stabilization phase is relied on to sort out any problems. This is even acknowledged explicitly: when announcing a new preview build, Microsoft warns: “As usual at the beginning of the development cycle, builds may contain bugs that are painful. If this causes you inconvenience, you may consider switching to the Slow ring. There, the builds will continue to be of higher quality.”

We saw this in practice with RS5. Last October's update introduced a new OneDrive feature: placeholders that represent files in cloud storage that have not been downloaded locally. Whenever an application tries to open such a file, OneDrive transparently retrieves it from cloud storage and saves it locally, and the application never knows that the file was not there to begin with. RS5 builds added a feature that removes the local copies of synced cloud files when disk space runs low.

This is a really smart, useful feature that improves integration with cloud storage. It is also new code: there is a kernel driver that links the cloud sync code (which downloads files and uploads changes) with the placeholders in the file system. There is also an API, so it seems third parties can use this feature for their own sync services.

The preliminary Windows builds use a green “screen of death” instead of blue, so they are easy to distinguish.

It is reasonable to assume that Microsoft would build a test suite for this new code: creating a file, checking that it syncs, deleting the local copy while keeping the placeholder, opening the placeholder so that the real file is downloaded, deleting the file completely, and so on. There are only a handful of basic operations for manipulating files and directories, and in any agile development process there would be tests to verify that every operation works as expected and that the API does what it is supposed to do.
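To make the kind of test suite described here concrete, below is a toy sketch. It is emphatically not the real OneDrive or Windows API: `ToySyncFolder` and its methods are invented for illustration, the "cloud" is a dictionary, and syncing is assumed to be instantaneous. It only shows how cheaply the basic operations listed above can be checked.

```python
class ToySyncFolder:
    """Hypothetical in-memory model of a synced folder; not a real OneDrive API."""

    def __init__(self):
        self.cloud = {}  # name -> content held in cloud storage
        self.local = {}  # name -> content, or None when only a placeholder exists

    def create(self, name, content):
        """Create a file locally and pretend it syncs to the cloud instantly."""
        self.local[name] = content
        self.cloud[name] = content

    def free_up_space(self, name):
        """Drop the local copy but keep the placeholder entry."""
        if name not in self.local:
            raise FileNotFoundError(name)
        self.local[name] = None

    def open(self, name):
        """Return the content, fetching it from the cloud if only a placeholder exists."""
        if name not in self.local:
            raise FileNotFoundError(name)
        if self.local[name] is None:
            self.local[name] = self.cloud[name]
        return self.local[name]

    def delete(self, name):
        """Remove the file everywhere."""
        if name not in self.local:
            raise FileNotFoundError(name)
        del self.local[name]
        del self.cloud[name]


def test_placeholder_round_trip():
    folder = ToySyncFolder()
    folder.create("report.txt", "draft")
    folder.free_up_space("report.txt")
    assert folder.local["report.txt"] is None     # placeholder kept
    assert folder.cloud["report.txt"] == "draft"  # cloud copy untouched
    assert folder.open("report.txt") == "draft"   # opening downloads the file
    assert folder.local["report.txt"] == "draft"


def test_delete_removes_file_everywhere():
    folder = ToySyncFolder()
    folder.create("notes.txt", "hello")
    folder.delete("notes.txt")
    assert "notes.txt" not in folder.local
    assert "notes.txt" not in folder.cloud


if __name__ == "__main__":
    test_placeholder_round_trip()
    test_delete_removes_file_everywhere()
    print("all placeholder tests passed")
```

A real suite would of course run against the actual driver and sync service, but the shape is the same: one small test per basic operation, run on every change.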

It would also be reasonable to assume that any code change that breaks these tests would be rejected and kept out of integration. The code should be fixed and should pass its tests before it ever reaches the main Windows branch, let alone before it is shipped to beta testers.

And yet many preview builds had a bug: the system hung when a directory synced with OneDrive was deleted. This bug was not only integrated into the Windows code but also shipped to end users.

Test the software before the release, not after

This tells us something fundamental about how Windows is developed. Either there are no tests for this code at all (I have been told that yes, code can be integrated without tests, though I hope this is not the norm), or test failures are treated as acceptable, non-blocking problems, and developers are allowed to integrate code that is known not to work properly. From the outside we cannot say exactly which is happening - perhaps a combination of both - but neither can be called good.

For older parts of Windows this might be more forgivable: they were developed before the era of automated testing, and there may genuinely be no tests. But the OneDrive placeholders are not an old part of Windows; they are a completely new feature. There is no good reason for new code to lack a solid set of tests covering its basic functionality. And code that is known to be defective certainly should not be merged into the code base until it is fixed, let alone sent to testers.

As a result, Windows 10 development still follows the old principles. New features are merged into the code base at the cost of stability and reliability, on the assumption that the code base will be brought back up to an acceptable level during the testing and stabilization phase.

Inadequate automated testing and/or ignoring test failures means, in turn, that Windows developers cannot be sure that changes and fixes will not cause ripple effects. That is where the “ask for permission” phase comes from: as the update nears completion, the number of changes has to be minimized because Microsoft is not sure that the scope and impact of each change is contained. That confidence comes only with a massive, disciplined testing infrastructure: you know a change is safe because all the tests pass. Whatever testing the company does for Windows, it is clearly not enough to provide such confidence.

But in other respects, Microsoft acts as if it does have this confidence. The company has a lot of tests; I have been told that a full test cycle for Windows takes many weeks. That full cycle is indeed run, just not on the builds that actually ship. The October 2018 Update is an example: the code was compiled on September 15, and the build became public on October 2. Whichever RS5 build went through the full test cycle, it was not the one we actually received, because the full test cycle takes too long.
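The arithmetic behind this claim is easy to check. The two dates come from the paragraph above; the length of the full test cycle is not given anywhere beyond "many weeks", so the three-week figure below is purely an illustrative assumption, and a conservative one.

```python
from datetime import date

BUILD_COMPILED = date(2018, 9, 15)  # from the text
BUILD_PUBLIC = date(2018, 10, 2)    # from the text
ASSUMED_FULL_CYCLE_WEEKS = 3        # assumption: the text only says "many weeks"


def days_available(compiled: date, public: date) -> int:
    """Calendar days between compiling the build and making it public."""
    return (public - compiled).days


def full_cycle_fits(compiled: date, public: date, cycle_weeks: int) -> bool:
    """True if a test cycle of the given length fits between the two dates."""
    return days_available(compiled, public) >= cycle_weeks * 7


print(days_available(BUILD_COMPILED, BUILD_PUBLIC))  # 17
print(full_cycle_fits(BUILD_COMPILED, BUILD_PUBLIC, ASSUMED_FULL_CYCLE_WEEKS))  # False
```

Even with a cycle as short as three weeks, the 17 days between compilation and release would not be enough to run it on the shipped build.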

This is a contradictory position. If later changes to the code are made with a high degree of confidence that they have not broken anything, then running the full test cycle on an earlier build is fine. But if Microsoft really had such confidence that these changes would not break anything, it would not need to throttle them so heavily during the “ask for permission” phase.

Windows 10 can really work as a reliable machine.

How to do it right

We see a significant difference from real Agile projects. Take, for example, the development process Google uses for its ad server. This is a critical piece of infrastructure for the company, yet new developers there describe making a change to fix a small bug and seeing it go into production within a day. When a fix is submitted to the repository, it is automatically built and subjected to a battery of tests. The maintainer of that part of the code then reviews the change, accepts it and merges it into the main code base, which is retested and deployed to production.

Of course, it is a little unfair to compare this with Windows: a cloud service can roll back a change much more easily when a bug is found. A Windows change that causes a blue screen at boot is much harder to undo and recover from. Still, the ad server is a critical Google service - it is how the company makes its money, after all - and a botched update could easily cost millions of dollars. Yet automated tests and a streamlined process allow even an intern to deploy changes to production within a few hours, and to do so with confidence.

The development mindset is fundamentally different. A new feature may be unstable while it is being developed, but by the time it is merged into the main branch it must meet a high quality bar. Microsoft's approach is “merge all the bugs now, we will fix them later,” whereas in a sound process the bugs are eliminated before the new feature is added to the main branch.

If you run Chrome from the developer channel, usually the only sign that it is not the final release is a non-standard icon.

While cloud applications allow some extra flexibility, Agile methods also suit desktop software. For example, Google has established similar processes for Chrome. Beta versions of Chrome do have the occasional bug, but in general their code is close to release quality. Indeed, the principle of Chrome development is that even the most recent build should be of release quality. You can use the dev version of Chrome as your everyday browser, and only the icon tells you that this is an alpha and not the stable channel. This is possible thanks to extensive automated testing and verification: Chrome's code stays at high quality throughout the development process, without the dip in quality followed by repair work that we see in Windows.

To make this possible, Google has invested in infrastructure. It runs a distributed build system that builds Chrome in parallel on a thousand cores, so a complete build finishes in just a few minutes. Disciplined branching keeps merges easy and predictable. Google has a wide range of functional and performance tests to catch bugs and regressions as early as possible. None of this is free, but it matters to Google that Chrome ships on a steady, regular schedule.

Windows development has always been bad

In the new development process, Microsoft spends proportionately more time writing new features and less time stabilizing and fixing them. That would be fine if the quality of the features were high from the outset, backed by testing infrastructure and a higher bar for integrating new code. But the experience with Windows 10 so far is that Microsoft has not developed the processes and systems needed to support this new approach.

Cutting the number of releases from two to one per year will not fix the problem. It often seems to me that people look at the old Windows development process through rose-colored glasses. But if we recall the days of Windows 7 and earlier, we see very similar problems. Back then the usual advice was not to upgrade to a new version of Windows until Service Pack 1 was released. Why? Because the initial release was always unacceptably buggy and unstable, and it was better to wait until Service Pack 1 had fixed most of those problems.

The difference is not that the new approach to developing Windows is much worse than before, or that the old process gave better results. Not at all. We see the same thing now as we did then, only the “wait for Service Pack 1” moment comes twice a year. After each update there comes a point, usually three to four months after the initial release, when Microsoft deems the code good enough for corporate users. That is our “new” Service Pack 1.

Thus we get the worst of both worlds. From the old approach to developing Windows we still get bad releases that need fixing; from the new approach we get those releases twice a year rather than once every three years. As a result, the pre-Service-Pack instability persists for most of the year.

The main drawback is the destabilization of the code base by integrating inadequately tested functions, hoping to fix everything later. This is a bad process. It was bad at the time when Windows was released every three years, and it is bad when it is released every six months.

This is not an insider job.

The second problem is the nature of the testing. Microsoft used to have a huge number of testers, and assigned a set of developers and testers to each feature. Many of the testers were laid off or moved to other positions in 2014. The idea was that the developers creating the features would take on more of the testing responsibility. The Windows Insider program also provides a large amount of informal testing, with many millions of participants (insiders). That is far more than in any previous Windows beta program.

I am not sure that the old approach would necessarily have caught the data loss bug. Professional testers might not have tested the specific scenario in which data is deleted. But it is clear that Microsoft cannot cope with the flood of bug reports from insiders. The data loss was reported three months before the update shipped. Many bug reports do seem to be of poor quality, lacking necessary details or correct terminology, but if the company failed to notice the problem in three months, it is not at all obvious that a longer development period would have mattered. A longer development period might simply mean that the bug was ignored for six months instead of three.

Microsoft has promised to change the feedback process with insiders so that they indicate the seriousness of the problem and draw more attention to such problems. This may help if insiders use the severity indicators correctly, but it seems insufficient to solve the main problem: too many error reports of too low quality.

This is a code quality issue. The real strength of the Insider program is the diversity of its hardware and software. Insiders can uncover the most exotic compatibility bugs, driver problems, and so on. They should not have to do basic functional testing, yet it often feels as if Microsoft uses them as full-time testers.

Moreover, if code quality drops during development, preview builds are generally not suitable for everyday use on ordinary PCs. They are not reliable enough. This in turn undermines the value of insider testing: insiders do not install the builds on their main PCs and do not expose them to the full range of their hardware and software. They use secondary PCs and virtual machines instead.

You have to invest in your tools.

Building a Chrome-style testing infrastructure for a project as enormous as Windows is a very serious undertaking. While some parts of Windows can be tested in isolation, others can only be tested effectively as part of a fully integrated system. Some, such as the OneDrive file sync feature, even depend on external network services. This is far from a trivial task.

Accepting the principle that Windows code should be of release quality at all times - not “after several months of fixes” but “right now, at any moment” - would be a huge change. But it is a necessary one. Microsoft needs to reach the point where every new update is of release quality from day one, where upgrading to the latest version is a non-event and a choice that can be made with confidence. Feature updates should become invisible to users. Cutting the number of releases to one per year, or one every three years, does not achieve this and never did. It is the process itself that must change, not the timescale.
