How Microsoft rewrote the C# compiler in C# and made it open source

Original author: Mads Torgersen
The author, Mads Torgersen, is a lead architect of C# at Microsoft.

Project Roslyn

Roslyn is the code name given to the open-source compiler for C# and Visual Basic .NET. The project began in the deepest darkness of corporate Microsoft in the last decade and ended up as an open-source, cross-platform, public language engine for C# (and VB, which I will take as implied for the rest of the article).

The first discussions of the project that would later become known as Roslyn were already under way when I joined Microsoft in 2005, shortly before the release of .NET 2.0. The idea was to rewrite the C# compiler in C#. This is common practice for programming languages: it is proof of a language's maturity. But there was a more practical and important motivation: we, the creators of C#, did not program in C#, we programmed in C++! If you program in C# every day, your perspective changes: there is great power in working with the very tool you are developing (dogfooding).

The difficulty of rewriting a compiler that has been in active use for several years is that users will depend on the new compiler behaving exactly as the old one did. Writing a new C# compiler means aiming for bug-for-bug compatibility. And I am talking not only about known bugs, but also about unknown bugs and unintended behaviors that developers have discovered and rely on, often without realizing it.

For many years, the scale of this problem did not allow us to even begin to implement the project.

And although the language team at Microsoft stood to benefit greatly from a new C# compiler written in C#, the value for end users was less obvious: how would the new compiler be useful to them? Perhaps the only people who care that the C# compiler is written in C# are the compiler developers themselves.

At the same time, another problem was becoming more and more apparent: duplication of effort between the various tools that operate on C# code. Besides the compiler team, another team worked on IDE support for C# in Visual Studio, and they too had to write a lot of code (at the time, also in C++) to understand the syntax and semantics of C#.

Meanwhile, the number of tools from Microsoft and other companies, such as StyleCop and CodeRush, kept growing, and each of them had to implement its own meaningful processing of C# code. Each of these tools had its own slightly different bugs, different levels of understanding, and different compromises and concessions. And each of them had to spend a great deal of effort just to reach a basic understanding of the code.

So we settled on an important proposal: there should be only one such code base in the world, a single foundation for all the tools that work with C# code!

The value of such a proposal comes from the growth in the number of available tools, and especially from the improvement in the quality of existing ones. All the correctness and performance requirements of the language are placed on a single code base. One concentrated effort is enough to build a foundation of stellar quality and tremendous versatility. We would create a real language engine: a unified, open API for C# code. We would redefine what a "compiler" is.

Of course, once you are creating an API for the wider C# community, it goes without saying that it should be a .NET API implemented in C#. So the old dream of writing the C# compiler in C# came true almost as an accidental side effect.

Thus, Roslyn was born out of a mindset of openness: sharing the inner workings of C# for programmatic use by the whole world. This in itself was a rather bold proposal for Microsoft's still fairly closed corporate culture.

Will we share intellectual property for free? Will we empower tools that compete with us?

In the internal debate, what won the argument was the case for strengthening the ecosystem and creating a language with the best tools on the planet. It was about the long-term growth of C# and .NET versus short-term monetization and protection of Microsoft's assets. So even before open source was mentioned, the Roslyn bet was a big and bold step for Microsoft.

Of course, building something like this could not be easy. Roslyn's goals were very ambitious and fraught with technical problems, and it took us half a decade to work through them all. But that is another story.

For most of the initial development, Roslyn remained a closed source project.

From the very beginning of serious work on the project in 2009, we had ideas about open-sourcing the compilers, but Microsoft was simply not ready yet.

Since the 1970s, Microsoft had had a culture of closed source and of protecting source code with patents. And although change was in the air, it came more slowly than our team had hoped.

In fact, for some time it seemed that the company was heading in the completely opposite direction.

The Windows 8 project greatly affected the entire company. Through its new programming model, its tentacles reached deep into the tools and languages teams, and everything was shrouded in the utmost secrecy, not only outside the company but even within it. As an example, the async feature that we were developing at the time was coordinated and intertwined with the Windows 8 programming model, and I did not dare publish notes about its design even inside the company, for fear of accidentally leaking information about Windows 8 and bringing trouble down on my head! This created a terrible climate for innovation and, of course, left us no hope of open-sourcing the C# compiler.

In the end, however, once Windows 8 had run its course, the company began to transform and found a new direction, new leadership and a completely different philosophy: the Microsoft that we know today. Open source is now spreading rapidly inside Microsoft.

F# was released in 2010 with an open license and its own organization, the F# Software Foundation. An outstanding community formed around it, which soon became the envy of all of us. Our team pushed to get an open license for Roslyn, and finally the corporate infrastructure allowed it.

By 2012, Microsoft had created the Microsoft Open Tech organization, specifically focused on open source projects. Roslyn went under its wing and officially became an open source project. Roslyn was quite ripe for this: all development resources were internal and well known, and the project itself did not suffer from a large number of dependencies that could have caused licensing conflicts.

In April 2014, at the Build developer conference in San Francisco, Anders Hejlsberg presented Roslyn as an open source project, and the source files were published on April 3 on CodePlex (Microsoft's former platform for hosting repositories) under the Apache 2.0 license.

At the same time, the .NET Foundation was announced as the home for .NET projects, including Roslyn.

This release was a real breath of fresh air! We began to reap the benefits of openness on CodePlex, and over time the remaining procedural obstacles to open source at Microsoft were removed, so today open source is a natural and integral part of how many of our teams work.

On other fronts, the company also realized that it did not need to control everything. It became clear that there was no compelling reason for CodePlex to exist, and Roslyn, along with other projects, migrated to GitHub, which by then had become the de facto main platform for open source projects. Not only the code itself but also the process of creating it now lives on GitHub: we no longer think of GitHub as a place to publish source code, it is simply where we work.

C# language design and compiler implementation are now completely open processes, with significant outside participation. Outside contributors have even built entire language features. The value to C# is off the charts, not only because of the scaled-up effort in writing features and fixing bugs, but also because of the understanding and course correction we gain from an immediate, daily feedback loop with the community.

It was a long and crazy journey, and for me it symbolizes the tremendous changes that Microsoft has undergone in the last decade. Roslyn was born in the dark but grew up with ideas of openness, and today it has exploded into a million different uses thanks to the power of open source.

Learn more about Roslyn and C# language design:

Roslyn on GitHub
C# on GitHub
