JuniorIL June 3, 2013 at 12:35

FeatureBranch

Transfer

With the proliferation of distributed version control systems (DVCS) such as Git and Mercurial, I am increasingly seeing discussions about the proper use of branching (merge) and merging (merge), and how this fits into the idea of continuous integration (CI). There is some confusion about this issue, especially when it comes to feature branching and its relevance to CI ideas.

Simple (isolated) Feature Branch

The main idea of feature branch is to create a new branch when you start working on some kind of functionality. In DVCS, you do this in your own repository, but the same principles work in centralized VCS.

I will illustrate my thoughts with the following series of diagrams. In them, the main development line (trunk) is marked in blue, and two developers marked in green and purple (Reverend Green and Professor Plum).

I use the designated colored rectangles as designations for local commits in the brunch. The arrows between the branches indicate the mergers, the orange rectangles highlight the merges themselves. In this example, there are updates in the main line, say a couple of fixed bugs. When this happens, our developers merge them into their local branches. In order to get a sense of time, let's assume that we are talking about a few days of work, when each developer commits his changes about once a day.

To make sure the code works, they can run builds and tests on their branches. In the framework of this article, we assume that along with each commit and merge, automatic builds and tests for the branch in which it was made run.

The main advantage of feature branching is that each developer can work on his task and be isolated from what is happening around. They can merge changes from the main line at their own pace and be sure that this will not interfere with the functionality being developed. Moreover, this allows the team to choose which of the new developments to release and what to leave for later. If Reverend Green is late, we can only provide a version with Professor Plum changes. Or, on the contrary, we can postpone the professor’s additions, perhaps because we are not sure that they work the way we want. In this case, we simply ask the professor not to merge his changes into the main line, until we are ready to release its functionality.

Despite the attractiveness of this image, certain problems may lurk in this approach.

Although developers can work on their functionality in isolation, at some point, the result of their work must be integrated. In our example, Professor Plum easily updates the main line with its changes, there is no merging, because he already received all the changes in the main line in his branch (and passed the build). However, not everything is so simple for Reverend Green, he has to merge all his changes (G1-6) with the changes of Professor Plum (P1-5).

(In this example, many DVCS users may feel that I am missing many details in such a simple, even simplified explanation of feature branching. I will explain a more complex scheme later.)

I made this merge rectangle huge because it is a dangerous merge. It can go without problems, it is likely that the developers worked on different parts of the code without interactions, and then the merge will go smoothly. But they could also work on parts that interact, and then hell awaits them.

Nightmares can take many forms, and development tools can save some . The most standard ones can be in the difficulties of merging the source code, when two developers are working on the same file with the same theme. Modern DVCS cope well with such problems, sometimes it even seems that it is not without the help of magic. Git has a reputation as a tool that can handle complex conflicts well. So good that we even leave this question outside the scope of this article.

The problem that bothers us more is semantic conflicts. The simplest example might be the case in which Professor Plum changes the name of the method that Reverend Green calls in its code. Refactoring tools will help you rename the method without problems, but only in your code. Therefore, if G1-6 contains a new code that calls foo, Professor Plum will not know about it, because this change is not in its branch. Awareness of where the dog is buried will come to us only in a large merge.

Renaming a function is the most obvious example of semantic conflict. In practice, they can be much more secretive. Tests are the key to solving them, but the more code you need to merge, the more chances there are for conflicts and the more difficult it is to fix them. The risk of conflicts in general and semantic in particular makes large mergers scary.

A consequence of the fear of big merges is the reluctance to refactor. Keeping your code clean requires constant effort, and in order to succeed, everyone must clean up the garbage when they see it. However, such refactoring in feature branch is problematic insofar as it makes Big Scary Merge even bigger and worse. As a result, developers are afraid of refactoring like fire and the code is overgrown with freaks.

In the above problem, I see the main reason why feature branching is a bad idea. At that moment when the team is afraid of refactoring to maintain a healthy code - they are in a long peak with no chance of an elegant exit.

Continuous integration

It is these problems that continuous integration must solve. With CI, my chart will look like this.

There are many more merges here, but merging is one of those things that is better done little by little often than rarely and in tons. As a result, if Professor Plum changes the part of the code on which Reverend Green depends, our green colleague will figure it out much earlier, in the P1-2 merge. At the moment, he needs to change G1-2 to work with these changes, instead of G1-6 (as it was in the previous example).

CI is effective in neutralizing big merge problems, but it is also a critical communication mechanism. In this scenario, a potential conflict will appear when Professor Plum merges G1 and realizes that Reverend Green uses the professor's libraries. Then Professor Plum can find Reverend Green and together they can discuss the interaction of their functionality. Perhaps Professor Pum's functionality requires some changes that do not get along with Reverend Green's functionality. Together, they can make much better design decisions that will not interfere with their work. With isolated brunches, our developers will not know about the problem until the last moment, when it is often too late to resolve the conflict painlessly.

It is important to mention that in most cases feature branching has a different approach to CI. One of the principles of CI is that everyone commits to the main line every day, so if the feature branch lives more than one day, it turns it into something very far from CI. I heard people say that they use CI because their builds run on the CI server, on each branch and on every commit. This is a continuous build, and it’s good, but there is no integration , therefore it is not CI.

Promiscuous integration

I said in parentheses earlier that there are other ways to feature branching. Let's say Professor Plum and Reverend Green at the beginning of the iteration brew fragrant green tea together and discuss their tasks. They find that there are interacting parts among the tasks and decide to integrate between each other like this:

With this approach, they merge with the main line at the end, as in the first example, but they also make merges with each other to avoid the Big Scary Marge. The idea is that the main advantage of feature branching is isolation. When you isolate isolate your brunch, there is a risk of vile conflict escalating outside of your knowledge. Then isolation is an illusion that will break painfully sooner or later.

However, is this more labor-intensive integration a form of CI or is it a completely different beast? I think they are different, again, the key feature of CI is that everyone integrates with the main line every day. Integration among feature branches, which I call you, promiscuous integration (PI), does not include or even need a main line. I think this difference is very important.

I see CI mainly as a means to give birth to a release candidate on every commit. The task of the CI system and the deployment process is to refute the readiness for production of the current release candidate. This model needs some kind of basic development line that represents the current state of the full picture.

- Dave Farley

Random integration vs continuous integration

And yet, if PI is different from CI, then in which cases is PI better than CI?

With CI, you lose the ability to use a version control system for a selective approach to change. Each developer influences the main line, so all the functionality grows in it. With CI, the main line should always be healthy, and in theory (and often in practice) you can release after each commit. Having a semi-finished functionality, or functionality that you prefer not to release, you will not damage the functionality of the entire system, but it will require some kind of disguise to hide it from the user interface, such as not including a new item in the menu.

In such cases, PI can provide something in between. This allows Reverend Green to choose when to accept Professor Plum changes. If Professor Plum makes any changes to the kernel API in P2, Reverend Green can import P1-2 but leave the rest until Professor Plum finishes its work and merges into the main branch.

However, in general, I do not think that fetching functionality for release using VCS is a good idea.

Feature branching is a modular architecture for the poor, instead of building a system with the ability to easily replace functionality with time / deployment, people bind themselves to source control for this mechanism through a manual merge.

- Dan Bodart

I prefer to design software so that you can turn functionality on and off with a configuration change. There are two useful techniques for this, FeatureToggles and BranchByAbstraction . They require you to think more about what and how to divide into modules and how to control these options, but we came to the conclusion that the result is much more accurate than the one that comes out if you rely on VCS.

What bothers me most about PI is its exposure to team communication abilities. With CI, the main line serves as a communication point. Even if Professor Plum and Reverend Green never spoke, they will find the conflict on the day of its formation. With PI, they will have to notice that they are working on interacting code. The constantly updated main line contributes to the assurance of everyone that he is integrating with everyone, there is no need to find out who is doing what, respectively, and there are less chances for changes that remain hidden until late integration.

PI arose from open source and, presumably, the less intense pace of open source project may be a factor for it. In full time work, you work many hours a day on a project. This allows you to work on functionality with priorities. With open source people often donate an hour here and a couple of days there. Functionality can take a lot of time for one developer to execute, while others, with a lot of free time, will be able to bring their changes to an acceptable quality earlier. In such a situation, a selective approach may be more important.

It is important to recognize that the tools you use do not depend on the strategy you choose. Although many associate DVCS with feature branching, they can be used with CI. All you have to do is mark one of the branches as the main line. If everyone does pull and push to this branch every day, then you have the most basic line. In fact, in a well-disciplined team, I would rather use DVCS for a CI project than a centralized VCS. With a less disciplined team, I will worry that using DVCS will push people to long-lived branches, at a time when centralized VCS and the complexity of branding will push them to frequent commits to the main line.

PS From the translatorI was prompted to study questions about approaches to using VCS by this article , thanks to which I began to search for more detailed descriptions of the “correct” use of branching and came across the above translated text. Although I do not pretend to be a quality translator, I just want to get into the feed to the developers and give them a reason to think about the opposite approach accepted in open source (forking). Do not hit with sticks, but criticize constructively, this is my first time doing it :-) .

Tags:

FeatureBranch

Simple (isolated) Feature Branch

Continuous integration

Promiscuous integration

Random integration vs continuous integration

Also popular now: