Software Configuration Management // Distributed Version Control

    Greetings. As promised, here is the next in a series of notes on managing software configurations, commonly known as Software Configuration Management. The whole series can be found under the CM tag. Only a couple of notes remain to be published.

    Today we will talk about a rather controversial and somewhat provocative topic - distributed version control systems. I know that such systems are popular among Habr readers, so I am prepared for a discussion. Moreover, I urge you not to pass by: speak out if you have anything to say on the matter.


    So, there is a project, and in it a version control system that serves the several teams implementing this project. There is one version control system for everyone. Let me remind you that this continues a series of notes, and earlier I talked about version control in general, without going into specific implementations. The subject is developing gradually from simple to complex.

    At some point the need arises to make the central repository available locally in one of the development centers - to speed up work and to get around traffic or bandwidth restrictions. Say there are two teams in different places and time zones, for example the Far East of Russia and the Central Time Zone of the USA, separated by half the world. Both work on one project and need to change the same parts of the product. Suppose the version control server is located in the USA. Then developers in Russia have to send their changes across half the globe to create each new version, and any operation such as switching to another branch and fetching the whole selected configuration takes too long, given the network latency. Generally,

    The problem is not new and it remains relevant, so over time different approaches to solving it have been formulated. More precisely, there are two approaches to building distributed version control systems.

    Open distribution is a design principle in which each working copy of the configuration can have its own set of child versions, and the created versions are exchanged at the discretion of the author of the change.

    The advantage of such systems is that work on a single workstation can proceed independently of the other repository instances. In fact, there may be no central repository at all: there are as many repositories as there are copies. It is not surprising that such systems found their first use in Open Source. With no separate server to maintain, participants exchange only the information they need, without loading storage and traffic with deltas that someone may never need.
    The downside of this approach is that the exchange of deltas between work products is hard to control centrally. The result is a kind of Brownian motion of deltas, which many managers accustomed to centralization may not like.
    Examples of such systems are BitKeeper, git and Mercurial (Hg).
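
    The open distribution principle described above can be illustrated with a small sketch. This is a toy model and not how git or Mercurial store or transfer data: the WorkingCopy class, its send method and the version identifiers are all hypothetical names invented for the illustration. It only shows that every copy is a repository of its own and that the author decides which versions leave the machine.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WorkingCopy:
    """Toy model: a working copy that is also a complete repository."""

    owner: str
    # version id -> content; each copy keeps its own history
    versions: dict[str, str] = field(default_factory=dict)

    def commit(self, version_id: str, content: str) -> None:
        # Creating a version needs neither a server nor a network.
        self.versions[version_id] = content

    def send(self, recipient: WorkingCopy, version_ids: set[str]) -> list[str]:
        """Give the recipient only the versions the author chose to share."""
        sent = sorted(
            v for v in version_ids
            if v in self.versions and v not in recipient.versions
        )
        for version_id in sent:
            recipient.versions[version_id] = self.versions[version_id]
        return sent


alice = WorkingCopy("alice")
bob = WorkingCopy("bob")

alice.commit("a1", "parser fix")
alice.commit("a2", "private experiment")
bob.commit("b1", "new report")

# Alice shares one version; "a2" never leaves her machine.
sent = alice.send(bob, {"a1"})
print(sent)                  # ['a1']
print(sorted(bob.versions))  # ['a1', 'b1']
```

    Nothing in this model records who has which version, which is exactly the lack of central control mentioned above.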

    Distribution by replication means creating equal copies of the central repository (or of parts of it) on all the distributed servers. An analogy can be drawn with databases and their replication. For each developer, the repository he connects to is the main one, and all versions and branches are created either in the central repository or in a replica. To distribute the data, a copy of the repository is made on another available server, and some of the developers switch to that copy. When the results of work need to be exchanged, the repositories are replicated: both servers exchange meta-information.

    The advantage of this approach is that work stays centralized within one team location. In addition, part of the accumulated information can be kept out of synchronization with other teams while remaining available to the whole local team. This matters when the code of the subsystems under development must not leak outside, even to other teams working on the same product.

    The downside is the need to configure the replication mechanisms, although systems that use this approach usually provide tools for efficient data exchange. For some, a further minus is that all version operations are performed on a server rather than on the developer's local computer. That is, the "distribution" of the system shows at the level of teams and their locations, but not at the level of the individual developer.
    Examples of replicated systems are ClearCase and Perforce.
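
    The replication model can be sketched in the same toy style. This does not reproduce the replication tooling of ClearCase or Perforce: SiteReplica, export_packet, import_packet and synchronize are hypothetical names, and a real system exchanges far richer meta-information. The sketch assumes one server per site, a set of branches excluded from synchronization, and a symmetric exchange between two servers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SiteReplica:
    """Toy model: the repository server used by one team location."""

    site: str
    # branch name -> version labels stored on this server
    branches: dict[str, list[str]] = field(default_factory=dict)
    # branches this site keeps out of synchronization
    private: set[str] = field(default_factory=set)

    def check_in(self, branch: str, version: str) -> None:
        # Every developer of the site works against this one server.
        self.branches.setdefault(branch, []).append(version)

    def export_packet(self) -> dict[str, list[str]]:
        """Everything the site is willing to replicate."""
        return {
            name: list(versions)
            for name, versions in self.branches.items()
            if name not in self.private
        }

    def import_packet(self, packet: dict[str, list[str]]) -> None:
        for name, versions in packet.items():
            local = self.branches.setdefault(name, [])
            for version in versions:
                if version not in local:
                    local.append(version)


def synchronize(a: SiteReplica, b: SiteReplica) -> None:
    """Both servers exchange what they have accumulated."""
    packet_a, packet_b = a.export_packet(), b.export_packet()
    a.import_packet(packet_b)
    b.import_packet(packet_a)


usa = SiteReplica("usa")
rus = SiteReplica("rus", private={"rus_crypto"})

usa.check_in("usa_main", "1")
rus.check_in("rus_feature", "1")
rus.check_in("rus_crypto", "1")  # visible to the whole local team only

synchronize(usa, rus)
print(sorted(usa.branches))  # ['rus_feature', 'usa_main']
print(sorted(rus.branches))  # ['rus_crypto', 'rus_feature', 'usa_main']
```

    Here the unit of exchange is everything the server holds apart from the excluded branches, in contrast to the per-version exchange of open distribution.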

    The two types (open and replicated) resemble each other: in both cases information is exchanged between different copies of the same set of elements and their versions. The difference between them is one of scale. In systems with replication the minimum unit of replication is, as a rule, the repository or a significant part of it, processed as a whole. In open distribution systems the smallest unit of exchange is a single version of a single element.
    Both types of distribution share a common problem: the need for a clear agreement on the naming of elements and their branches, and of the labels that mark the resulting configurations. When the results of work are merged, different files must not end up with the same name and the same meta-information (branches, tags, attributes). Therefore all developers and teams working separately from each other must follow common standards. Different systems have mechanisms to enforce this. For example, in ClearCase triggers are attached to the creation of any meta-information and check it for compliance with the standard: the name of every created branch must contain the code (or identifier) of the site (team) where the branch was created.
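
    The naming check that such a trigger performs can be sketched as follows. This is not the ClearCase trigger API. The site codes, the <site>_<topic> naming pattern and the check_branch_name function are assumptions made up for the illustration; a real project would substitute its own standard.

```python
import re

# Hypothetical site codes and naming rule: <site>_<topic>, e.g. "usa_login_fix".
KNOWN_SITES = {"usa", "rus"}
BRANCH_PATTERN = re.compile(r"^(?P<site>[a-z]+)_(?P<topic>[a-z0-9_]+)$")


def check_branch_name(name: str, local_site: str) -> tuple[bool, str]:
    """Return (allowed, reason) for a branch about to be created at local_site."""
    match = BRANCH_PATTERN.match(name)
    if match is None:
        return False, "name must look like <site>_<topic>"
    site = match.group("site")
    if site not in KNOWN_SITES:
        return False, f"unknown site code '{site}'"
    if site != local_site:
        return False, f"branch must carry the local site code '{local_site}'"
    return True, "ok"


print(check_branch_name("rus_report_export", "rus"))  # (True, 'ok')
print(check_branch_name("hotfix", "rus"))
print(check_branch_name("usa_report_export", "rus"))
```

    Because each site may create only branches that carry its own code, two teams cannot independently create a branch with the same name, which is the collision the agreement is meant to prevent.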

    In addition, systems with open distribution leave it to each individual developer to decide what he hands over to the team as a delta and what he keeps out of public view. Whether that is for better or worse depends on the culture adopted by the project. In more centralized systems, with replicated repositories, the problem looks different. When everyone is obliged to commit changes to the version control system that is central for their team, the meta-information base grows quickly, which affects both the cost of storage and the speed of replication between geographically distributed databases.

    Which of the approaches is better cannot, of course, be said once and for all projects. Whether to harness the Brownian motion of git or to settle on something more stable is up to the management of each project. There is no single solution for all teams and projects, and there cannot be one. Those who want to look at the differences between the various systems and models can follow link [1].

    By the way, not only version control systems can be distributed, but also change request tracking systems. The logic is exactly the same, except that here the main working model is replication. An example is IBM Rational ClearDDTS. Since such systems are not very common, we will not dwell on them.

    As usual, the sources used and recommended for further study:
    1. en.wikipedia.org/wiki/Comparison_of_revision_control_software - a comparison of version control systems.
    2. lib.custis.ru/index.php/The_Risks_of_Distributed_Version_Control - a sober look at the risks associated with distributed version control systems.

    I also recommend another article, about the problems of adopting different version control systems:
    lib.custis.ru/index.php/Version_Control_and_%E2%80%9Cthe_80%25%E2%80%9D
    It is a bit provocative, but worth reading first of all for those who count themselves among the progressive part of the programming community. And, to answer the question in advance: no, I am not a fan of either SVN or git.

    That's all for today. Do not pass by, speak out. It would be interesting to hear the opinions of people who use Perforce.
    Fans of git / Hg / etc - it would be interesting to hear about the non-obvious problems that arise when exchanging deltas (there must be some, since nothing ever goes perfectly smoothly).
    If anyone has set up repository replication in SVN or even CVS, please tell us about it; I will be grateful.

    Well, to be continued.
