The similarities and differences between Mercurial and Git

By the nature of my work, I often witness “holy wars” between fellow programmers over which version control system to choose for a particular project. The choice matters most when developing and maintaining projects with a long history. There are many tools available, but I want to concentrate on the two that are, in my opinion, the most promising: Mercurial and Git. Below, we will look at the capabilities of both systems from the perspective of their internal structure.

A bit of history

The impetus for creating both Mercurial and Git was a single event in 2005: the Linux kernel project lost the ability to use the BitKeeper version control system free of charge. After three years with BitKeeper, the kernel developers had become accustomed to its distributed workflow. Automated handling of patches greatly simplified tracking and merging changes, and having a long recorded history made it possible to hunt down regressions.

Another important aspect of the Linux kernel development process is the hierarchical organization of its developers. At the top of the hierarchy was the Dictator, with many Lieutenants below him in charge of individual kernel subsystems. Each Lieutenant accepted or rejected individual changes within his subsystem. Linus, in turn, pulled their changes and published them in the official Linux kernel repository. Any tool that replaced BitKeeper had to support this process.

The third critical requirement for the future system was speed when handling a large number of changes and files. The Linux kernel is a very large project that accepts thousands of individual changes from thousands of different people.

No suitable tool was found among the many that existed. Almost simultaneously, Matt Mackall and Linus Torvalds released their own version control systems: Mercurial and Git, respectively. Both systems build on ideas from the Monotone project, which had appeared two years earlier.

Similarities

Both version control systems have a number of common features:

revisions are identified by checksums;
history takes the form of a directed acyclic graph;
high-level features are supported, including bisection, branching, and selective commits.

Differences

Despite the shared ideas and high-level functionality, the low-level implementations of the two systems differ significantly.

History storage

Both Git and Mercurial identify file versions by their checksums. The checksums of individual files are combined into manifests. In Git, manifests are called trees, and a tree can point to other trees. Manifests, in turn, are referenced directly by revisions (commits).
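The chain from file checksums to a revision checksum can be sketched in Python. This is only an illustration of the idea, not the on-disk format of either tool: the function names and the text layout of the manifest and commit are assumptions made for this example.

```python
import hashlib


def file_id(content: bytes) -> str:
    """Checksum that identifies one version of one file."""
    return hashlib.sha1(content).hexdigest()


def manifest_id(files: dict) -> str:
    """Checksum of a manifest: the sorted (file checksum, path) pairs."""
    lines = [f"{file_id(data)} {path}" for path, data in sorted(files.items())]
    return hashlib.sha1("\n".join(lines).encode()).hexdigest()


def commit_id(manifest: str, parents: list, message: str) -> str:
    """Checksum of a commit: covers the manifest and the parent commits."""
    header = [f"manifest {manifest}"] + [f"parent {p}" for p in parents]
    body = "\n".join(header + ["", message])
    return hashlib.sha1(body.encode()).hexdigest()


# Two states of a hypothetical project: only README changes.
v1 = {"README": b"hello\n", "main.c": b"int main(void) { return 0; }\n"}
v2 = dict(v1, README=b"hello, world\n")

c1 = commit_id(manifest_id(v1), [], "initial")
c2 = commit_id(manifest_id(v2), [c1], "update README")
```

Because the commit checksum covers the manifest, and the manifest covers every file checksum, changing a single byte in any file produces a different revision identifier, while the unchanged main.c keeps the same file checksum in both states.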

Mercurial uses a dedicated storage engine, Revlog, to improve performance. Each file placed in the repository is associated with two others: an index and a data file. Data files contain snapshots and deltas against them; a new snapshot is written only when the accumulated changes to a file exceed a certain threshold. The index provides efficient access to the data file. Deltas produced by changing files under version control are only ever appended to the data files. The index is also what ties edits from different places in a file together into a single revision. Manifests are built from the revisions of individual files, and commits are built from manifests. This approach has proven very efficient for creating, looking up, and diffing files.
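The snapshot-plus-deltas idea described above can be sketched as a toy append-only log in Python. This is not Mercurial's Revlog: the class ToyRevlog, its delta format, and the rule "write a snapshot after max_chain deltas" are assumptions chosen to keep the example short (the real threshold is not a simple delta count).

```python
import difflib


def make_delta(old: str, new: str) -> list:
    """Describe `new` as copy/insert operations against `old`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the old range is simply not copied
    return ops


def apply_delta(old: str, ops: list) -> str:
    """Rebuild the new text from the old text and a delta."""
    parts = []
    for op in ops:
        parts.append(old[op[1]:op[2]] if op[0] == "copy" else op[1])
    return "".join(parts)


class ToyRevlog:
    def __init__(self, max_chain: int = 3):
        self.entries = []           # append-only "data file"
        self.max_chain = max_chain  # assumed threshold, for illustration

    def _chain_length(self) -> int:
        """Number of deltas appended since the last snapshot."""
        count = 0
        for kind, _ in reversed(self.entries):
            if kind == "snapshot":
                break
            count += 1
        return count

    def add(self, text: str) -> int:
        """Append a revision and return its revision number."""
        if not self.entries or self._chain_length() >= self.max_chain:
            self.entries.append(("snapshot", text))
        else:
            previous = self.read(len(self.entries) - 1)
            self.entries.append(("delta", make_delta(previous, text)))
        return len(self.entries) - 1

    def read(self, rev: int) -> str:
        """Find the nearest snapshot, then replay the deltas after it."""
        base = rev
        while self.entries[base][0] != "snapshot":
            base -= 1
        text = self.entries[base][1]
        for _, delta in self.entries[base + 1:rev + 1]:
            text = apply_delta(text, delta)
        return text


log = ToyRevlog(max_chain=2)
versions = ["a\n", "a\nb\n", "a\nb\nc\n", "a\nB\nc\n", "a\nB\nc\nd\n"]
revs = [log.add(v) for v in versions]
kinds = [kind for kind, _ in log.entries]
```

Reading a revision never has to replay more than max_chain deltas, which is the trade-off the threshold buys: small appends most of the time, with an occasional full snapshot to keep reads fast.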

Git's storage model is based on binary large objects (blobs). Each new revision of a file is stored as a complete copy of the file, which makes saving revisions fast. The copies are compressed, but a great deal of duplication remains. To reduce storage requirements, the Git developers added a packing technique. Essentially, a pack is something like a Revlog built for a particular point in time. Pack files differ from Revlog, but they serve the same goal: storing data while using disk space efficiently. Because Git stores file snapshots rather than deltas, commits can be created and discarded easily. When you need the difference between two commits, Git computes the diff on the fly.
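To make the "complete copy per revision" model concrete, here is a minimal Python sketch of a content-addressed blob store. The identifier is computed the way Git does it for blobs in a SHA-1 repository (the checksum of the header "blob <size>\0" followed by the content); the ToyObjectStore class itself is hypothetical and keeps objects in memory rather than on disk.

```python
import hashlib
import zlib


def git_blob_id(content: bytes) -> str:
    """Object id of a blob in a SHA-1 Git repository."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ToyObjectStore:
    """In-memory stand-in for a directory of compressed loose objects."""

    def __init__(self):
        self.objects = {}  # object id -> zlib-compressed full copy

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content maps to the same id, so it is stored only once.
        self.objects.setdefault(oid, zlib.compress(content))
        return oid

    def get(self, oid: str) -> bytes:
        return zlib.decompress(self.objects[oid])


store = ToyObjectStore()
first = store.put(b"line one\n")
second = store.put(b"line one\nline two\n")  # a full copy, not a delta
again = store.put(b"line one\n")             # same content, same id
```

Each revision is stored whole, so writing is a single compress-and-save step and no earlier object ever needs to be read. The cost is that two nearly identical revisions share nothing until they are packed.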

Branching

Branching is a very important part of any configuration management system, because it allows new functionality to be developed in parallel while the existing code stays stable. Both Git and Mercurial support branching, but the differences in how they store history carry over into how branches are implemented. In Mercurial, a branch is a label that is attached to a commit permanently. This label is global and unique. Anyone who pulls changes from a remote repository will see all the branches, and all the commits on each of them, in their own repository. In Mercurial, branches are a public place for development outside the main trunk. Because branch names are published to all participants, long-lived names such as version numbers are usually chosen.

Git branches are, in fact, just pointers to commits. In different clones of a repository, branches with the same name may point to different commits. Branches in Git can be deleted and transferred individually; each one is identified by its local name in the source repository.
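The two branch models can be contrasted with a small Python sketch. It is a deliberately simplified model, not the data structures of either tool: HgCommit, GitRepo, and the short commit identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class HgCommit:
    node: str
    branch: str  # recorded inside the commit, so it travels with it forever


def hg_branches(commits: list) -> list:
    """Branch names are derived from the commits themselves."""
    return sorted({c.branch for c in commits})


@dataclass
class GitRepo:
    commits: set = field(default_factory=set)
    refs: dict = field(default_factory=dict)  # branch name -> commit id


# Mercurial-style: whoever receives these commits also receives the names.
hg_history = [
    HgCommit("a1", "default"),
    HgCommit("b2", "1.0"),
    HgCommit("c3", "1.0"),
]

# Git-style: two clones, the same branch name, different commits.
alice = GitRepo(commits={"a1", "b2"}, refs={"feature": "b2"})
bob = GitRepo(commits={"a1", "c3"}, refs={"feature": "c3"})
diverged = alice.refs["feature"] != bob.refs["feature"]

# Deleting a branch removes only the pointer; the commit objects remain.
del alice.refs["feature"]
```

In the first model a branch name is part of history and cannot be taken back once shared. In the second it is local bookkeeping that each clone is free to move or delete.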

Practical aspects of use

The differences in the Git and Mercurial implementations can be illustrated with examples.

Mercurial makes it easy to commit changes and to push and pull them while preserving the entire preceding history. Git is not concerned with preserving the entire preceding history: it only records changes and creates pointers to them. For Git, what pointers used to refer to does not matter; what matters is what they refer to now. There is even a mechanism that protects local history when pulling changes from a remote repository: the fast-forward merge. When this mechanism is enforced, Git reports an error for any update that cannot be applied as a fast-forward. These errors can be overridden if the changes are expected.
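Whether an update is a fast-forward comes down to a reachability question on the history graph: the branch may simply move forward if its current commit is an ancestor of the incoming one. A minimal Python sketch of that check, assuming history is given as a hypothetical mapping from each commit to its list of parents:

```python
def ancestors(parents: dict, commit: str) -> set:
    """All commits reachable from `commit`, including the commit itself."""
    seen = set()
    stack = [commit]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def is_fast_forward(parents: dict, old_tip: str, new_tip: str) -> bool:
    """True if moving the branch from old_tip to new_tip loses no history."""
    return old_tip in ancestors(parents, new_tip)


# Hypothetical history: A <- B <- C, and a competing commit D based on B.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"]}

forward = is_fast_forward(history, "B", "C")     # C builds on B
diverging = is_fast_forward(history, "C", "D")   # D does not contain C
```

In the second case, moving the pointer from C to D would leave C unreachable from the branch, which is exactly the situation the fast-forward check is meant to flag.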

When you undo a commit, or a pull that produced a merge, Git simply moves the branch pointer back to the previous commit. In fact, whenever you need to return to some earlier state, Git looks up the corresponding checksum in its log and reports which commit it belongs to. Once something has been committed in Git, you can always return to that state. In Mercurial, there are cases where it is impossible to return completely to the original state. Because Mercurial records a new commit to fix a problem, in some cases it is difficult to step back past the latest changes.

Mercurial relies on extensions to solve various problems. Each extension solves its own problem well when considered in isolation. Some extensions even provide similar functionality in different ways.

For example, consider shelving work in progress. Suppose we need to save changes from the working copy without committing them to the repository. Git offers stash for this. A stash is a commit, or branch, that is not stored in the usual place. It does not appear in the list of branches, but every tool treats it like a branch. If similar functionality is needed in Mercurial, the attic or shelve extensions can be used. Both extensions store the “pending” history as files in the repository, which can be committed if necessary. Each extension solves the problem slightly differently, so their formats are incompatible.

Another example is the git commit --amend command. If you need to change the latest commit, for example to add something you forgot or to reword the message, git commit --amend creates a completely new set of file objects, trees, and commit objects. The branch pointer is then updated. If you later need to undo the amendment, you only have to move the pointer back to the previous commit with git reset --hard HEAD@{1}. To do the same in Mercurial, you have to roll back the commit, create a new one, import the contents of the last commit using the Mercurial Queues (mq) extension, amend it, and make a new commit.
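The amend-then-undo sequence can be modelled in a few lines of Python. This is a toy model under stated assumptions, not Git's implementation: ToyBranch is hypothetical, commit ids are sequential labels instead of checksums, and the reflog is reduced to a list of earlier pointer values.

```python
class ToyBranch:
    def __init__(self):
        self.objects = {}  # commit id -> (parent id, message); never edited
        self.head = None   # the branch pointer
        self.reflog = []   # earlier values of the pointer, newest first
        self._counter = 0

    def _store(self, parent, message):
        """Create a new commit object; existing objects are left untouched."""
        commit_id = f"c{self._counter}"  # assumption: real ids are checksums
        self._counter += 1
        self.objects[commit_id] = (parent, message)
        return commit_id

    def _move(self, target):
        """Move the pointer and remember where it was."""
        self.reflog.insert(0, self.head)
        self.head = target

    def commit(self, message):
        self._move(self._store(self.head, message))

    def amend(self, message):
        """New commit with the same parent; the old commit stays in objects."""
        parent, _ = self.objects[self.head]
        self._move(self._store(parent, message))

    def reset_to_previous(self):
        """Rough analogue of `git reset --hard HEAD@{1}`."""
        self._move(self.reflog[0])


branch = ToyBranch()
branch.commit("initial")
branch.commit("add parser")
original = branch.head

branch.amend("add parser and tests")
amended = branch.head

branch.reset_to_previous()  # the pointer is back on the original commit
```

Nothing is rewritten in place: the amended commit is a sibling of the original, and undoing the amendment only moves the pointer. Both commits remain in the object store afterwards.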

It should be noted that none of the extensions mentioned above takes advantage of the Mercurial storage format; they exist purely as independent add-ons on top of it.

Conclusions

In this last section, I would like to give my own opinion on choosing a version control system. Mercurial and Git are each good in their own niche.

For example, for running a commercial software project, I prefer Mercurial.

Mercurial's strict handling of history guarantees that the original source of an error can be recorded and found.
After merging a branch in Git, we run the risk of ending up with a giant patch with an error hidden somewhere inside it.
Global branches also make it possible to keep track of colleagues' work, provided they synchronize regularly with the central repository.

For storing binary files, such as an electronic library, Git is better suited. Unlike Mercurial, it is not centered on computing file deltas, which are not very effective for binary content. The files themselves rarely change, and the main operations on them are moving and adding. In my own experience, the Git repository folder holding the history of my library is within about 10% of the size of the working copy.
