Switching to DVCS, Mercurial
Purpose and audience
This article is a summary of the key benefits of DVCS.
I have collected the arguments for switching to a DVCS (specifically Mercurial) and will try to make them clear to readers who have no practical experience with any DVCS.
We produce software products, services, and so on. The product of our work is written code, which passes through a number of stages on its way to end users: someone makes a commit, puts it on the conveyor, and it moves on. The way work is organized in a version control system can mirror the stages of the development process. Why? You can think of the version control system as a kind of “production pipeline”, and it really does act like one: code that must be tested first lands in the testing branch, where it is rolled out; if it passes, it moves on, and if it fails the tests, it drops out.
Why switch to distributed systems? The joys of transition
Local work
The first reason, cited in every source, is the ability to work locally.
Another first reason is speed. It may sound ridiculous: “What speed, pardon me, are we talking about for a commit, an update, or any other operation?” True enough: in centralized version control systems you do not commit that often, nor do you update or otherwise touch the remote repository that often. But there are several sides to this. First, a DVCS pushes you to work with the repository much more closely. Second, the difference in speed is striking. You do not have to think twice before performing almost any operation, and the time it takes is not worth measuring; in effect, operations are instantaneous.
Local commits
Local commits encourage frequent commits. Why is that good? The history is deeper, the changes are more granular, and it is easier to understand what you or others did. The local repository is exclusively yours. Until you synchronize with the central repository, yours is only loosely tied to the remote one. That lets you do all sorts of things with your repository; in the most extreme case, if you have broken something, you can delete it and clone it from the center again, although that extreme arises only when you do not really know what you are doing. Local rollbacks, when you need to return to a previous revision, are much less painful than in SVN. Local work has another big plus. In centralized version control systems (SVN is one of them), a commit is a serious matter, a deliberate action that takes a lot of preparation. People are afraid to commit, so they commit rarely.
Trust hierarchies
The next advantage is the ability to build a hierarchy of trust. Such hierarchies matter when developing a corporate project, and especially when a project has a large number of participants of varying skill. Take an open source project where a stranger turns up who wants to help and really can help with something. Trust hierarchies let you define relationships between multiple local repositories. You can simply and predictably pull only from those you choose, and accept pushes only from those you really trust. Push is a term that is new with DVCS. In SVN, a commit goes straight to the main repository and is outside your control. In a DVCS, if someone has pushed into your local repository, you can take that update, examine it from all sides, and only then decide what to do with it.
On the one hand, these hierarchical relationships are certainly good; on the other, they complicate the whole concept and your life. If the hierarchy is not thought through and built properly, you may find that getting what you want requires a chain of operations. For example, if you have not really decided who may do what, yet the “who” and the “what” are technically restricted, you may have to push to one place first, then update from that place and push to another. You will agree that this is not a healthy occupation. There is no doubt that a point-to-center scheme, where you are a secondary point and the center is the single main one that you commit to and update from, looks saner and simpler at first glance.
Project Repository Model
Enforcing the “one project per repository” model is another plus that is hard to overestimate. SVN has a common practice: you have one project with branches, tags and a trunk, and everyone commits there. In practice this is, to put it mildly, not always how it goes. SVN can cover a directory tree of any complexity and works about the same with it (some will say equally well, others equally badly), so it tempts you to keep one repository per area of activity. Several areas get created that do not intersect. That leads to another problem. I do not know how modern SVN handles it, but for a long time there was no way to link projects in different repositories so that they could share something; roughly speaking, you could not commit from a project controlled by one repository into another. Many teams have a very sprawling layout in which almost all development, all projects, live in one place, physically stored on one server in one repository.
In any DVCS, keeping everything in one repository will not work. There is practically no way around it: like it or not, you have to organize one project per repository. This has advantages and disadvantages, but the advantages greatly outweigh. Conceptually it feels like the right decision, and from a technical and security standpoint it also seems right. One practical case: you can grant push rights on one repository to one group of developers and push rights on another repository to a completely different group, and the groups need not overlap. One group can only read from here; another can read from there and both read and write here. The point is clear: rights are separated by project.
Deployments that deliver source code are in many ways safer too. Even when deployment is done in the most primitive way, the equivalent of svn up or svn checkout, it is safer here. It is difficult, even impossible, to bring more than you intended onto a production server or some other “dangerous” server. You take only the one repository you are deploying, and since “repository” is a synonym for “project”, in the worst case only that project suffers. There are other pluses, first of all the absence of nested .svn subdirectories; SVN no longer creates them, but many people still run versions up to 1.6.
There is a single top-level .hg directory. Service subdirectories do not appear everywhere, needed or not, and the risk of a malicious hand getting into them is lower, because there is only one place to protect rather than one in every subdirectory.
Since projects are separate folders, you can easily copy and move them around your file system. This is a known problem in SVN, when you first check out a top-level project and then manipulate the subprojects inside it. Moving directories by hand inside a hierarchy controlled by SVN is a very bad idea; it can lead to a lot of headaches and later require a specialist who knows SVN's internals very well to sort things out.
Audit and delivery reliability
Another advantage of such systems is what is called “signing” revisions, or stamping them with an electronic signature. Roughly speaking, a hash of some strength is computed for each revision and then used in all operations as a means of checking data integrity. This is not about privacy or defense against attack. What matters in practice is that the system guarantees the delivery of undamaged data.
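The integrity idea can be illustrated with a minimal Python sketch. It assumes a simplified revision identifier, the SHA-1 of the parent identifier plus the content; the names revision_id, build_chain and verify_chain are hypothetical, and Mercurial's actual changeset format is more involved than this toy.

```python
import hashlib

NULL_ID = "0" * 40  # assumed placeholder parent for the very first revision


def revision_id(parent_id: str, content: bytes) -> str:
    """Toy identifier: SHA-1 over the parent identifier and the content."""
    return hashlib.sha1(parent_id.encode("ascii") + content).hexdigest()


def build_chain(contents):
    """Return a list of (revision_id, content) pairs for a linear history."""
    chain, parent = [], NULL_ID
    for content in contents:
        rev = revision_id(parent, content)
        chain.append((rev, content))
        parent = rev
    return chain


def verify_chain(chain) -> bool:
    """Recompute every identifier; damaged content makes the check fail."""
    parent = NULL_ID
    for rev, content in chain:
        if revision_id(parent, content) != rev:
            return False
        parent = rev
    return True


history = build_chain([b"first change", b"second change", b"third change"])
damaged = list(history)
damaged[1] = (damaged[1][0], b"second chanGe")  # simulate corruption in transit

print(verify_chain(history))  # intact copy passes the check
print(verify_chain(damaged))  # corrupted copy is detected
```

Because each identifier depends on the parent's identifier, damage to any revision is caught when the receiving side recomputes the hashes.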
A sane branching model. Merging and branching
We have come to the main advantage, the very thing that pushes people to switch to a DVCS.
Here is the problem. In systems with a central server (SVN), branching and maintaining branches is complex, a fact fully acknowledged by both the developers and the users. Maintenance means that once you have made a branch, you have to take changes from it, put changes into it, merge, and do everything else you can imagine in a sprawling tree. The current mechanism is far from ideal: it requires a firm understanding of what you are doing and why, a strict sequence of actions, and very mysterious cases still occur often. We could blame the programmer who let the mystery in, but the Internet is full of cries from people who tried to merge and got strange results. In practice, the number of people who actively work with branches in SVN is not large. And those renegades who stay on SVN and actively use branches avoid merging as best they can.
Merging and branching in a DVCS are remarkably simple. It just works. What works in Subversion can hardly be called a working configuration. This is not because some developers are smarter and others dumber, or because some foresaw what others did not. The point is that a distributed system is conceptually tied to branching and then merging the changes. A simple example: if you and a colleague each take a copy of the repository from the same place and work on it independently, committing constantly and making your own ingenious changes, then when you try to upload the results back, the history is no longer linear. It is linear only in the simplest case, when one revision follows another; at the place where your work comes together, it forks. This is especially noticeable if you push now and then and pull revisions now and then. In short, a multi-headed state here is not something special or a problem; it is a completely normal working situation. In other words, if a distributed system has no proper merging and branching, nothing will work at all. Switching between branches is extremely simple.
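The multi-headed situation described above can be sketched in a few lines of Python. This is a toy model, not Mercurial's implementation: the history is assumed to be a plain dictionary from revision name to parent names, and the revision names are made up for illustration.

```python
def heads(parents):
    """Return the revisions that no other revision names as a parent."""
    referenced = {p for ps in parents.values() for p in ps}
    return sorted(rev for rev in parents if rev not in referenced)


# Toy history: each revision maps to the list of its parents.
history = {
    "base": [],
    "mine-1": ["base"],    # my local commit
    "theirs-1": ["base"],  # a colleague's commit, pulled in later
}
print(heads(history))  # ['mine-1', 'theirs-1'] : the history has forked

# A merge revision has two parents and joins the heads again.
history["merge"] = ["mine-1", "theirs-1"]
print(heads(history))  # ['merge']
```

Two heads after a pull are the normal state of affairs; a merge is simply a revision with two parents that brings them back together.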
SSH repository access
Access can be full or restricted; that is for the administrator to decide. You can reach the remote repository you need directly over SSH and, depending on your rights, push to it or pull from it.
Easy deployment of integration and central repositories and replicas
Setting up a remote repository that accepts your pushes or serves your pulls is no particular problem.
Theoretically better scalability
Some praise DVCS for better scalability: when one server can no longer cope, it is easy to bring in 2, 5 or 10. This is a somewhat dubious advantage, because in practice I have never seen the server become a bottleneck. The demands on the “central” repository (it is central only in the sense that everyone has access to it; otherwise it is the same as yours) in terms of access and performance are much more modest. People talk to it rarely.
What do you lose when switching to DVCS
A fly in the ointment: loss of the model's simplicity
The very first thing you lose, and it is a serious loss, is the simplicity of the model. In SVN everything is simple: there is one place you commit to and update from, and its state is understandable and more or less predictable. Now the model has changed completely. You have an independent repository of your own. Even advanced specialists who understand it all in theory run headfirst into this more sophisticated model; for example, the idea that your repository is completely different from mine, and that my revision number has nothing to do with yours, does not sink in easily.
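A small Python sketch may help with the revision number point. It assumes a toy repository in which the local number is simply the position at which a changeset arrived, while the identifier is a hash; ToyRepo and its methods are hypothetical names, and real identifiers also cover parents and metadata rather than the content alone.

```python
import hashlib


class ToyRepo:
    """Toy repository: local numbers are just positions in arrival order."""

    def __init__(self):
        self.order = []  # changeset ids in the order this repo received them

    def add(self, content: str) -> str:
        """Record a changeset (committed here or pulled) and return its id."""
        cid = hashlib.sha1(content.encode("utf-8")).hexdigest()[:12]
        if cid not in self.order:
            self.order.append(cid)
        return cid

    def local_number(self, cid: str) -> int:
        return self.order.index(cid)


mine, yours = ToyRepo(), ToyRepo()
fix = mine.add("fix typo")    # I commit the fix first
yours.add("add feature")      # you commit something else first
yours.add("fix typo")         # then you pull my fix
mine.add("add feature")       # and I pull your feature

# Same changeset, same hash, but a different local number in each repository.
print(mine.local_number(fix), yours.local_number(fix))  # 0 1
```

When talking to a colleague about a specific revision, it is therefore the hash that identifies it, not the local number.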
A conditional minus: you need to read the docs
Mercurial is too complicated to explain on your fingers. It is relatively simple, simpler than many other distributed systems, but you still have to read at least a minimum of documentation. You cannot just sit down and start using it to the full.
Truly large repositories can be a problem, simply because a large repository with its whole history can take a long time to clone over a bad connection. Fortunately, this is done only once.
Ways to branch
The classic way is to make clones. If you have a repository in place A, you can make a clone in place B, work with it and commit; from the moment of cloning the two copies are completely independent, and you can work on them in parallel. This is the most primitive kind of branching. Merging afterwards is not difficult either: in repository A you pull from repository B and then merge in the standard way. The main advantage of this approach is that it is hard to get confused. You have two different places, they are independent, and switching from one to the other (from one branch to another) takes quite deliberate actions. Getting rid of such a branch is easy because you need do nothing special: either forget about it or delete it at the file system level, and the branch is gone. Once you have finished and everything has been merged, the clone can be removed, and no trace of that branch remains anywhere.
This method has disadvantages too, and because of them some people hardly use it. First, a separate copy of the repository is sometimes troublesome from a project management point of view. The tool you develop with may not like projects jumping from place to place: switching to a test, development or unstable version requires changing the path to the project, which in some situations is inconvenient. The second problem is that if you need to make this repository publicly accessible, you have to push twice: once for your main repository and once for this one. They are so independent that this itself creates problems; maintaining the two versions, I mean the two branches, takes double the effort.
The second (also conditional) way of branching is branching with bookmarks: you give any revision a name and start working from it. In principle, once you understand the concept of multiple heads, all these methods come down to the same thing: you maintain different heads. With bookmarks you simply give one of the heads a convenient name, for example branch A, and work with it. Strictly speaking, you need not name it at all if you can remember its identifier; you can work with it anonymously. That is a closely related way of branching: you do not name the head, you just update (here “update” means “switch to a head”) to a certain revision and work from there.
Now for the merits of branching with bookmarks. First, it is a very fast and easy-to-understand way, much easier to understand than making clones, although for some reason clones are considered the default. All branches live in one place; you make no physical copies (clones).
Then there are named branches. Practitioners say this is the most correct way; it is what hg branch does, and they consider it logical, understandable and quite safe. Unlike clones, you do not need to keep two parallel development trees and maintain both, and you do not need to reflect the change in your environment: your project stays in one place, and from time to time you switch it to one branch or another. Switching is elementary, done with the standard update command and the name of the branch.
Naturally, in this method, as in the others, switching from one branch to another is very fast. Named branches are global and permanent: the branch name is part of the standard Mercurial metadata, and it will always be there (unlike git). When you push your repository, this information is pushed too, to any clone, wherever it is carried. In my opinion that is the right way, and it is hard to call it a special merit; this is how it should work.
This method is considered to have two drawbacks. Some say it is complex and not very clear (it is hard to agree with that). The second is that it slightly spoils the warnings and self-checks against excessive multi-headedness. When you try to push something that brings in a new head, and that new head belongs to your main branch, there probably is something to think about. Of course, that depends on your logic and your agreements. But when you create a new branch, the first push will also tell you that you should use "-f" because a new head is being created. In this case you really did intend to create a new head, so such a message may surprise and alarm you a little: “Why is it complaining when I am doing everything right?”
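To see why a new named branch necessarily brings a new head, here is a minimal Python sketch. It assumes a toy changeset record with only a parents list and a branch field, and it groups plain topological heads by that field; the changeset names are made up, and Mercurial's own notion of branch heads is somewhat richer than this simplification.

```python
def branch_heads(changesets):
    """Group heads (changesets with no children) by their recorded branch."""
    referenced = {p for c in changesets.values() for p in c["parents"]}
    result = {}
    for cid, c in changesets.items():
        if cid not in referenced:
            result.setdefault(c["branch"], []).append(cid)
    return result


changesets = {
    "c0": {"parents": [], "branch": "default"},
    "c1": {"parents": ["c0"], "branch": "default"},
    # First commit on a new named branch: the name travels with the changeset.
    "c2": {"parents": ["c0"], "branch": "feature-x"},
}

print(branch_heads(changesets))
# {'default': ['c1'], 'feature-x': ['c2']}
```

The repository as a whole now has two heads, but each branch has exactly one, which is the intended result rather than an accident.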
That seems to be all the drawbacks. As for merging these branches, it is almost the same in all cases. You need to deliver the changes somehow: with a clone you do a pull; in the other cases you do a merge, specifying which head or branch you are merging with, and that is all. From that point on, everything is the same for all branching and merging methods. I hope I have managed to describe the cases where this transition makes sense. People who have switched to a DVCS believe the transition was hugely worthwhile and do not regret for a minute leaving SVN and getting to know the new and beautiful world of DVCS.
In addition
Rebase
The problem with rebase. Why is it bad? Information is lost. At first glance its results are not obvious, and it is not very clear why you would do it. The usual motivation is to avoid a long series of commits like “I almost made feature 1”, “I almost made feature 1, 85%”, and so on: one commit per feature, and off it goes. But the danger of rebase is not that we combine several of our own commits into one; that is not rebase but a basic operation.
The danger is this. Suppose we made 10 commits ourselves, then took an update from the repository and found 10 commits by Petya there. If we merge honestly, the history shows the point where the repository branched, Petya's changes, our changes, and the point where they converged into a single line. If we rebase instead, we make the version control system forget that our changes were made in parallel with Petya's and transplant them on top of his changes, and then commit the result. In the course of this “transplantation” an “intermediate” commit appears that changes the code to reconcile the state after Petya with the state before us. The history is thus badly damaged: when we later look at such a “flattened” commit, we cannot tell that the work was done in parallel with Petya, or what actually caused the merge.
How can history be lost when a rebase is done? There is such a risk. Say you rebased a particular branch at a particular revision, and another developer has branched off from your branch after that revision; the outcome is then far from obvious. This happens when both developers work with a public branch, and history must not be rewritten for it. To avoid such confusion, do not rebase public branches; rebase only your private ones.
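The difference between an honest merge and a rebase can be shown on a toy history in Python. This is a sketch under stated assumptions: commit identifiers are taken to be a hash of the parents and the message, the helper commit is hypothetical, and a real rebase also has to deal with conflicts and other details left out here.

```python
import hashlib


def commit(store, parents, message):
    """Add a toy commit whose id depends on its parents and its message."""
    raw = "|".join(parents) + "|" + message
    cid = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:8]
    store[cid] = {"parents": list(parents), "message": message}
    return cid


store = {}
base = commit(store, [], "base")
petya = commit(store, [base], "Petya's change")
ours = commit(store, [base], "our change")

# Merge: a new commit with two parents; both lines of work stay visible.
merged = commit(store, [ours, petya], "merge")

# Rebase: our change is re-created on top of Petya's change.
rebased = commit(store, [petya], "our change")

print(store[merged]["parents"] == [ours, petya])  # merge remembers both lines
print(store[rebased]["parents"] == [petya])       # rebase shows a straight line
print(rebased != ours)                            # and it is a different commit
```

The last line is the reason public branches should not be rebased: anyone who built on the original commit now holds an identifier that the rewritten history no longer contains.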
The shelve feature
The shelve feature is a valuable thing. Suppose we are working on a feature: we write code, and more code. Suddenly we spot a bug in code we have already changed. We have several options. We can fix the bug and commit everything together, which is bad. We can save our changes somewhere, fix the bug in a separate commit, and then restore the changes. Or we can take some other crooked route. Mercurial's shelve extension does the following. We were writing a feature and found a bug; we hit shelve and our changes are hidden, so the working copy looks as it did before we started. We fix the bug, commit it, then hit unshelve, and our changes come back, now on top of the fixed bug. We finish our changes and commit them.
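The shelve workflow above can be modeled with a short Python sketch. It is a toy under loud assumptions: files are whole strings, the bug lives in a different file from the work in progress, and ToyWorkingCopy is a hypothetical class; the real extension works on diffs and can merge changes within the same file.

```python
class ToyWorkingCopy:
    """Toy model: files are whole strings, with no line-level merging."""

    def __init__(self, committed):
        self.committed = dict(committed)  # state of the last commit
        self.files = dict(committed)      # what is on disk right now
        self.shelf = {}                   # hidden, uncommitted changes

    def shelve(self):
        """Move uncommitted changes aside and restore the committed state."""
        self.shelf = {name: text for name, text in self.files.items()
                      if self.committed.get(name) != text}
        self.files = dict(self.committed)

    def commit(self):
        """Record the current files as the new committed state."""
        self.committed = dict(self.files)

    def unshelve(self):
        """Bring the hidden changes back on top of the current files."""
        self.files.update(self.shelf)
        self.shelf = {}


wc = ToyWorkingCopy({"feature.py": "old feature", "util.py": "buggy helper"})
wc.files["feature.py"] = "half-done feature"  # work in progress
wc.shelve()                                   # working copy is clean again
wc.files["util.py"] = "fixed helper"          # fix the bug on its own
wc.commit()                                   # the fix gets a separate commit
wc.unshelve()                                 # work in progress returns

print(wc.files)
```

After unshelve, the working copy holds both the committed bug fix and the unfinished feature, while the committed state contains only the fix.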
Repository managers
There are repository managers for Mercurial. They let you search for a project file and search by commit comment or by content:
rhodecode - there is a demo on the site, you can see it in action.
phpHgAdmin
built-in light-weight web server
hGate
All solutions are free.
P.S. This article is based on material from Umputun's podcasts and Arthur Orlov's report.
Update
Corrected the article's statement that branching with bookmarks lacks portability.
In fact, bookmarks are automatically synchronized between repositories.
mercurial-server is also worth mentioning. The name is a bit confusing: it is not a Mercurial server.
mercurial-server provides an improved management interface for the shared SSH mechanism provided by hg-ssh.