External dependencies in git: submodule or subtree?
Long ago I learned that dependencies should be stored together with the project code. That way, when you go back to an old version of the code, it is much easier to restore the environment.
My project has several dependencies. Most of them live in git repositories. The project itself also lives in git.
One of the libraries we use is updated often. We track its development version, and we often contribute code to it that our project needs. That means our changes have to pass quickly through the library's main repository; for a number of reasons we do not want to create and maintain our own fork.
Previously, I simply copied the dependencies into the project folder and added a VERSION.TXT file with the version to each one. But if you need to work with the current version of third-party code, this is inconvenient. Besides, copying files by hand when you have git feels rather silly. I wanted to find a more modern solution.
The most advertised and fashionable git feature for working with third-party repositories is git submodules ("submodules"). Naturally, that is what I looked at first.
Before that, I had already tried working with submodules in "home" projects. As long as one person carefully uses the repository, there are no special problems. In any case, it is noticeably more convenient than copying the code by hand.
However, as soon as I tried to carry this experience over to a more serious workflow, it turned out that things were not so simple.
Here is what we ran into:
- Each person working with the main repository must have access to the repository from which the submodule is taken.
- In the general case, you cannot get a complete working copy with a single command. After git checkout you now also have to run git submodule update --init.
- The same applies to some other git commands. For example, git archive ignores submodules, so you can no longer pack the entire project into an archive with one command.
- From the top-level project, changes inside the submodules are not visible, and vice versa. To find out the full state of a project's working copy, you have to query each submodule and the parent project separately. Without submodules, it is enough to run git status anywhere inside the working copy.
- After replacing the root directory of the submodule with something else (for example, another submodule), you must manually delete the old version in all working copies.
- The git submodule command does not understand the standard options --git-dir and --work-tree. It can only be run from the root of the working copy. This makes automation difficult.
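Two of these points (the extra command after cloning, and git archive skipping submodules) can be reproduced in throwaway repositories. This is only a rough sketch: the repository names, file names, and the vendor/lib path are made up for the illustration, and the protocol.file.allow=always setting is there only because the demo uses local paths, which recent git versions refuse for submodules by default.

```shell
#!/bin/sh
# Sketch: reproduce two submodule pitfalls in temporary repositories.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical third-party library with a single file.
git init -q "$work/lib"
cd "$work/lib"
git symbolic-ref HEAD refs/heads/master
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "Library: initial commit"

# Main project that references the library as a submodule.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo "main project" > README.txt
git add README.txt
git commit -q -m "Main: initial commit"
# protocol.file.allow=always: needed only because the demo uses a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" vendor/lib
git commit -q -m "Add lib as a submodule"

# A fresh clone has an empty vendor/lib until the extra command is run.
git clone -q "$work/main" "$work/copy"
cd "$work/copy"
before=$(ls vendor/lib | wc -l | tr -d ' ')
git -c protocol.file.allow=always submodule --quiet update --init
after=$(ls vendor/lib | wc -l | tr -d ' ')

# git archive leaves the submodule's files out even after the update.
archived=$(git archive HEAD | tar -tf - | grep -c 'lib.txt' || true)
echo "files before init: $before, after init: $after, lib.txt in archive: $archived"
```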
Submodules had to be abandoned.
But every cloud has a silver lining. Knowledgeable people pointed out that git has long had an alternative (since before 1.5.2): the subtree merge strategy ("subtrees").
The idea is to take the commit history of an external project and graft it into a subdirectory of our own project. This uses git's standard mechanism for working with remote branches.
An example from the documentation: we add the code from the master branch of the Bproject repository (located at /path/to/B) to our project in the dir-B/ subdirectory.
$ git remote add -f Bproject /path/to/B
$ git merge -s ours --no-commit Bproject/master
$ git read-tree --prefix=dir-B/ -u Bproject/master
$ git commit -m "Merge B project as our subdirectory"
Note the -f switch on git remote add. It tells git to fetch from this remote immediately.
After that, changes from Bproject are pulled in with git pull, explicitly specifying the branch and the merge strategy:
$ git pull -s subtree Bproject master
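To try the whole sequence end to end, here is a rough sketch that runs it in temporary repositories. The file names and contents are invented for the demo. It also assumes a reasonably recent git, where two extra options are needed compared with the commands above: --allow-unrelated-histories on the first merge (git 2.9 and later refuse to merge unrelated histories otherwise), and --no-rebase on git pull (newer versions refuse to pull divergent branches unless told how to reconcile them).

```shell
#!/bin/sh
# Sketch: subtree merge of an external project, then pulling an update.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for the external project (the /path/to/B of the example).
git init -q "$work/B"
cd "$work/B"
git symbolic-ref HEAD refs/heads/master
echo "lib v1" > lib.txt
echo "B docs" > NOTES.txt
git add .
git commit -q -m "B: initial commit"

# Main project with one file of its own.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo "main project" > README.txt
git add README.txt
git commit -q -m "Main: initial commit"

# Graft B's history and files into dir-B/.
git remote add -f Bproject "$work/B"
git merge -s ours --no-commit --allow-unrelated-histories Bproject/master
git read-tree --prefix=dir-B/ -u Bproject/master
git commit -q -m "Merge B project as our subdirectory"

# The external project moves on...
cd "$work/B"
echo "lib v2" > lib.txt
git commit -q -am "B: update lib"

# ...and the main project pulls the update into dir-B/.
cd "$work/main"
git pull -q --no-rebase --no-edit -s subtree Bproject master
cat dir-B/lib.txt
```

After the pull, the updated file should appear under dir-B/ rather than at the top level of the main project, because the subtree strategy shifts B's tree to match the subdirectory.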
If the subtree's remote has not been added in a given working copy, the name of the branch the changes were pulled from is not shown in the main repository's history.
The problem is purely cosmetic and does not affect the work. It is fixed by adding the remote in that working copy:
$ git remote add -f Bproject /path/to/B
Later, if new branches appear, you can fetch the remote's changes:
$ git fetch Bproject
There are a couple of other drawbacks:
- As with any ordinary branch merge, the subtree's history is mixed with the main project's history in the commit log.
- Sending changes back to the subtree's project is much harder than with submodules. But this is easy to work around by making the changes in a separate clone of that project.
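The workaround from the last point might look roughly like the sketch below. Everything in it is an assumption for illustration: upstream.git stands in for the library's shared repository, B-clone is the separate clone where library edits are made, and the file names are invented. As elsewhere, --allow-unrelated-histories and --no-rebase are there for recent git versions.

```shell
#!/bin/sh
# Sketch: edit the library in a separate clone, then pull the change into the subtree.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Hypothetical shared repository of the library (bare, so it accepts pushes).
git init -q --bare "$work/upstream.git"

# Separate clone of the library: all library edits happen here.
git clone -q "$work/upstream.git" "$work/B-clone" 2>/dev/null
cd "$work/B-clone"
git symbolic-ref HEAD refs/heads/master
echo "lib v1" > lib.txt
echo "B docs" > NOTES.txt
git add .
git commit -q -m "B: initial commit"
git push -q origin master

# Main project with the library merged in as a subtree under dir-B/.
git init -q "$work/main"
cd "$work/main"
git symbolic-ref HEAD refs/heads/master
echo "main project" > README.txt
git add README.txt
git commit -q -m "Main: initial commit"
git remote add -f Bproject "$work/upstream.git"
git merge -s ours --no-commit --allow-unrelated-histories Bproject/master
git read-tree --prefix=dir-B/ -u Bproject/master
git commit -q -m "Merge B project as our subdirectory"

# The fix our project needs is committed in the separate clone and pushed.
cd "$work/B-clone"
echo "lib v1 with our fix" > lib.txt
git commit -q -am "B: fix needed by the main project"
git push -q origin master

# The main project then receives it like any other upstream change.
cd "$work/main"
git pull -q --no-rebase --no-edit -s subtree Bproject master
cat dir-B/lib.txt
```

The main project never edits dir-B/ directly in this flow, so there is nothing to extract from its history; the cost is keeping one extra clone around.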
Working with subtrees is much more convenient than working with submodules. Users do not need to be retrained, and the workflow is easier to automate. Subtrees are also easier to maintain. I recommend them.
By the way, there is a project on GitHub aimed at making subtrees easier to work with: git-subtree.