Several ways to optimize your work with Git
In our blog on Habr we cover technologies from the world of IaaS and beyond. For example, we recently published a two-part piece on software VPN implementations (Part 1, Part 2) and also wrote about DNS. Today we would like to turn to application and service development and talk about Git, in particular about ways to optimize working with it.
Let's start from the very beginning: what is Git? Git is a version control system (VCS) that underlies services such as GitHub and GitLab. A lot of software you probably know was developed with Git, including the Linux kernel, Firefox, and Chrome.
If you have worked on a software product as part of a team, you know how this goes. You have a particular version of the project that you send to your colleagues. They change the code and send it back. You merge their changes into your code base and get a new version of the project.
One of Git's main jobs is to prevent confusion between product versions, when files with names like project_betterVersion.kdenlive or project_FINAL-alternateVersion.kdenlive start to appear.
This is the problem version control systems solve: every team member can work on the latest version of the project, make changes, and share them with colleagues.
Version control systems let you store several versions of the same document and, if necessary, roll it back to an earlier state. You make a copy of the repository (a clone), work with it locally, and then use dedicated commands to publish your edits to the main version (push) or to fetch the changes made by colleagues (pull).
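To make the push and pull cycle concrete, here is a minimal sketch. It runs in a throwaway directory, and the repository and developer names (central.git, alice, bob) are made up for the illustration.

```shell
#!/usr/bin/env bash
# Sketch: two developers exchange changes through a shared bare repository.
# central.git, alice and bob are hypothetical names used only for this demo.
set -eu

work=$(mktemp -d)
cd "$work"

# Demo identity for commits, so no global Git configuration is required.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare central.git               # the shared "main version"

# Alice clones the (still empty) repository, commits and publishes her work.
git clone -q central.git alice 2>/dev/null   # hide the "empty repository" warning
echo "first line" > alice/notes.txt
git -C alice add notes.txt
git -C alice commit -q -m "Add notes"
git -C alice push -q -u origin HEAD          # push the current branch, remember upstream

# Bob makes his own local copy.
git clone -q central.git bob

# Alice publishes another change, and Bob pulls it.
echo "second line" >> alice/notes.txt
git -C alice commit -q -am "Extend notes"
git -C alice push -q
git -C bob pull -q --ff-only

bob_notes=$(cat bob/notes.txt)
echo "$bob_notes"
```

After the pull, Bob's copy of notes.txt contains both lines that Alice wrote.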
Improving performance
In large products, files are constantly being renamed, new branches are created, and code is compared against earlier versions. As a result, Git can slow down in sufficiently large projects. Even Facebook once ran into such problems.
At the time, the company explained that any change to the source files caused the index file to be rewritten, and in a large project the index exceeded 100 MB, which slowed things down. (By the way, the company's engineers have also proposed an interesting solution to another performance problem with version control systems at Facebook.)
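You can check how this applies to your own project by looking at the size of the index file, .git/index. A rough sketch, run in a throwaway repository with made-up file names, that shows the index growing with the number of tracked files:

```shell
#!/usr/bin/env bash
# Sketch: the index (.git/index) holds one entry per tracked file,
# so its size grows with the number of files in the project.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

echo "x" > first.txt
git add first.txt
size_small=$(wc -c < .git/index)      # index with a single entry

# Stage 200 more (hypothetical) files.
i=1
while [ "$i" -le 200 ]; do
  echo "$i" > "file_$i.txt"
  i=$((i + 1))
done
git add .
size_large=$(wc -c < .git/index)      # index with 201 entries
entries=$(git ls-files | wc -l)

echo "tracked files: $entries"
echo "index size: $size_small bytes before, $size_large bytes after"
```

In a real project, running wc -c < .git/index and git ls-files | wc -l in the repository root gives the same two numbers.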
Developers use various techniques, utilities, and solutions to speed up Git. One option is to reduce the size of the repository.
Reducing repository size
RoR developer Steve Lorek writes in his blog that he managed to shrink a repository from 180 MB to 7 MB. He first made a local clone of the repository and then looked for files that took up too much space. Here a bash script by Anthony Stubbs came to the rescue: it lists the 10 largest files, which are the most likely candidates for removal.
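The script itself is not reproduced here. As a rough stand-in (a different technique, not Anthony Stubbs' script), the largest blobs in a repository's history can be listed with standard Git plumbing commands. The demo repository and the file names big.bin and small.txt are assumptions made for the example:

```shell
#!/usr/bin/env bash
# Sketch: list the largest blobs reachable from any ref.
# Not the script from the article; a stand-in built on git rev-list and git cat-file.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Hypothetical content: one large file and one small file.
head -c 200000 /dev/zero > big.bin
echo "small" > small.txt
git add .
git commit -q -m "Add files"

# rev-list prints "<object id> <path>" for every object in the history;
# cat-file adds type and uncompressed size; awk keeps blobs only.
# Note: paths containing spaces are cut at the first space by this awk step.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob" { print $2, $3 }' |
  sort -rn |
  head -n 10)

echo "$largest"
```

The sizes are those of the uncompressed objects, so they show which files deserve a closer look rather than the exact space saved by removing them.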
He then removed these files from the history with the following series of commands (filename stands for the file to remove):
$ git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all
$ rm -rf .git/refs/original
$ git reflog expire --expire=now --all
$ git gc --prune=now
$ git gc --aggressive --prune=now
Finally, Steve pushed the rewritten history to the remote repository so that nobody else would have to download 180 MB to start working.
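Because the commands above rewrite history, a plain push is rejected and the update has to be forced. A minimal sketch, assuming a throwaway bare repository (remote.git and local are hypothetical names) and using git commit --amend as a small stand-in for the history rewrite:

```shell
#!/usr/bin/env bash
# Sketch: publishing rewritten history requires a forced push.
# "git commit --amend" stands in for the filter-branch rewrite.
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init -q --bare remote.git
git clone -q remote.git local 2>/dev/null    # hide the "empty repository" warning

echo "v1" > local/file.txt
git -C local add file.txt
git -C local commit -q -m "Initial commit"
git -C local push -q -u origin HEAD

# Rewrite the already published commit.
git -C local commit -q --amend -m "Initial commit (rewritten)"

# A regular push is refused, because the remote branch is not an ancestor.
if git -C local push -q origin HEAD 2>/dev/null; then
  plain_push="accepted"
else
  plain_push="rejected"
fi

# Force-update all branches, then all tags (two separate commands).
git -C local push -q --force --all origin
git -C local push -q --force --tags origin

remote_subject=$(git -C remote.git log -1 --format=%s)
echo "plain push: $plain_push"
echo "remote now has: $remote_subject"
```

A forced push replaces the history that colleagues already have, so it is worth agreeing on it with the team first; everyone will need a fresh clone afterwards.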
Smart mirroring
Mirroring is another solution, and it is useful for organizations with several hundred developers. Many members of such teams work remotely from different countries, which makes fetching data from repositories slow. Sometimes it gets to the point where employees mail hard drives to each other.
With mirroring, one or more active mirror servers are set up; they serve read-only copies of the repositories and synchronize with the main instance. This approach can speed up the transfer of a copy of a 5 GB repository roughly 25-fold.
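The basic mechanics can be tried with plain Git. The sketch below is only an illustration on one machine, with made-up repository names (main.git, mirror.git, dev, reader); a real setup would put the mirror on a server close to the remote team and refresh it on a schedule or from a hook.

```shell
#!/usr/bin/env bash
# Sketch: a read-only mirror that follows the main repository.
# All repository names are hypothetical and live in a temporary directory.
set -eu

work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

# Main instance with one commit pushed by a developer.
git init -q --bare main.git
git clone -q main.git dev 2>/dev/null        # hide the "empty repository" warning
echo "v1" > dev/app.txt
git -C dev add app.txt
git -C dev commit -q -m "v1"
git -C dev push -q -u origin HEAD

# Create the mirror: a bare copy that tracks every ref of the main repository.
git clone -q --mirror main.git mirror.git

# New work lands in the main repository...
echo "v2" >> dev/app.txt
git -C dev commit -q -am "v2"
git -C dev push -q

# ...and the mirror is synchronized with it.
git -C mirror.git fetch -q --prune origin

# Remote developers clone and fetch from the nearby mirror instead.
git clone -q mirror.git reader

main_head=$(git -C main.git rev-parse HEAD)
mirror_head=$(git -C mirror.git rev-parse HEAD)
reader_head=$(git -C reader rev-parse HEAD)
echo "main:   $main_head"
echo "mirror: $mirror_head"
echo "reader: $reader_head"
```

Reads go to the mirror, while pushes still have to reach the main instance.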
A different approach to storing large files
Because every developer stores the entire change history locally, Git repositories grow quickly. A number of utilities address this problem. For example, git-annex lets you store a symlink to a file in the repository instead of the file itself.
Also worth noting is the Git Large File Storage (Git LFS) extension, which writes pointers to files into the repository. Operations on these files are handled by the clean and smudge filters, and the file contents are stored on a remote GitHub.com or GitHub Enterprise server. There are several other utilities of this kind as well.
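Which files are handled by LFS is decided by patterns in .gitattributes. Running git lfs track "*.psd" records such a pattern; the sketch below writes the same line by hand, so that it runs even without the extension installed, and then asks Git which filter would apply. The *.psd pattern and the file names are only examples.

```shell
#!/usr/bin/env bash
# Sketch: how a path is routed to the LFS clean/smudge filters.
# The attribute line is what `git lfs track "*.psd"` adds to .gitattributes;
# it is written manually here so the demo does not depend on git-lfs.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

printf '%s\n' '*.psd filter=lfs diff=lfs merge=lfs -text' > .gitattributes

# Ask Git which filter is assigned to a matching and a non-matching path.
psd_attr=$(git check-attr filter -- design.psd)
txt_attr=$(git check-attr filter -- notes.txt)

echo "$psd_attr"
echo "$txt_attr"
```

With git-lfs installed, files matching the pattern are replaced in the repository by small pointer files, while their contents go to the LFS server.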
Using aliases
This tip is less about Git performance and transfer speed and more about usability. Defining aliases can noticeably speed up everyday work with Git and simplify many operations. Aliases are stored in the configuration file and can be set with git config:
git config --global alias.co checkout
git config --global alias.br branch
git config --global alias.ci commit
git config --global alias.st status
Interestingly, you can also create your own commands this way, ones that do not exist by default, for example:
git config --global alias.l "log --oneline --graph"
In this case, the git l command will display the log one commit per line, together with a graph of the history.
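To see where the aliases end up and how a custom command behaves, you can try them in isolation. A small sketch that uses a temporary home directory, so that your real ~/.gitconfig is not touched; the scratch repository and commit message are made up:

```shell
#!/usr/bin/env bash
# Sketch: define aliases in an isolated global config and try one out.
set -eu

# Throwaway home directory, so the real ~/.gitconfig stays untouched.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git config --global alias.st status
git config --global alias.l "log --oneline --graph"

# The aliases are written to the [alias] section of the global config file.
cat "$HOME/.gitconfig"

# Try the custom command in a scratch repository.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "First commit"

log_output=$(git l)
echo "$log_output"
```

The same [alias] section can also be edited by hand in ~/.gitconfig, which is convenient when moving a set of aliases to another machine.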
These small tips can simplify work with large repositories and make life easier for development teams, which matters a great deal for the quality and speed of delivery of a company's important projects.
P.S. We also write about building our IaaS provider, 1cloud.