Subversion vs. Git: Debunking Myths About Debunking Myths
The site "Subversion vs. Git: Myths and Facts" claims to dispel some myths about version control systems. I doubted its "facts" and checked a few of them. The check undermined the site's credibility for me and left me skeptical of its other statements.
A little about why I was interested
I changed jobs fairly recently and joined a company that uses svn, although switching to git often comes up in discussions.
Once I witnessed a discussion of this topic. Colleagues were discussing this very site and concluded that switching would be "swapping like for like."
I only overheard the conversation, but I became curious about the site and its arguments, so I went to look into it.
Let's start with the first statement
Git repositories are significantly smaller than equivalent Subversion ones
False. A myth.
The particular delta compression algorithms used in both version control systems differ in many details, but in general Subversion and Git store data in the same way. This results in the fact that Subversion and Git repositories with equivalent data will have approximately the same size. Except for the case of storing a lot of binary files, when Subversion repositories could be significantly smaller than Git ones (because Subversion's xdelta delta compression algorithm works both for binary and text files).
Below this claim, the site gives an example that compares repository sizes. Its conclusion: the difference is not significant.
I was puzzled by the different number of commits and by the fact that the two repositories actually come from different primary sources (who knows how they are synchronized). I was also not satisfied with how little detail the site gives about how these numbers were obtained.
So, let's start our experiment!
Get the svn repository
svnrdump.exe dump https://core.svn.wordpress.org/ > svndump
svnadmin create svn
svnadmin load svn < svndump
We now have a local copy of the entire repository in svn format. It is a 213 MB folder that contains 79,758 files and 88 folders.
At this point, the repository has 39,864 commits. A working copy of the project consists of 1,701 files and 160 folders.
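The post does not say which tool produced the size, file, and folder numbers. As a rough sketch of how such figures could be collected, the hypothetical helper `dir_stats` below walks a folder and counts everything under it. Treating "MB" as 2**20 bytes is my assumption.

```python
import os
import tempfile


def dir_stats(root):
    """Return (total_bytes, file_count, folder_count) for everything under root."""
    total_bytes = 0
    file_count = 0
    folder_count = 0
    for current, dirnames, filenames in os.walk(root):
        folder_count += len(dirnames)  # subfolders only, root itself is not counted
        for name in filenames:
            path = os.path.join(current, name)
            if os.path.islink(path):
                continue  # do not follow or count symbolic links
            total_bytes += os.path.getsize(path)
            file_count += 1
    return total_bytes, file_count, folder_count


if __name__ == "__main__":
    # Demo on a throwaway folder; point it at the svn or .git folder instead.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "db", "revs"))
        with open(os.path.join(root, "db", "revs", "0"), "wb") as fh:
            fh.write(b"x" * 1024)
        size, files, folders = dir_stats(root)
        # Assumption: "MB" means 2**20 bytes, as Windows Explorer reports it.
        print(f"{size / 2**20:.2f} MB, {files} files, {folders} folders")
```

Running the same helper on every repository variant keeps the comparison consistent, whatever the exact definition of "MB" is.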
We start the migration to git
git svn clone -s --prefix "svn/" file:///%path%/svn git_from_svn
This stage was the longest: it took more than a day (about 32 hours). As a result, we have a git repository that is a copy of the original svn repository (not an exact copy, but the differences do not matter for our purposes).
I made a small mistake here: it would have been better to create the repository without a working copy, but I was not ready to wait more than a day again.
So, the git_from_svn/.git folder is 208 MB and contains 7,841 files and 509 folders. At this stage we really can speak only of a slight advantage for git; apparently this is exactly where the site's authors stopped. But git has two storage formats: "loose" objects and "packfiles".
Let's see what we have:
git count-objects -v -H
Result:
count: 6826
size: 30.21 MiB
in-pack: 249852
packs: 25
size-pack: 145.51 MiB
prune-packable: 73
garbage: 0
size-garbage: 0 bytes
That is, we have many packs (25) and 6,826 loose objects (30.21 MiB) that have not been packed yet.
We optimize storage
git gc
git count-objects -v -H
Result:
count: 0
size: 0 bytes
in-pack: 256605
packs: 1
size-pack: 104.54 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
The ".git" folder itself is now 136 MB (945 files, 255 folders). It seems to me that this advantage can hardly be called insignificant.
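The two `git count-objects` outputs can be cross-checked against each other. The sketch below is only an illustration: `parse_count_objects` is a hypothetical helper, and the check assumes that `prune-packable` counts loose objects that already exist in a pack and that no object was stored in more than one pack.

```python
def parse_count_objects(text):
    """Parse the 'key: value' lines of `git count-objects -v -H` into a dict of strings."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


# Outputs copied from the two runs shown in the post.
BEFORE_GC = """\
count: 6826
size: 30.21 MiB
in-pack: 249852
packs: 25
size-pack: 145.51 MiB
prune-packable: 73
garbage: 0
size-garbage: 0 bytes
"""

AFTER_GC = """\
count: 0
size: 0 bytes
in-pack: 256605
packs: 1
size-pack: 104.54 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
"""

before = parse_count_objects(BEFORE_GC)
after = parse_count_objects(AFTER_GC)

# Loose objects plus packed objects, minus loose objects that were already packed.
unique_before = (
    int(before["count"]) + int(before["in-pack"]) - int(before["prune-packable"])
)
print(unique_before, int(after["in-pack"]))  # prints "256605 256605"
```

Under those assumptions the object count is the same before and after `git gc`, so the drop from 30.21 + 145.51 MiB to 104.54 MiB comes from repacking, not from lost history.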
We also clean it
But that is not all. If you get rid of the svn legacy by pushing everything into a bare repository, the picture gets even more interesting: 106 MB, 19 files, 8 folders.
Putting it all together
svn - 213 MB (79,758 files, 88 folders)
svn (after pack) - 214 MB (4,644 files, 89 folders) (added after publication)
git svn - 208 MB (7,841 files, 509 folders)
git svn (pack) - 136 MB (945 files, 255 folders)
git bare (pack) - 106 MB (19 files, 8 folders)
It seems to me that at this stage the debunking of the myth can itself be considered debunked: the site's statement that the sizes are approximately the same is disproved.
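To put the summary into relative terms, here is a small sketch that computes how much smaller each variant is than the original svn repository. The sizes are the rounded MB values from the list above; `saving_vs` is just an illustrative name.

```python
# Rounded sizes in MB, taken from the summary list in the post.
SIZES_MB = {
    "svn": 213,
    "svn (after pack)": 214,
    "git svn": 208,
    "git svn (pack)": 136,
    "git bare (pack)": 106,
}


def saving_vs(baseline_mb, size_mb):
    """Fraction by which size_mb is smaller than baseline_mb (negative if larger)."""
    return 1 - size_mb / baseline_mb


for name, size in SIZES_MB.items():
    print(f"{name:18} {size:4d} MB  {saving_vs(SIZES_MB['svn'], size):+.1%}")
```

By these numbers the packed git svn repository is about 36% smaller than the svn one, and the bare repository is about 50% smaller, while the unpacked git svn clone saves only about 2%.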
It is also worth mentioning that in everyday use you will still end up with loose objects; they are meant to speed up work with frequently used files. Typically, this format holds files from branches where work is currently in progress. How many loose objects git tolerates before repacking can be tuned through configuration (for example, the gc.auto threshold).
Moving on
Branches are expensive in Subversion
False. A myth.
Branches in Subversion are implemented with Copy-On-Write strategy (referred to as 'Cheap Copies' in the svnbook). No matter how large a repository or project is, it takes a constant amount of time and space to make a branch. In fact, Subversion branches are extremely cheap beginning with version 1.0 and you can branch even for small bugfixes in a very busy and large project.
The site follows this with an experiment that "confirms" it.
Conclusion: creating a branch is faster than you can blink. It seemed to me that if the whole operation takes less than 0.01 seconds, there is nothing to compare. But for some reason, to address the claim about expensive branches in svn, the site checked only branch creation. There are other operations too, such as cloning (or svn checkout). In this experiment everything happens locally, so any network effect is ruled out.
First experiment: cloning
svn checkout %local_path%/trunk
TotalSeconds: 14.0737539
git clone %local_path%
TotalSeconds: 21.8173709
Git lost. But keep in mind that git fetched the entire history, while svn fetched a single revision.
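The TotalSeconds lines look like the output of PowerShell's Measure-Command, but the post does not show the exact invocation, so that is my guess. As a hedged, portable sketch of the same kind of measurement, the hypothetical helper `time_command` below runs a command once and reports wall-clock seconds. The demo times a trivial Python process rather than the real `svn checkout` or `git clone`.

```python
import subprocess
import sys
import time


def time_command(args):
    """Run a command once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(
        args,
        check=True,  # fail loudly if the command itself fails
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command; replace with e.g. ["git", "clone", source, target].
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"TotalSeconds: {elapsed:.7f}")
```

A single run is sensitive to disk caches, so repeating each measurement a few times and checking out into a fresh folder every time would make the comparison more trustworthy.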
Second experiment: switching branches
svn switch /branches/4.7
TotalSeconds: 4.3741352
git checkout -B "4.7" "origin/4.7"
TotalSeconds: 1.2700857
Here the opposite is true: git is faster, and a single branch switch wins back roughly 40% of the time lost on cloning (about 3.1 of 7.7 seconds).
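The trade-off between the two experiments can be worked out directly from the four timings in the post. The sketch below only does the arithmetic; the constant names are mine.

```python
import math

# Timings in seconds, copied from the post.
SVN_CHECKOUT = 14.0737539
GIT_CLONE = 21.8173709
SVN_SWITCH = 4.3741352
GIT_CHECKOUT = 1.2700857

clone_loss = GIT_CLONE - SVN_CHECKOUT    # extra time git spends on the initial clone
switch_gain = SVN_SWITCH - GIT_CHECKOUT  # time git saves on one branch switch

recovered = switch_gain / clone_loss               # share of the loss won back per switch
break_even = math.ceil(clone_loss / switch_gain)   # switches needed to make up the loss

print(f"clone loss:  {clone_loss:.2f} s")
print(f"switch gain: {switch_gain:.2f} s")
print(f"recovered per switch: {recovered:.0%}")
print(f"break-even after {break_even} switches")
```

With these numbers, one switch recovers about 40% of the cloning loss, and git comes out ahead after the third branch switch.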
I will probably stop here and go discuss these results with my colleagues.
PS: The message of this note is banal ("do not trust unverified sources on the Internet"), yet it turns out there are still people who have not developed an adequate level of healthy skepticism.
Thanks for your attention!
Update:
In the comments, VBKesha mentioned that it would be worth running svnadmin pack.
svnadmin.exe pack svn
214 MB (4,644 files, 89 folders)
It helped only with the number of files; the size even increased slightly.