Secrets of Git Lost Commits

Git is not that complicated, but it is flexible, and sometimes that flexibility has amusing consequences. For example, look at this commit on GitHub. It looks like a normal commit, but if you clone the repository, you won’t find it there. That is because it is a lost commit, more precisely an unreachable (also called dangling or orphaned) commit: an object that no branch or tag leads to. Such commits are sometimes also called loose objects, although strictly speaking that term means any object that has not yet been packed. Under the cut: a little about the insides of Git, where such commits come from, and what to do if you come across one.

How Git Keeps Commits


The Git repository is a simple key-value store, where the key is a SHA-1 hash and the value is an object of one of a few types: a commit description, a file tree description, or file contents (a fourth type, the annotated tag, does not matter here). There are even low-level plumbing commands for working with the repository as a database:

echo 'test content' | git hash-object -w --stdin
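To make the "database" idea a little more tangible, here is a minimal sketch that writes the same value and reads it back by its key. The scratch directory and the variable names are my own assumptions for illustration; `git cat-file` is the standard plumbing counterpart to `git hash-object`.

```shell
# Sketch: store a value in a scratch repository and read it back by key.
repo=$(mktemp -d)               # throwaway directory, assumed for the demo
cd "$repo" && git init -q

# -w writes the object into .git/objects and prints its key
key=$(echo 'test content' | git hash-object -w --stdin)

obj_type=$(git cat-file -t "$key")   # type of the stored object
obj_body=$(git cat-file -p "$key")   # contents of the stored object

echo "key:  $key"
echo "type: $obj_type"
echo "body: $obj_body"
```

The object is stored even though no commit exists yet, which is exactly why it makes sense to think of the repository as a plain key-value store.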


This architectural feature gave rise to the somewhat murky saying that Git tracks renames by file content. When a file is renamed, the tree of the new commit lists the file under its new name, but since the content has not changed, that entry points to the “file content” object already in the repository.
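A rename is easy to check by hand. The sketch below is only an illustration under assumed file names (`old.txt`, `new.txt`) and a throwaway repository: it compares the key of the content object before and after `git mv`.

```shell
# Sketch: a renamed file keeps pointing at the same content object.
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 'same content' > old.txt
git add old.txt && git commit -q -m "add old.txt"
before=$(git rev-parse HEAD:old.txt)   # key of the content under the old name

git mv old.txt new.txt
git commit -q -m "rename to new.txt"
after=$(git rev-parse HEAD:new.txt)    # key of the content under the new name

echo "before: $before"
echo "after:  $after"
```

Both lines print the same key: the second commit added a new tree object, but no new content object.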


When a developer creates a commit, Git places one commit description object, plus a bunch of objects describing the file structure and the file contents, into the repository. Thus, “commits” are linked Git objects in the key-value store.
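The links between these objects can be followed manually. This is a rough sketch, with a hypothetical one-file repository, that walks from the commit description to the file tree and then to the file contents.

```shell
# Sketch: follow the chain commit -> tree -> file contents.
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 'hello' > readme.txt
git add readme.txt && git commit -q -m "first commit"

commit=$(git rev-parse HEAD)             # key of the commit description
tree=$(git rev-parse 'HEAD^{tree}')      # key of the file tree it refers to
blob=$(git rev-parse HEAD:readme.txt)    # key of the file contents

git cat-file -p "$commit"   # contains a "tree <key>" line, author, message
git cat-file -p "$tree"     # lists readme.txt together with its content key
git cat-file -p "$blob"     # prints the file contents

types="$(git cat-file -t "$commit") $(git cat-file -t "$tree") $(git cat-file -t "$blob")"
echo "$types"
```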

By default, Git stores the entire contents of each file: if we change one line in a 100-kilobyte source file, a new object holding all 100 kilobytes, compressed with zlib, is added to the repository. To keep the repository from swelling excessively, Git provides a garbage collector. It can be called manually with git gc, and Git also triggers it automatically after certain commands once enough objects have accumulated. During collection, objects are repacked into a pack file, where similar revisions are stored as differences (deltas) against one another.
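The effect of repacking can be observed with `git count-objects`. The following is a sketch under assumptions (a scratch repository and three tiny revisions of a made-up file `big.txt`); it only shows objects moving from individual files into a pack, not the size of the deltas.

```shell
# Sketch: watch individual objects move into a pack file after git gc.
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

for i in 1 2 3; do
    echo "revision $i" >> big.txt
    git add big.txt && git commit -q -m "revision $i"
done

# "count" is the number of unpacked objects, "in-pack" the number of packed ones
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc --quiet

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
packed_after=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "unpacked before gc: $loose_before"
echo "unpacked after gc:  $loose_after"
echo "packed after gc:    $packed_after"
```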


When commits die


In some cases a commit is no longer needed. For example, the developer committed foo and then rolled the change back with the reset command. Git is designed so that it does not delete commits immediately, which lets the developer undo even the most destructive actions. The reflog command shows the operation log, which records every position that HEAD and the branch tips have pointed to.
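Here is a small sketch of that recovery path, assuming a scratch repository and a file named `foo` as in the example above. The commit is thrown away with reset and then found again through the operation log.

```shell
# Sketch: discard a commit with reset, then bring it back via the reflog.
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 'one' > foo
git add foo && git commit -q -m "base"
echo 'two' >> foo
git commit -q -am "change foo"
lost=$(git rev-parse HEAD)           # remember the commit we are about to drop

git reset -q --hard HEAD~1           # the branch no longer contains it

git reflog                           # the log still lists the old position
recovered=$(git rev-parse 'HEAD@{1}')   # where HEAD pointed before the reset

git reset -q --hard "$recovered"     # move the branch back to the "lost" commit
echo "restored: $(git rev-parse HEAD)"
```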

But “unnecessary” commits appear not only after the reset command. For example, the popular rebase operation creates new copies of the commits and leaves the “originals”, which no one needs any more, in the repository. To keep such “lost” objects from accumulating, Git relies on the garbage collector already mentioned above, which is triggered automatically after certain commands or called manually.
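The rebase case can be reproduced in a few lines. This is a sketch under assumptions: the branch name `feature` and the file names are invented, and the default branch name is read from the repository rather than hard-coded.

```shell
# Sketch: after a rebase the original commit still exists, but off the branch.
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo a > a.txt && git add a.txt && git commit -q -m "A"
base=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

git checkout -q -b feature
echo f > f.txt && git add f.txt && git commit -q -m "F"
original=$(git rev-parse HEAD)            # the commit that rebase will copy

git checkout -q "$base"
echo b > b.txt && git add b.txt && git commit -q -m "B"

git checkout -q feature
git rebase -q "$base"
rebased=$(git rev-parse HEAD)             # the copy, with a different key

original_type=$(git cat-file -t "$original")   # the object is still stored

# Objects not reachable from any branch or tag when the reflog is ignored
git fsck --unreachable --no-reflogs

echo "original: $original ($original_type)"
echo "rebased:  $rebased"
```

If you come across a commit like this in your own repository and still need it, `git branch <name> <key>` attaches a branch to it and makes it reachable again.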

The garbage collector searches for objects that are no longer referenced and removes them from storage. The reflog plays a huge role here: its entries have a limited lifetime, by default 30 days for entries that are not reachable from the current tip of the branch and 90 days for entries that are. The garbage collector first removes all expired entries from the reflog and then removes objects that are no longer referenced from the repository (by default, only those older than two weeks). This architecture gives the developer at least 30 days to restore the “unnecessary” commit, which would otherwise be permanently deleted from the repository after this period.
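For the curious, the whole cycle can be compressed from a month into a second. The sketch below assumes a throwaway repository, so do not run the last two commands on anything you care about. The configuration keys named in the comments are the standard Git settings behind these lifetimes; the text above does not name them, so treat them as my addition.

```shell
# Sketch: expire the reflog immediately and let the collector delete a commit.
repo=$(mktemp -d)
cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 'one' > foo && git add foo && git commit -q -m "base"
echo 'two' >> foo && git commit -q -am "doomed change"
doomed=$(git rev-parse HEAD)
git reset -q --hard HEAD~1

# Lifetimes are configurable; when unset, the built-in defaults apply:
#   gc.reflogExpire            90 days
#   gc.reflogExpireUnreachable 30 days
#   gc.pruneExpire             2 weeks
git config --get gc.reflogExpireUnreachable || echo "(unset, default applies)"

# Step 1: drop every reflog entry right away instead of waiting
git reflog expire --expire=now --all
# Step 2: collect garbage and delete unreferenced objects regardless of age
git gc --quiet --prune=now

if git cat-file -e "$doomed" 2>/dev/null; then status=present; else status=gone; fi
echo "doomed commit is $status"
```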

What happened on GitHub?


I think you have already guessed. The commit in question was no longer needed: most likely, the author did a rebase. But GitHub shows the contents of the server-side repository, where, most likely, nothing triggers the garbage collector automatically and no one calls it manually either. At the same time, when a repository is cloned, Git sends over the network only those commits that are reachable from references, so the “lost commits” remain dead weight on the server side.
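The same situation can be imitated locally. In this sketch a scratch directory plays the role of the server, and `--no-local` is used so that the clone goes through the normal transfer mechanism instead of simply copying the object files. All paths and names are assumptions for the demo.

```shell
# Sketch: a commit that exists on the "server" but does not arrive in a clone.
server=$(mktemp -d)
cd "$server" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo 'one' > foo && git add foo && git commit -q -m "base"
echo 'two' >> foo && git commit -q -am "soon to be lost"
lost=$(git rev-parse HEAD)
git reset -q --hard HEAD~1      # nothing on the server refers to it any more

client=$(mktemp -d)
git clone -q --no-local "$server" "$client/copy"

on_server=$(git -C "$server" cat-file -t "$lost")   # still stored there
if git -C "$client/copy" cat-file -e "$lost" 2>/dev/null; then
    in_clone=yes
else
    in_clone=no
fi

echo "on server: $on_server"
echo "in clone:  $in_clone"
```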

Hopefully this short digression into Git internals will save someone valuable time when searching for “missing commits” referenced, for example, by a bug tracker. If I made a mistake somewhere, or if you have something to add, I will gladly discuss it in the comments.
