GitPHP at Badoo

    Badoo has a huge Git repository with thousands of branches and tags. We use a heavily modified version of GitPHP (http://gitphp.org) 0.2.4, with many add-ons built on top of it (including integration with our JIRA workflow, support for our code-review process, etc.). Overall the product suited us, until we noticed that our main repository took more than 20 seconds to open. Today we will describe how we investigated GitPHP's performance and what we achieved by solving this problem.

    Timers


    In our development environment for badoo.com we use a very simple debug panel for setting timers and debugging SQL queries. So the first thing we did was port it to GitPHP and start measuring the execution time of code sections, excluding the time spent in nested timers. This is what our debug panel looks like:



    The first column contains the name of the method (or action) being called; the second contains additional information: the launch arguments, the beginning of the command output, and a stack trace. The last column shows the time spent on the call, in seconds.

    Here is a short excerpt from the implementation of the timers themselves:

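    class GitPHP_Log
    {
    // ...
        private $timers = array();

        public function timerStart() {
            // Push the start time onto the stack of running timers
            array_push($this->timers, microtime(true));
        }
        public function timerStop($name, $value = null) {
            $timer = array_pop($this->timers);
            $duration = microtime(true) - $timer;
            // Subtract the elapsed time from all timers that enclose this one
            // (shifting their start times forward has exactly that effect)
            foreach ($this->timers as &$item) $item += $duration;
            unset($item); // break the reference left over from foreach
            $this->Log($name, $value, $duration);
        }
    // ...
    }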
    

    Using this API is very simple: call timerStart() at the beginning of the code being measured, and timerStop() at the end, passing the timer name and optional additional data:

    $Log = GitPHP_Log::GetInstance();
    $Log->timerStart();
    $result = 0;
    $mult = 4;
    for ($i = 1; $i < 1000000; $i+=2) {
        $result += $mult / $i;
        $mult = -$mult;
    }
    $Log->timerStop("PI computation", $result);
    

    Calls can be nested, and the class above takes this into account: the time measured by an inner timer is subtracted from every timer that encloses it.

    To make debugging code inside Smarty easier, we also made "auto-timers". They make it easy to measure the time spent in methods with many exit points (many places where return is executed):

    class GitPHP_DebugAutoLog
    {
            private $name;

            public function __construct($name) {
                    $this->name = $name;
                    GitPHP_Log::GetInstance()->timerStart();
            }
            public function __destruct() {
                    GitPHP_Log::GetInstance()->timerStop($this->name);
            }
    }
    

    Using this class is just as simple: insert $Log = new GitPHP_DebugAutoLog('timer_name'); at the beginning of any function or method, and its execution time will be measured automatically when the function exits:

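    // Function and parameter names are illustrative
    function doSomething($a) {
        $Log = new GitPHP_DebugAutoLog('doSomething');
        if ($a > 5) {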
            echo "Hello world!\n";
            sleep(5);
            return;
        }
        sleep(1);
    }
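The auto-timer relies on PHP's reference counting: an object held only in a local variable is destroyed as soon as the function returns, whichever return statement is taken. Below is a minimal standalone sketch of that idea. ScopeTimer and greet() are hypothetical names, and the results go into a static array instead of GitPHP's debug panel.

```php
<?php
// Hypothetical scope timer: starts in the constructor, records in the destructor.
class ScopeTimer
{
    /** @var array[] finished timers: array(name, seconds) */
    public static $log = array();
    private $name;
    private $start;

    public function __construct($name)
    {
        $this->name = $name;
        $this->start = microtime(true);
    }

    public function __destruct()
    {
        // Runs when the last reference disappears, i.e. when the local
        // variable holding the timer goes out of scope on return.
        self::$log[] = array($this->name, microtime(true) - $this->start);
    }
}

// Hypothetical function with two exit points.
function greet($n)
{
    $t = new ScopeTimer('greet');
    if ($n > 5) {
        return 'big';   // early return: the timer is still recorded
    }
    return 'small';
}

$r1 = greet(10);
$afterFirst = count(ScopeTimer::$log); // already recorded after the early return
$r2 = greet(1);
```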
    

    Thousands of git cat-file -t calls


    Thanks to the timers, we quickly found where GitPHP 0.2.4 spent most of its time. For every tag in the repository it made a separate git cat-file -t call just to find out the type of the object the tag points to, i.e. whether the tag is a "lightweight tag" (http://git-scm.com/book/en/Git-Basics-Tagging#Lightweight-Tags). A lightweight tag is the kind of tag Git creates by default: it simply contains a reference to a specific commit. Since our repository contained no other kinds of tags, we simply removed this check and saved a couple of thousand git cat-file -t calls, which had taken about 20 seconds.
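For comparison, Git can report the object type of every tag in one command, avoiding one process per tag. This is not what we did (we simply removed the check). The sketch below assumes its input is the output of git for-each-ref with the %(objecttype) and %(refname:short) fields. The function name classifyTags is hypothetical.

```php
<?php
// Hypothetical helper. Expected input: the output of
//   git for-each-ref --format='%(objecttype) %(refname:short)' refs/tags
// A lightweight tag ref points directly at a commit ("commit");
// an annotated tag ref points at a tag object ("tag").
function classifyTags($forEachRefOutput)
{
    $result = array('lightweight' => array(), 'annotated' => array(), 'other' => array());
    foreach (explode("\n", trim($forEachRefOutput)) as $line) {
        if ($line === '') {
            continue;
        }
        list($type, $name) = explode(' ', $line, 2);
        if ($type === 'commit') {
            $result['lightweight'][] = $name;
        } elseif ($type === 'tag') {
            $result['annotated'][] = $name;
        } else {
            // A tag may also point at a tree or a blob (rare, but allowed).
            $result['other'][] = $name;
        }
    }
    return $result;
}

// In practice the input would come from something like:
//   shell_exec('git -C ' . escapeshellarg($repoPath)
//       . " for-each-ref --format='%(objecttype) %(refname:short)' refs/tags");
$sample = "commit v1.0\ntag v1.1\ncommit v1.2\n";
$tags = classifyTags($sample);
```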

    How did GitPHP end up needing to know, for every tag in the repository, whether it is "lightweight"? Quite simply.

    On every GitPHP page, the branches and tags that point to a commit are displayed next to it:



    For this, the GitPHP_TagList class has a method that returns the list of tags pointing to a given commit:

    class GitPHP_TagList
    {
    // ...
            public function GetCommitTags($commit)
            {
                    $commitHash = $commit->GetHash();
                    if (!$this->dataLoaded) $this->LoadData();
                    $tags = array();
                    foreach ($this->refs as $tag => $hash) {
                            if (isset($this->commits[$tag])) {
                                    // ...
                            } else {
                                    $tagObj = $this->project->GetObjectManager()->GetTag($tag, $hash);
                                    $tagCommitHash = $tagObj->GetCommitHash();
                                    // ...
                                    if ($tagCommitHash == $commitHash) {
                                            $tags[] = $tagObj;
                                    }
                            }
                    }
                    return $tags;
            }
    // ...
    }
    

    That is, for every commit whose tags need to be listed, the following happens:

    1. On the first call, the list of all tags in the repository is loaded (LoadData()).
    2. The list of all tags is iterated over.
    3. For each tag, the corresponding tag object is loaded.
    4. GetCommitHash() is called on the tag object, and the result is compared with the hash of the commit being looked up.
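The per-commit scan in these steps can be replaced by an index built once per request: an array(commit_hash => array(tags)) map. The sketch below assumes the tag list comes from a single git for-each-ref call. In that call, %(*objectname) gives the object an annotated tag points to and is empty for lightweight tags. Both function names are hypothetical.

```php
<?php
// Hypothetical helper. Expected input: the output of
//   git for-each-ref --format='%(objectname) %(*objectname) %(refname:short)' refs/tags
// For a lightweight tag, %(objectname) is already the commit hash and
// %(*objectname) is empty. For an annotated tag, %(*objectname) is the
// tagged object (a tag of a tag is dereferenced only one level).
function buildCommitTagIndex($forEachRefOutput)
{
    $index = array();
    foreach (explode("\n", trim($forEachRefOutput)) as $line) {
        if ($line === '') {
            continue;
        }
        list($object, $peeled, $name) = explode(' ', $line, 3);
        $commit = ($peeled !== '') ? $peeled : $object;
        $index[$commit][] = $name;
    }
    return $index;
}

// O(1) lookup per commit instead of a scan over all tags.
function commitTags(array $index, $commitHash)
{
    return isset($index[$commitHash]) ? $index[$commitHash] : array();
}

$sample = "c1  v1.0\n"           // lightweight tag on commit c1
        . "t2 c1 v1.0-signed\n"  // annotated tag object t2 pointing at c1
        . "c3  v2.0\n";          // lightweight tag on commit c3
$index = buildCommitTagIndex($sample);
```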

    Apart from the fact that the tags could first be grouped into a map of the form array(commit_hash => array(tags)), note the GetCommitHash() method. It calls Load($tag), which, in the implementation that uses the external git binary, does the following:

    class GitPHP_TagLoad_Git
    {
    // ...
            public function Load($tag)
            {
    // ...
                    $args = array('-t', $tag->GetHash());
                    $ret = trim($this->exe->Execute($tag->GetProject()->GetPath(), GIT_CAT_FILE, $args));
                    if ($ret === 'commit') {
    // ...
                            return array(/* ... */);
                    }
    // ...
                    $ret = $this->exe->Execute($tag->GetProject()->GetPath(), GIT_CAT_FILE, $args);
    // ...
                    return array(/* ... */);
            }
    }
    

    That is, to show which branches and tags point to a commit, GitPHP loads the list of all tags and calls git cat-file -t for each of them. Not bad, Christopher, keep it up!

    Hundreds of git rev-list --max-count=1 ... calls


    The situation with commit information is similar. To load a commit's date, message, author, etc., GitPHP ran git rev-list --max-count=1 ... every time. This operation is not free either:

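    class GitPHP_CommitLoad_Git
    {
    // ...
            public function Load($commit)
            {
    // ...
                    $args = array();
                    $args[] = '--max-count=1';
                    // ... (further arguments elided)
                    $args[] = $commit->GetHash();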
                    $ret = $this->exe->Execute($commit->GetProject()->GetPath(), GIT_REV_LIST, $args);
    // ...
                    return array(
    // ...
                    );
            }
    // ...
    }
    

    Solution: loading commits in batches (git cat-file --batch)


    To avoid making many separate git cat-file calls, Git lets you load many objects at once with the --batch option. It reads a list of objects from stdin and writes the results to stdout. So you can first write the hashes of all the commits you need to a file, then run git cat-file --batch and read all the results at once.

    Here is an example of code that does this (written for GitPHP 0.2.4 and *nix operating systems):

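    class GitPHP_Project // class and method names are illustrative
    {
    // ...
        public function BatchReadData(array $hashes)
        {
            // Write the hashes to a temporary file, one per line
            $hashlistfile = tempnam(sys_get_temp_dir(), 'gitphp');
            $outfile = tempnam(sys_get_temp_dir(), 'gitphp');
            file_put_contents($hashlistfile, implode("\n", $hashes) . "\n");
            $exe = new GitPHP_GitExe($this); // assumed: 0.2.4-style executor bound to the project
            $exe->Execute(GIT_CAT_FILE, array('--batch', ' < ' . escapeshellarg($hashlistfile), ' > ' . escapeshellarg($outfile)));
            unlink($hashlistfile);
            $fp = fopen($outfile, 'r');
            unlink($outfile); // the open handle stays readable on *nix
            $types = $contents = array();
            while (!feof($fp)) {
                $ln = rtrim(fgets($fp));
                if (!$ln) continue; // the "\n" after each object's contents
                $parts = explode(" ", $ln);
                if (count($parts) != 3) continue; // "<hash> missing"
                list($hash, $type, $n) = $parts;
                // fread() rejects a zero length, so handle empty objects explicitly
                $contents[$hash] = $n > 0 ? fread($fp, $n) : '';
                $types[$hash] = $type;
            }
            fclose($fp);
            return array('contents' => $contents, 'types' => $types);
        }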
    // ...
    }
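The loop above depends on the layout of git cat-file --batch output. For each object, git writes a header line "<hash> <type> <size>", then exactly <size> bytes of contents, then a newline. A name git cannot resolve produces a single "<name> missing" line with no contents. Here is a standalone sketch of the same parsing over an in-memory string. The function name parseCatFileBatch is hypothetical.

```php
<?php
// Hypothetical parser for a complete `git cat-file --batch` output string.
function parseCatFileBatch($output)
{
    $objects = array();
    $missing = array();
    $pos = 0;
    $len = strlen($output);
    while ($pos < $len) {
        $eol = strpos($output, "\n", $pos);
        if ($eol === false) {
            break; // truncated header line
        }
        $header = substr($output, $pos, $eol - $pos);
        $pos = $eol + 1;
        $parts = explode(' ', $header);
        if (count($parts) !== 3) {
            // "<name> missing": no contents and no extra newline follow
            $missing[] = $parts[0];
            continue;
        }
        $size = (int) $parts[2];
        $objects[$parts[0]] = array(
            'type' => $parts[1],
            'contents' => substr($output, $pos, $size),
        );
        $pos += $size + 1; // skip the contents and the "\n" after them
    }
    return array('objects' => $objects, 'missing' => $missing);
}

$out = "aaa blob 5\nhello\n" . "nope missing\n" . "bbb blob 0\n\n";
$parsed = parseCatFileBatch($out);
```

Because the size in the header is exact, the parser never has to scan the contents for delimiters, so binary objects and objects containing newlines are handled the same way.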
    

    We started using this function on most pages that display commit information (i.e. we collect a list of commits and load them all with a single git cat-file --batch call). This optimization reduced the average page load time from over 20 seconds to 0.5 seconds, which solved the problem of GitPHP being slow in our project.

    Open source: GitPHP 0.2.9 (master) optimizations


    After a little thought, we realized that there was no need to rewrite all the code to use git cat-file --batch. Although the documentation does not mention it, this command can also return objects one at a time without any loss of performance. It reads one line from standard input and writes the result to standard output without buffering. This means we can open git cat-file --batch through proc_open() and get results immediately, without reworking the architecture!

    Here is an excerpt from the implementation (for readability, error handling has been removed):

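    class GitPHP_GitExe
    {
    // ...
            public function GetObjectData($projectPath, $hash)
            {
                    // A long-running `git cat-file --batch` process for this project
                    $process = $this->GetProcess($projectPath);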
                    $pipes = $process['pipes'];
                    $data = $hash . "\n";
                    fwrite($pipes[0], $data);
                    fflush($pipes[0]);
                    $ln = rtrim(fgets($pipes[1]));
                    $parts = explode(" ", rtrim($ln));
                    list($hash, $type, $n) = $parts;
                    $contents = '';
                    while (strlen($contents) < $n) {
                            $buf = fread($pipes[1], min(4096, $n - strlen($contents)));
                            $contents .= $buf;
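                    }
                    // Consume the "\n" that git writes after the contents;
                    // otherwise the next call would read it as its header line
                    fread($pipes[1], 1);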
                    return array(
                            'contents' => $contents,
                            'type' => $type,
                    );
            }
    // ...
    }
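Here is a self-contained sketch of a long-lived git cat-file --batch process driven through proc_open(). It is not GitPHP's code. The class name is hypothetical, it assumes git is in PATH, and proc_open() with an array command requires PHP 7.4+. It explicitly consumes the newline after each object so that consecutive requests stay in sync.

```php
<?php
// Hypothetical wrapper: one `git cat-file --batch` process per repository.
class CatFileBatch
{
    private $process;
    private $pipes = array();

    public function __construct($repoPath)
    {
        $spec = array(0 => array('pipe', 'r'), 1 => array('pipe', 'w'));
        // Run in $repoPath so git finds the repository.
        $this->process = proc_open(array('git', 'cat-file', '--batch'), $spec, $this->pipes, $repoPath);
        if (!is_resource($this->process)) {
            throw new RuntimeException('cannot start git cat-file --batch');
        }
    }

    /** Returns array('type' => ..., 'contents' => ...), or null if git cannot resolve $hash. */
    public function read($hash)
    {
        fwrite($this->pipes[0], $hash . "\n");
        fflush($this->pipes[0]);
        $header = rtrim((string) fgets($this->pipes[1]), "\n");
        $parts = explode(' ', $header);
        if (count($parts) !== 3) {
            return null; // "<name> missing": no contents follow
        }
        $type = $parts[1];
        $size = (int) $parts[2];
        $contents = '';
        while (strlen($contents) < $size) {
            $buf = fread($this->pipes[1], min(8192, $size - strlen($contents)));
            if ($buf === false || $buf === '') {
                throw new RuntimeException('unexpected EOF from git cat-file');
            }
            $contents .= $buf;
        }
        fread($this->pipes[1], 1); // the "\n" that follows the contents
        return array('type' => $type, 'contents' => $contents);
    }

    public function __destruct()
    {
        if (is_resource($this->process)) {
            fclose($this->pipes[0]); // EOF on stdin lets git exit
            fclose($this->pipes[1]);
            proc_close($this->process);
        }
    }
}
```

Usage: create one CatFileBatch per repository path and call read($hash) as often as needed. Each call costs a pipe round trip instead of a fork/exec of git.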
    

    Since we can now load object contents quickly without launching a git command each time, getting a big performance boost was simple: we just replaced all git cat-file and git rev-list calls with calls to our optimized functions.
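Replacing git rev-list --max-count=1 with cat-file means the commit fields have to be parsed from the raw commit object. The sketch below assumes the standard commit object layout: header lines such as tree, parent, author and committer, then an empty line, then the message. The function name and the shape of the returned array are hypothetical. Multi-line headers such as gpgsig are skipped.

```php
<?php
// Hypothetical parser for the contents of a raw commit object.
function parseCommitObject($raw)
{
    // Headers and message are separated by the first empty line.
    $sep = strpos($raw, "\n\n");
    $headerText = ($sep === false) ? $raw : substr($raw, 0, $sep);
    $message = ($sep === false) ? '' : substr($raw, $sep + 2);

    $commit = array('tree' => null, 'parents' => array(), 'author' => null,
                    'committer' => null, 'message' => $message);
    foreach (explode("\n", $headerText) as $line) {
        if ($line !== '' && $line[0] === ' ') {
            continue; // continuation of a multi-line header (e.g. gpgsig)
        }
        list($key, $value) = array_pad(explode(' ', $line, 2), 2, '');
        if ($key === 'tree') {
            $commit['tree'] = $value;
        } elseif ($key === 'parent') {
            $commit['parents'][] = $value;
        } elseif ($key === 'author' || $key === 'committer') {
            // "Name <email> <unix timestamp> <timezone offset>"
            if (preg_match('/^(.*) <(.*)> (\d+) ([+-]\d{4})$/', $value, $m)) {
                $commit[$key] = array('name' => $m[1], 'email' => $m[2],
                                      'time' => (int) $m[3], 'tz' => $m[4]);
            }
        }
    }
    return $commit;
}

$raw = "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
     . "parent 1111111111111111111111111111111111111111\n"
     . "author Jane Doe <jane@example.com> 1400000000 +0400\n"
     . "committer Jane Doe <jane@example.com> 1400000100 +0400\n"
     . "\n"
     . "Fix tag list\n";
$c = parseCommitObject($raw);
```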

    We collected all the changes into a single commit and sent a pull request to the GitPHP developer. Some time later, the patch was accepted! Here is the commit:

    source.gitphp.org/projects/gitphp.git/commitdiff/3c87676b3afe4b0c1a1f7198995cecc176200482

    The author made some corrections to the code (in separate commits), and the master branch now contains a significantly faster version of GitPHP! To use the optimizations, turn off "compatibility mode", i.e. set $compat = false; in the configuration.

    Yuriy youROCK Nasretdinov, PHP developer, Badoo
    Evgeny eZH Makhrov, QA engineer, Badoo
