Profiling PHP scripts on a live server

    Good afternoon, community!

    Surely one of you has encountered such a problem: the site is slowly working on a real server.
    It is important to quickly find out in what places there are difficulties. You can not use xdebug for this, since it creates a large load on the server and itself introduces an error in the measurements. To solve this problem, we chose a system that allows you to quickly collect tree-like statistics on the site - pinba.

    On a habr already talked about specifics of work with pinba . If you have not read, you can read the link.

    For the impatient, I will immediately give a link to the result .
    Plus1 WapStart works in normal mode with a load of more than 1000 requests per second per instance.

    How does it all work?


    Data collection

    Pinba sends to its server (via UDP, very quickly) the start and end labels of the time interval (hereinafter referred to as timers) and stores the data in MySQL tables (easy to read). for instance
    $timer = pinba_timer_start(array('tag' => 'some_logic'));
    ....
    pinba_timer_stop($timer);
    

    To build a tree structure, we add 2 additional tags - tree_id (each time a unique id) and tree_parent_id is a tree_id from the timer in which the current one is embedded. for instance

    $parent_timer = pinba_timer_start(array('tag' =>'some_logic', 'tree_id' => 1, 'tree_parent_id' => 'root'));
       $child_timer = pinba_timer_start(array('tag' =>'child_logic', 'tree_id' => 2, 'tree_parent_id' => 1));
       pinba_timer_stop($child_timer);
    pinba_timer_stop($parent_timer);
    

    Thus, on the server, you can reproduce the nesting of timers and build a readable tree.
    We placed timers in all interesting places of the project in order to detect time (for example, with sql queries, when writing to files, etc.).

    Data preparation

    Unfortunately, pinba does not use indexes for queries (except PRIMARY), since it uses its own pinba ENGINE (tables are actually stored in memory, and data older than N minutes is deleted, in our case - 5 minutes). But you can not complain about pinba, as it is not intended for index queries.
    For us, indexes are important, because we copy all the data from pinba tables to regular MyISAM tables.

    truncate table pinba_cache.request;
    truncate table pinba_cache.tag;
    truncate table pinba_cache.timer;
    truncate table pinba_cache.timertag;
    insert ignore into pinba_cache.request select * from pinba.request;
    insert ignore into pinba_cache.tag select * from pinba.tag;
    insert ignore into pinba_cache.timer select * from pinba.timer;
    insert ignore into pinba_cache.timertag select * from pinba.timertag;
    

    As you can see from the requests, our system works in the pinba database, and a copy in the pinba_cache database.

    Also for work we need one more table, in which there will be tree_id and tree_parent_id fields.

    truncate table pinba_cache.timer_tag_tree;
    insert ignore into pinba_cache.timer_tag_tree
    SELECT * FROM (
    	SELECT null, timer_id, request_id, hit_count, timer.value, GROUP_CONCAT(timertag.value) as tags
    	, (select timertag.value from pinba_cache.timertag where timertag.timer_id=timer.id and tag_id = (select id from pinba_cache.tag where name='treeId')) as tree_id
    	, (select timertag.value from pinba_cache.timertag where timertag.timer_id=timer.id and tag_id = (select id from pinba_cache.tag where name='treeParentId')) as tree_parent_id
    	FROM pinba_cache.timertag force index (timer_id)
    	LEFT JOIN pinba_cache.timer ON timertag.timer_id=timer.id
    	where not tag_id in ((select id from pinba_cache.tag where name='treeId'), (select id from pinba_cache.tag where name='treeParentId'))
    	group by timertag.timer_id
    	order by timer_id
    ) as tmp
    GROUP BY tree_id;
    

    The structure of the timer_tag_tree table is shown below. The structure of the remaining tables is the same as in pinba.

    CREATE TABLE `timer_tag_tree` (
    	`id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
    	`timer_id` INT(10) NOT NULL DEFAULT '0',
    	`request_id` INT(10) NULL DEFAULT NULL,
    	`hit_count` INT(10) NULL DEFAULT NULL,
    	`value` FLOAT NULL DEFAULT NULL,
    	`tags` VARCHAR(128) NULL DEFAULT NULL,
    	`tree_id` VARCHAR(35) NOT NULL DEFAULT '',
    	`tree_parent_id` VARCHAR(35) NOT NULL DEFAULT '',
    	PRIMARY KEY (`id`),
    	INDEX `timer_id` (`timer_id`),
    	INDEX `tree_id_tree_parent_id` (`tree_id`, `tree_parent_id`),
    	INDEX `tree_parent_id_tree_id` (`tree_parent_id`, `tree_id`)
    )
    COLLATE='utf8_general_ci'
    ENGINE=MyISAM
    

    Data analysis

    Now - the most interesting. We collected the data, put it together as we need for subsequent work. Next, you need to write a script that will produce all this data in a convenient way.
    How to output one tree (from one request to the site) - I will not write, since this is a trivial task.
    The problem is that to assess bottlenecks, you need to analyze hundreds of php requests, each of which has its own tree of function calls (timers). We need to collect one generalized tree from these trees.

    The union algorithm is as follows:

    For each node, we consider the sum of the execution times of this node for all trees.
    Having written a function for combining two trees, you can cycle through all and get the sum.
    But here we are waiting for an unpleasant surprise - a slow time.
    As you can see from the picture, the difficulty of combining 2 trees is O (N * N) (attentive will tell me that you can do this in N * log (N), but then there will be a simpler optimization method, in 3 lines), where N -no nodes in the tree. Accordingly, it is beneficial to combine small trees, and very disadvantageously large.
    We will try to use this feature. Let's define the execution tree of one script as a level 1 tree, the sum of two trees of the first level is a tree of the second level, etc. In such terms, we need to combine more trees of the first level, and a minimum of a large level. We will do it this way:

    as we see, the total number of associations was N-1, of which N / 2 - the first level, N / 4 - the second level, N / 8 - the third, etc.
    This trick is implemented extremely simply with the help of recursion (if you wish, it can be expanded into a loop, but for the sake of clarity, I will not do this).

    //принимает на вход массив деревьев, на выход - объединенное дерево
    function mergeTreeList(array $treeList) {
        if (count($treeList) > 2) {
            return mergeTreeList( половина($treeList), вторая_половина($treeList));
        }
        //...
        //тут идет код объединения
    }
    

    Thus, we will first combine the original trees in 2x, and then they will be combined into larger trees. The time gain from us was ~ 10 times (1000 trees).

    Total


    • We placed pinba-timers in our application, where we considered it necessary
    • We formed an aggregated execution tree based on many script requests
    • Using the constructed tree, you can analyze the bottlenecks of the project, you can build graphs of the speed of individual pieces of the project
    • All this happens right on a live server with a lot of work.


    Pitfalls and Cons


    • On our project, pinba writes so quickly (and deletes the old one) that the insert into table_copy select * from table query copies 2-3 times more data than the table originally had. Therefore, while copying tables, you need to stop recording in pinba (I stopped the network on the server with a firewall)
    • Pinba consumes a lot of memory (we have 2 Gb to store data in 5 minutes), since instead of one tag we write 3 (+ tree_id, + tree_parent_id)
    • When copying, you have to disconnect the network to stop writing to tables (for 5-10 seconds), which causes data to be lost for these 5-10 seconds


    Useful files:
    Script to display the tree: index.php
    MySQL script for data conversion cron.sql
    PinbaClient.class.php - wrapper over pinba for more convenient use with automatic addition of tree_id, tree_parent_id
    I also want to mention the onphp framework , which has native support pinba
    https://github.com/ents/pinba-php-profiler/ - source files to raise everything at home
    http://pinba.org/ - here you can download pinba


    Disclaimer: This article is popular in nature and cannot be construed as a guide to action. All the actions described below are not the ultimate truth, but rather one of the few ways to make information visualized from pinba

    Also popular now: