An approach to optimizing an application, using a popular CMS as an example


    This article can help keep your live project from slowing down (or make it slow down less), or serve as a starting point for dissecting a third-party product.
    For example, suppose your task is to understand what is going on inside "Home-Grown System 3.14" and, again, to stop it from eating 100 megabytes of RAM per client.


    About the program under study


    WebAsyst Shop-Script is the second attempt by the folks at Artikus LLC to get it right. The first attempt, Shop-Script Premium, was full of holes and is still causing problems to this day.
    Strictly speaking, WebAsyst is a whole suite of applications (a notepad, a calendar, a project manager), but anyone who has spent more than a day in web development or online business is unlikely to find them interesting (see Basecamp).
    Whether or not this attempt succeeded is hard to say; what I can say is that we recently celebrated the 666th revision of our alternative branch, and that is not the end.

    Goals


    The goal is to identify the most resource-intensive operations and to determine how the system behaves at critical data volumes. By data I mean the number of categories and products. In some cases I will make optimization recommendations, but once you can see the source of the trouble, neutralizing it is not that hard.

    Preparation: directory structure and dependencies


    Quite recently I asked on Habr Q&A about a component for automatically mapping dependencies between files, and discovered the inclued extension, which plays nicely with Graphviz; I recommend it. Without something like it, figuring out that nearly the most important component of the program lives at
    published\SC\html\scripts\modules\test\class.test.php
    and why
    published\SC\html\scripts\modules\test\_methods\search_simple.php exists will be tedious and uninteresting.
    I went down the grep path myself, because I had the time and needed to understand the internals, but I never want to do that again and I don't advise it to you.
    Especially considering their experiments with callbacks, which still make me queasy.
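
    For the curious, a minimal sketch of pulling an include map out of inclued, assuming the PECL extension is installed and enabled (inclued.enabled=1 in php.ini); the resulting dump can then be turned into a Graphviz dot file with the helper script bundled with the extension:

      // A minimal sketch, assuming the PECL inclued extension is enabled.
      // Dump the include hierarchy of the current request so it can later
      // be rendered with Graphviz.
      $clue = inclued_get_data();
      if (!empty($clue)) {
          file_put_contents('/tmp/shop-script.inclued', serialize($clue));
      }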

    Preparation: Content Filling


    It would be foolish to argue that testing on a small amount of data makes any sense (unless you are at the alpha stage of development). So first of all, stuff your CMS with content up to the eyeballs. If you have no real data, write a spammer script that keeps writing to the database for as long as possible, or for as long as you can stand it; a sketch follows below.
    WebAsyst SS behaves completely differently on a catalog of 450 products than on one with 4,500.
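
    A primitive spammer is a dozen lines of PHP. This is a sketch under stated assumptions: SC_products and its columns are hypothetical names, so check the real WebAsyst schema before running anything like it:

      // A throwaway seeding sketch. SC_products, name, price and categoryID
      // are hypothetical names - adapt them to the actual schema.
      mysql_connect('localhost', 'user', 'password');
      mysql_select_db('shop');
      for ($i = 1; $i <= 4500; $i++) {
          $name  = mysql_real_escape_string('Product #' . $i);
          $price = mt_rand(100, 100000) / 100;
          mysql_query("INSERT INTO SC_products (name, price, categoryID)
                       VALUES ('{$name}', {$price}, " . mt_rand(1, 350) . ")");
      }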

    Preparation: overloading standard functions


    I will not talk about overloading in the OOP sense, but about a brute-force override. The method is simple: go through all the files looking for a standard function and replace it with ov_{original_name}, and now all the cards are on the table. If we want, we log every database query; if we want, we watch who knocks on fopen, fwrite, file_get_contents, or tries black magic like eval. Logging mysql_query is the most useful, since performance usually comes down to the database.
    I use something like this:
    function ov_mysql_query($text) {
        $debug = false;
        if ($debug) {
            $source = debug_backtrace();
            $src_array_count = count($source);
            // build a log file name from the requested URL, replacing
            // characters that are unfriendly to the file system
            $what = array('=', ',', '?', '&', '+', ')', '(', '/');
            $to   = array('_', '_', '_', '_', '_', '_', '_', '_');
            $filename = str_replace($what, $to, $_SERVER['REDIRECT_URL']);
            // number the queries within the current request
            static $function_counter_m = 0;
            $function_counter_m++;
            $oldDir = getcwd();
            chdir($_SERVER['DOCUMENT_ROOT']);
            $fp = fopen('logs/' . $filename . '.log', 'a');
            // keep each query on a single log line
            fwrite($fp, $function_counter_m . ') ' . str_replace(array("\r", "\n"), ' ', trim($text)) . "\r\n");
            // dump the call stack so the query can be traced to its origin
            for ($i = 0; $i < $src_array_count; $i++) {
                fwrite($fp, 'DEBUG INFO:' . $source[$i]['file'] . ' | ' . $source[$i]['line'] . "\r\n");
            }
            fwrite($fp, "\r\n");
            fclose($fp);
            chdir($oldDir);
        }
        return mysql_query($text);
    }

    As a result, each query together with its debugging information (the call stack) is appended to a file in the www/logs folder.
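
    The renaming itself does not have to be done by hand. A rough sketch of scripting it (the regex and paths are illustrative; back the tree up first, and exclude the file that defines the override itself, or the wrapper will call itself):

      // Rewrite mysql_query( calls into ov_mysql_query( across the tree.
      $files = new RecursiveIteratorIterator(
          new RecursiveDirectoryIterator($_SERVER['DOCUMENT_ROOT'])
      );
      foreach ($files as $file) {
          if (!$file->isFile() || substr($file->getFilename(), -4) !== '.php') {
              continue;
          }
          $code = file_get_contents($file->getPathname());
          // the lookbehind keeps already-prefixed calls untouched
          $patched = preg_replace('/(?<!ov_)\bmysql_query\s*\(/', 'ov_mysql_query(', $code);
          if ($patched !== null && $patched !== $code) {
              file_put_contents($file->getPathname(), $patched);
          }
      }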

    Preparation: xDebug


    To be honest, I find it hard to call an attempt to figure out someone else's mechanism "debugging". Rather, it is preparation. Nevertheless, whether you can identify bottlenecks and optimize the system depends directly on whether you have a debugger. If you write programs in PHP, then you need xDebug: it is free, and it is supported by every PHP code editor with even a little self-respect.

    The debugger writes special dumps (with various data, configurable) into a directory you specify. Since my main OS is Windows, you may have an advantage over me at this step: KCachegrind on Linux is much more convenient than WinCacheGrind (both programs let you view these dumps, though truth be told they are plain text files that a sufficiently hardcore person can read in Notepad).
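
    For reference, in the xDebug 2 era the profiler was switched on with a couple of php.ini directives, something like this (the paths are illustrative):

      ; a typical xDebug 2.x profiler setup - adjust the paths
      xdebug.profiler_enable = 1
      xdebug.profiler_output_dir = "C:\www\logs\profiler"
      xdebug.profiler_output_name = "cachegrind.out.%p"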



    Now let's set the sluggish zombies loose.

    Test bench


    • WebAsyst version used: 287 (clean, no patches or mods) *
    • Products in the database: 4461
    • Characteristic types in the database: 144
    • Characteristic values in the database: 44170
    • Categories in the database: 354
    • Images in the database: 3516

    * those who follow the changelog will not mind anyway: since revision 250 it differs only in the color of the buttons in one of the admin panel's subwindows


    A little bit about the initial configuration



    The results for a clean engine with the default single product and single category. The program has stock caching mechanisms, and you can judge for yourself how well they pay off.
    Page *              | Queries, default cache | Queries, no cache | Load time, default cache ** | Load time, no cache **
    Home                | 64                     | 73                | 10.304                      | 17.011
    Category            | 83                     | 90                | 10.616                      | 19.457
    Product             | 100                    | 107               | 15.010                      | 28.731
    Search (successful) | 69                     | 76                | 10.507                      | 18.209

    * each page name links to a screenshot of the page, so it is clear whether the requested data volume is proportional to what is displayed
    ** cumulative time from WinCacheGrind
    

    If your hair is not standing on end yet, read on, and don't forget that this was with only one product and one category.

    Data configuration


    It is time to bring in our test bench with several thousand products and a hefty category hierarchy.

    Page                                          | Queries, default cache | Queries, no cache | Load time, default cache | Load time, no cache
    Home                                          | 64                     | 73                | 12.323                   | 19.404
    Category                                      | 186                    | 193               | 20.333                   | 29.881
    Product                                       | 108                    | 115               | 16.156                   | 30.100
    Search (successful)                           | 69                     | 76                | 20.733                   | 25.162
    Selection by characteristics (advanced search)| 900                    | 907               | 43.216                   | 50.242

    If the home page somehow still holds up with its 64 queries (IN clauses (a, b, c, d, ..., z)), the category page is already struggling, and the selection by characteristics will bury not just ordinary shared hosting but a VPS as well. And don't think that disabling advanced search will save you: this software product has several undocumented features that, in the hands of competitors, can make your life difficult.
    You can learn about these features by digging into the class responsible for handling URLs (class.furl.php). For example, one can hammer the request store.ru/category/category_with_lot_products/all/ non-stop. In my catalog this category has 113 pages at the top level.
    The numbers:
    Page             | Queries, default cache | Queries, no cache | Load time, default cache | Load time, no cache
    Category (/all/) | 241                    | 248               | 430.049                  | 439.102


    Small subtotal



    At the current stage of the study, we know:
    • It is possible to create a potentially high load
    • The number of database queries with and without a cache is too large


    Also, looking at the dump the debugger produced while loading the store.ru/category/category_with_lot_products page, you can confidently single out the two most gluttonous operations:

      foreach ($Interfaces as $_Interface){
       ModulesFabric::callInterface($_Interface);
      }

    and
    print $smarty->fetch($CurrDivision->MainTemplate);


    Besides these, a lot of resources go into building the category tree: is_object is called more than 95 thousand times, the program asks for LanguagesManager::getInstance 70 thousand times and computes string lengths more than 28 thousand times, while calls to LanguagesManager::ml_isEmpty make up two thirds of the slowest operation, getExtraParametrs.

    Problem Solving Options


    Easy


    If you do not have many visitors but the program is slow, you can use file caching, which takes minimal time to integrate.
    I suggest the following scheme:
    1. Find a heavy function.
    2. Determine whether it depends on any global variables.
    3. Rename it to something like {original_function}_cached.
    4. Create a new {original_function} whose body calls {original_function}_cached through a special caching wrapper.

    In the early stages of optimization, when the program had to become fast right away and there was no time, I used this solution:
      function cache_function($buildCallback, array $args = array(), $timeoutHours = 5){
          $oldDir = getcwd();
          chdir($_SERVER['DOCUMENT_ROOT']);
          // Build the cache key from the callback name and its arguments
          if (is_array($buildCallback)) {
              $cacheName = get_class($buildCallback[0]) . '_' . $buildCallback[1];
          } else {
              $cacheName = $buildCallback;
          }
          $cacheKey = $cacheName . ':' . serialize($args);
          if (!file_exists('functions_cache/' . $cacheName . '/')) {
              @mkdir('functions_cache/' . $cacheName . '/', 0777, true);
          }
          $file_path = 'functions_cache/' . $cacheName . '/' . sha1($cacheKey);
          // Rebuild the entry when it is missing or older than the timeout
          if (!file_exists($file_path) OR filemtime($file_path) < (time() - $timeoutHours * 3600)) {
              $result = call_user_func_array($buildCallback, $args);
              file_put_contents($file_path, serialize($result), LOCK_EX);
          } else {
              $result = unserialize(file_get_contents($file_path));
          }
          chdir($oldDir);
          return $result;
      }

    We get:
      function original_function($arg1,$arg2){
        return cache_function('original_function_cached',array($arg1,$arg2),10);
      }

    As a result, the serialized result of original_function_cached will appear in the www/functions_cache/original_function_cached directory and will be reused for 10 hours.

    Difficult

    However we cache the results of function calls, we are still left with the resource-hungry fetch, which assembles a single page from a dozen templates with the help of dozens of controllers and plugins.
    Here I would suggest trimming the number of templates, building a proper hierarchy for them (by default all templates are stored in one flat pile), and moving towards block caching; a sketch follows below. On the most visited pages this yields quite a noticeable speedup.
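
    As a direction, here is a fragment-caching sketch in the spirit of Smarty 2 (the template name, the cache id scheme, and the getProductsForCategory() helper are all illustrative, not WebAsyst code):

      // Heavy data is built only on a cache miss; caching = 2 lets the
      // lifetime be set per cached entry.
      require_once 'Smarty.class.php';
      $smarty = new Smarty();
      $smarty->caching = 2;
      $smarty->cache_lifetime = 3600; // one hour for this fragment
      $cacheId = 'category|' . (int)$_GET['categoryID'];
      if (!$smarty->is_cached('category.tpl', $cacheId)) {
          // getProductsForCategory() is a hypothetical heavy helper
          $smarty->assign('products', getProductsForCategory((int)$_GET['categoryID']));
      }
      $smarty->display('category.tpl', $cacheId);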

    Very difficult

    But if, like me, you have no choice other than working with WA, and you will be working with it for a long time, these are all half measures.

    What it needs is real optimization and rewriting of the algorithms (at your leisure, look at how they implement pagination), not hacked-on caching. Here I actually have it easier, since I know that new content is added automatically at a fixed time, and at that moment I can afford to flush the entire cache. To handle invalidation of prices and characteristics you will have to set up cache groups and change a great deal (from URL generation to restructuring the directory mess); a sketch of group invalidation follows below. Most of the problems can, of course, be solved with Smarty, though restructuring the program will be great fun, since WebAsyst SS itself apparently never intended to use Smarty's caching mechanisms (everything points to that).
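
    Smarty's cache groups (a vertical bar inside a cache id) are what make targeted invalidation possible. A sketch with illustrative ids:

      // A vertical bar in a cache id defines a group, so a whole group
      // can be dropped at once ("price|..." is an illustrative scheme).
      $smarty->clear_cache('product.tpl', 'price|' . $productId); // one product
      $smarty->clear_cache('product.tpl', 'price');               // the whole group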

    For example: we cached the whole product page and set its lifetime to 5 hours. The price is expected to change sooner, but we would rather not flush the cache. We can create a Smarty plugin that calls the appropriate method of the appropriate model (say, $productModel->getPrice($pID)) and returns the price. The product page then costs us 1 database query, and the view cache is not rebuilt; a sketch follows below.
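
    A sketch of that trick through a Smarty {insert} function, whose output is re-evaluated on every request even inside a cached page ($productModel and getPrice are the hypothetical names from the example above):

      // In the template: {insert name="product_price" pid=$productID}
      function insert_product_price($params, &$smarty)
      {
          global $productModel; // hypothetical model object
          // one database query per product page; the view cache stays intact
          return $productModel->getPrice((int)$params['pid']);
      }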

    Conclusion


    This turned out somewhat long, but everything seems to be to the point.
    I hope the ready-made solutions and recommendations in this article will lead you to something new (be it inclued or xDebug, or the rule of not taking developers at their word when they claim that all their database calls go through a single class) or help you develop old ideas.
