AloneCoder September 15, 2016 at 14:37

OPCache Extension for PHP: An Overview

Translation

PHP is a scripting language that, by default, compiles every file it needs to run. During compilation it produces opcodes, executes them, and then immediately destroys them. PHP was designed this way: when it starts handling request R, it “forgets” everything it did during request R-1.

On production servers, it is very unlikely that the PHP code will change between requests. So we can assume that compilation always reads the same source code, and therefore always produces exactly the same opcodes. Regenerating them for every script on every request is a waste of time and resources.

Because compilation takes so long, extensions were developed to cache opcodes. Their main task is to compile each PHP script once and cache the resulting opcodes in shared memory, so that every PHP worker process in your production pool can read and execute them (PHP-FPM is usually used).

As a result, the overall performance of the language improves greatly, and the time needed to run a script is at least halved (this depends heavily on the script itself). Usually the gain is even larger, because PHP does not need to compile the same scripts again and again.

The more complex the application, the more effective this optimization is. If a program loads a lot of files, as a framework-based application or a product like WordPress does, script execution time may drop by a factor of 10 to 15. The reason is that the PHP compiler is slow: it has to convert one syntax into another, it tries to understand what you wrote, and it tries to optimize the resulting code to speed up execution. So yes, the compiler is slow and consumes a lot of memory. With a profiler such as Blackfire, we can estimate how long compilation takes.

Introduction to OPCache

OPCache was open-sourced in 2013 and has shipped with PHP since version 5.5.0. Since then, it has been the standard solution for opcode caching in PHP. We will not consider other solutions here, because the only one I am familiar with is APC, whose support was discontinued in favor of OPCache. In short: if you used APC before, use OPCache now. It is the solution officially recommended by the PHP developers for opcode caching. You can use other tools if you want, but never activate more than one opcode caching extension at the same time. That will surely crash PHP.

Also keep in mind that further development of OPCache will only happen in PHP 7, not in PHP 5. In this article we consider OPCache for both versions, so you will see the differences (they are not large).

So, OPCache is an extension, or more precisely a Zend extension, that has been bundled with the PHP source code since version 5.5.0. It must be activated through the normal activation process in php.ini. If you use distribution packages, check your distribution's documentation to see how PHP and OPCache are set up together.

Two functions of one product

OPCache has two main functions:

Opcode caching.
Opcode optimization.

Since OPCache runs the compiler to obtain and cache the opcodes, it can use this step to optimize them. In essence, these are classic compiler optimizations: OPCache works as a multi-pass optimizer.

OPCache internals

Let's see how OPCache works internally. If you want to check the code, you can get it, for example, from here.

The idea of opcode caching is not hard to understand and analyze. With a good knowledge of how the Zend engine works and how it is designed, you will quickly start to spot the places where optimization can be done.

Shared memory models

As you know, different operating systems offer many shared memory models. Modern Unix systems provide several ways for processes to share memory, the most popular of which are:

System-V shm API
POSIX API
mmap API
Unix socket API

OPCache can use the first three if your OS supports them. The INI setting opcache.preferred_memory_model explicitly selects the desired model. If you leave the setting empty, OPCache picks the first model that works on your platform, iterating over this table in order:

static const zend_shared_memory_handler_entry handler_table[] = {
#ifdef USE_MMAP
    { "mmap", &zend_alloc_mmap_handlers },
#endif
#ifdef USE_SHM
    { "shm", &zend_alloc_shm_handlers },
#endif
#ifdef USE_SHM_OPEN
    { "posix", &zend_alloc_posix_handlers },
#endif
#ifdef ZEND_WIN32
    { "win32", &zend_alloc_win32_handlers },
#endif
    { NULL, NULL}
};

By default, mmap should be used. It is a good model, mature and robust, although it gives system administrators less visibility than the System-V SHM model with its ipcs and ipcrm commands.
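The selection described above can be sketched in a few lines of C. This is only an illustration, not the OPCache source: the `shm_handler_entry` type, the probe callbacks and `select_memory_model()` are hypothetical names, and the fallback to table order when the preferred model fails is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handler entry: a model name plus a probe that reports
 * whether the model can be used on this platform. */
typedef int (*shm_probe_fn)(void);

typedef struct {
    const char  *name;
    shm_probe_fn probe;
} shm_handler_entry;

/* Two trivial probes used to simulate a platform. */
static int probe_ok(void)   { return 1; }
static int probe_fail(void) { return 0; }

/* Return the name of the selected model, or NULL if none works.
 * A non-empty `preferred` is tried first; otherwise (or if it fails,
 * which is an assumption of this sketch) the table order decides. */
static const char *select_memory_model(const shm_handler_entry *table,
                                       const char *preferred)
{
    const shm_handler_entry *e;

    assert(table != NULL);
    if (preferred != NULL && preferred[0] != '\0') {
        for (e = table; e->name != NULL; e++) {
            if (strcmp(e->name, preferred) == 0 && e->probe()) {
                return e->name;
            }
        }
    }
    for (e = table; e->name != NULL; e++) {
        if (e->probe()) {
            return e->name;
        }
    }
    return NULL;
}
```

With a table ordered mmap, shm, posix, where only mmap fails its probe, an empty preference selects "shm" and a preference of "posix" selects "posix".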

As soon as OPCache starts (that is, when PHP starts), it selects the shared memory model and allocates one large segment, which it then hands out piece by piece. The segment is never freed or resized afterwards.

That is, when starting PHP, OPCache allocates one large segment of memory, which is not freed or fragmented.
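To make the "one segment, shared by every process" idea concrete, here is a minimal sketch using the mmap model on a Unix system. It is not OPCache code: `segment_map()` and `child_write_parent_read()` are hypothetical helpers, and the forked child merely stands in for a PHP worker that fills the cache.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map one anonymous segment that stays shared across fork(). */
static void *segment_map(size_t size)
{
    void *p;

    assert(size > 0);
    p = mmap(NULL, size, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Fork a child that writes `value` into the segment, wait for it,
 * then return what the parent reads back (-1 on error). */
static int child_write_parent_read(int value)
{
    int *seg = segment_map(sizeof(int));
    pid_t pid;
    int seen;

    if (seg == NULL) {
        return -1;
    }
    *seg = 0;
    pid = fork();
    if (pid == 0) {          /* child: plays a worker filling the cache */
        *seg = value;
        _exit(0);
    }
    if (pid < 0 || waitpid(pid, NULL, 0) < 0) {
        munmap(seg, sizeof(int));
        return -1;
    }
    seen = *seg;             /* parent sees the child's write */
    munmap(seg, sizeof(int));
    return seen;
}
```

Because the mapping is created before the fork, parent and child see the same physical memory, which is the property the segment relies on when the master process starts the workers.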

The segment size is set in megabytes with the opcache.memory_consumption INI setting. Do not be stingy: give it plenty. Never run out of shared memory; if you do, your processes will end up blocked. We will come back to this below.

Size the segment according to your needs, and remember that a production server dedicated to PHP processes can consume several tens of gigabytes of memory for PHP alone. Allocating 1 GB or more for the segment is therefore common; it all depends on your needs. If you run a modern application stack based on a framework with many dependencies, you will need at least a gigabyte.

The segment will be used by OPCache for several tasks:

Caching the script data structures, including the opcodes.
Storing the shared interned strings buffer.
Storing the hash table of cached scripts.
Storing the global OPCache shared memory state.

Remember that the shared memory segment contains not only opcodes, but also other things necessary for OPCache to work. So estimate how much memory is needed, and set the desired segment size.

Opcode caching

Consider the details of the caching mechanism.

The idea is to copy into shared memory (shm) all pointed-to data that does not change from request to request, that is, immutable data. There is a lot of it. When a previously cached script is later loaded from shared memory, its pointer data is restored into the ordinary process memory tied to the current request.

While it runs, the PHP compiler allocates every pointer with the Zend Memory Manager (ZMM). That memory is request-bound: ZMM will try to free those pointers automatically at the end of the current request. These pointers are also allocated from the heap of the current process, so this is private memory that cannot be shared with other PHP processes. OPCache's job is therefore to walk every structure returned by the PHP compiler and, rather than leaving any pointer allocated in that pool, copy it into the shared memory pool.

Note that we are talking about compile time here. Everything allocated by the compiler is considered immutable. Mutable data is created by the Zend virtual machine at run time, so everything created by the Zend compiler can safely be stored in shared memory: functions and classes, pointers to function names, pointers to the functions' OPArrays, class constants, the names of declared class variables and their default values... The PHP compiler creates a lot of things in memory.

This model is used to avoid locks as much as possible; we will come back to locking later. In essence, OPCache does all its work up front, before execution, so it has nothing left to do while the script runs. Mutable data is created on the process's classic heap using ZMM, and immutable data is restored from shared memory.

So, OPCache hooks into the compiler and replaces the structures that the compiler would normally fill in while compiling scripts with its own. Instead of populating the Zend engine's tables and internal structures directly, the compiler is made to populate a persistent_script structure.

Here it is:

typedef struct _zend_persistent_script {
    ulong          hash_value;
    char          *full_path;              /* full path with symlinks resolved */
    unsigned int   full_path_len;
    zend_op_array  main_op_array;
    HashTable      function_table;
    HashTable      class_table;
    long           compiler_halt_offset;   /* position of __HALT_COMPILER or -1 */
    int            ping_auto_globals_mask; /* which autoglobals are used by the script */
    accel_time_t   timestamp;              /* the script's modification time */
    zend_bool      corrupted;
#if ZEND_EXTENSION_API_NO < PHP_5_3_X_API_NO
    zend_uint      early_binding;          /* linked list of delayed declarations */
#endif
    void          *mem;                    /* shared memory used by the script structures */
    size_t         size;                   /* size of the shared memory used */
    /* All entries that must not be counted in the ADLER32 checksum
     * must be declared in this struct
     */
    struct zend_persistent_script_dynamic_members {
        time_t       last_used;
        ulong        hits;
        unsigned int memory_consumption;
        unsigned int checksum;
        time_t       revalidate;
    } dynamic_members;
} zend_persistent_script;

And so OPCache replaces the compiler structure with its own persistent_script, simply by switching function pointers:

new_persistent_script = create_persistent_script();
/* Save the original op_array, function table and class table */
orig_active_op_array = CG(active_op_array);
orig_function_table = CG(function_table);
orig_class_table = CG(class_table);
orig_user_error_handler = EG(user_error_handler);
/* Override them with our own */
CG(function_table) = &ZCG(function_table);
EG(class_table) = CG(class_table) = &new_persistent_script->class_table;
EG(user_error_handler) = NULL;
zend_try {
    orig_compiler_options = CG(compiler_options);
    /* Configure the compiler */
    CG(compiler_options) |= ZEND_COMPILE_HANDLE_OP_ARRAY;
    CG(compiler_options) |= ZEND_COMPILE_IGNORE_INTERNAL_CLASSES;
    CG(compiler_options) |= ZEND_COMPILE_DELAYED_BINDING;
    CG(compiler_options) |= ZEND_COMPILE_NO_CONSTANT_SUBSTITUTION;
    op_array = *op_array_p = accelerator_orig_compile_file(file_handle, type TSRMLS_CC); /* Run the PHP compiler */
    CG(compiler_options) = orig_compiler_options;
} zend_catch {
    op_array = NULL;
    do_bailout = 1;
    CG(compiler_options) = orig_compiler_options;
} zend_end_try();
/* Restore the originals */
CG(active_op_array) = orig_active_op_array;
CG(function_table) = orig_function_table;
EG(class_table) = CG(class_table) = orig_class_table;
EG(user_error_handler) = orig_user_error_handler;

As you can see, the PHP compiler is completely isolated and disconnected from the tables it normally populates. It now fills the persistent_script structure instead. Next, OPCache must walk these structures and replace the request-bound pointers with pointers into shared memory. OPCache needs:

The script's functions.
The script's classes.
The script's main OPArray.
The script's path.
The script structure itself.

The compiler is also given some options that disable optimizations it would otherwise perform, for example ZEND_COMPILE_NO_CONSTANT_SUBSTITUTION and ZEND_COMPILE_DELAYED_BINDING. This adds work for OPCache. Remember that OPCache hooks into the Zend engine; it is not a source code patch.

Now that we have a persistent_script structure, we must cache its information. The PHP compiler has filled our structures, but it allocated their memory with ZMM: that memory will be freed at the end of the current request. So we need to walk this memory and copy its contents into the shared memory segment, so that the collected information can be reused across requests instead of being recomputed each time.

The process is structured as follows:

Take the PHP script to cache and compute the total size of its variable data (every pointer target).
Reserve one large block of exactly that size in the already allocated shared memory.
Walk the script's variable structures and copy the data of every pointer target into the newly reserved block of shared memory.
To load the script (when the time comes), do the exact opposite.
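The calc-then-copy steps above can be sketched on a toy structure. This is a simplified illustration under stated assumptions, not the OPCache implementation: `script_info` is a hypothetical stand-in for the persistent script, and `malloc()` stands in for reserving a block inside the shared segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ALIGN8(n) (((size_t)(n) + 7u) & ~(size_t)7u)

/* Hypothetical miniature of a compiled script: one struct, one pointer target. */
typedef struct {
    char  *path;      /* points into request memory before persisting */
    size_t path_len;
} script_info;

/* Pass 1: exact number of bytes needed for the struct and its pointer target. */
static size_t script_persist_calc(const script_info *s)
{
    return ALIGN8(sizeof(script_info)) + ALIGN8(s->path_len + 1);
}

/* Pass 2: copy the struct and its pointer target into `block`, which must
 * hold at least script_persist_calc(s) bytes, and repoint the copy. */
static script_info *script_persist(const script_info *s, void *block)
{
    char *cursor = block;
    script_info *copy = block;

    assert(block != NULL);
    memcpy(copy, s, sizeof(*copy));
    cursor += ALIGN8(sizeof(script_info));
    memcpy(cursor, s->path, s->path_len + 1);
    copy->path = cursor;          /* now points inside the block */
    return copy;
}

/* Both passes together; malloc() plays the role of the shared segment. */
static script_info *script_cache(const script_info *s)
{
    void *block = malloc(script_persist_calc(s));
    return block != NULL ? script_persist(s, block) : NULL;
}
```

The copy is fully self-contained in one block: the struct comes first and the string it points to follows at the next aligned offset, so no pointer in the copy refers back to request memory.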

So OPCache uses shared memory intelligently: it never fragments it by freeing and compacting. For each script, it computes the exact amount of shared memory needed to store the information and then copies the data there. Memory is never freed or handed back to OPCache, so it is used extremely efficiently and never fragments. This greatly improves shared memory performance, because there is no linked list or B-tree to maintain and traverse, as there would be when managing memory that can be freed (the way malloc/free do).

OPCache keeps storing data in the shared memory segment, and when data becomes stale (because script revalidation detects a change), the buffers are not freed but marked as “wasted”. When the proportion of wasted memory reaches its maximum, OPCache restarts. This model is very different from APC's, for example. Its great advantage is that performance does not degrade over time, because the shared memory buffer is never managed (never freed, never compacted, and so on).

OPCache was designed to deliver the highest possible performance for the PHP runtime. Leaving the shared memory segment untouched also gives a very good CPU cache hit rate (especially L1 and L2), because OPCache also aligns memory pointers in an L1/L2-friendly way.
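A "never free, only count waste" allocator of the kind described above is often called a bump allocator. The sketch below shows the idea under stated assumptions: the `segment` type and its functions are hypothetical names, and the 8-byte alignment is chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SEG_ALIGN(n) (((n) + 7u) & ~(size_t)7u)

typedef struct {
    unsigned char *base;   /* start of the pre-allocated segment */
    size_t         size;   /* total size, fixed at startup */
    size_t         pos;    /* bump offset: everything below is handed out */
    size_t         wasted; /* bytes that belong to invalidated scripts */
} segment;

static void segment_init(segment *s, void *mem, size_t size)
{
    assert(mem != NULL);
    s->base = mem;
    s->size = size;
    s->pos = 0;
    s->wasted = 0;
}

/* Hand out `n` bytes by advancing the offset; NULL when the segment is full. */
static void *segment_alloc(segment *s, size_t n)
{
    size_t need = SEG_ALIGN(n);
    void *p;

    if (need < n || need > s->size - s->pos) {
        return NULL;
    }
    p = s->base + s->pos;
    s->pos += need;
    return p;
}

/* Invalidate a block: nothing is freed, the bytes are only counted as wasted. */
static void segment_mark_wasted(segment *s, size_t n)
{
    s->wasted += SEG_ALIGN(n);
}
```

Allocation is a bounds check and an addition, with no free list to search. The price is that wasted bytes are only reclaimed by resetting the whole segment, which corresponds to the restart mentioned above.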

Caching a script primarily involves calculating the exact size of its data. Here is the calculation algorithm:

uint zend_accel_script_persist_calc(zend_persistent_script *new_persistent_script, char *key, unsigned int key_length TSRMLS_DC)
{
    START_SIZE();
    ADD_SIZE(zend_hash_persist_calc(&new_persistent_script->function_table, (int (*)(void* TSRMLS_DC)) zend_persist_op_array_calc, sizeof(zend_op_array) TSRMLS_CC));
    ADD_SIZE(zend_accel_persist_class_table_calc(&new_persistent_script->class_table TSRMLS_CC));
    ADD_SIZE(zend_persist_op_array_calc(&new_persistent_script->main_op_array TSRMLS_CC));
    ADD_DUP_SIZE(key, key_length + 1);
    ADD_DUP_SIZE(new_persistent_script->full_path, new_persistent_script->full_path_len + 1);
    ADD_DUP_SIZE(new_persistent_script, sizeof(zend_persistent_script));
    RETURN_SIZE();
}

To repeat, we need to cache:

The script's functions.
The script's classes.
The script's main OPArray.
The script's path.
The script structure itself.

The algorithm performs a deep traversal of functions, classes, and OPArrays, caching the data behind every pointer. For example, in PHP 5, the following must be copied into shared memory (shm) for functions:

Function hash table
1. The hash table's bucket table (Bucket **)
2. The hash table's buckets (Bucket *)
3. Each bucket's key (char *)
4. Each bucket's data pointer (void *)
5. Each bucket's data (*)
Function OPArray
1. The OPArray file name (char *)
2. The OPArray literals (names (char *) and values (zval *))
3. The OPArray opcodes (zend_op *)
4. The OPArray function name (char *)
5. The OPArray arg_infos (zend_arg_info *, plus the name and the class name, both char *)
6. The OPArray break-continue array (zend_brk_cont_element *)
7. The OPArray static variables (full hash table and zval *)
8. The OPArray doc comments (char *)
9. The OPArray try-catch array (zend_try_catch_element *)
10. The OPArray compiled variables (zend_compiled_variable *)
In PHP 7, the list is slightly different because the structures differ (hash tables, for example). As I said, the idea is to copy the data behind every pointer into shared memory. Since a deep copy may encounter structures that reference one another, OPCache uses a translation table to track pointers: each time a pointer is copied from ordinary request-bound memory into shared memory, the mapping between the old and the new pointer address is recorded in the table. Before copying anything, the copy routine first checks the translation table to see whether the data has already been copied. If it has, it reuses the pointer that was already copied, so that nothing is duplicated:

void *_zend_shared_memdup(void *source, size_t size, zend_bool free_source TSRMLS_DC)
{
    void **old_p, *retval;
    if (zend_hash_index_find(&xlat_table, (ulong)source, (void **)&old_p) == SUCCESS) {
        /* we already duplicated this pointer */
        return *old_p;
    }
    retval = ZCG(mem);
    ZCG(mem) = (void*)(((char*)ZCG(mem)) + ZEND_ALIGNED_SIZE(size));
    memcpy(retval, source, size);
    if (free_source) {
        interned_efree((char*)source);
    }
    zend_shared_alloc_register_xlat_entry(source, retval);
    return retval;
}

ZCG(mem) represents a fixed-size block of shared memory that is filled as elements are added. Since it is already allocated, there is no need to allocate memory for each copy (which would hurt overall performance); as the block fills up, the pointer marking its boundary is simply moved forward.

We have examined the script caching algorithm, which takes each pointer and its data from the request-bound heap and copies them into shared memory, unless that has already been done. The loading algorithm does the exact opposite: it takes the persistent_script from shared memory and walks all its dynamic structures, copying the shared pointers into pointers allocated in process-bound memory. After that, the script is ready to be run by the Zend Engine executor, and it no longer embeds shared pointer addresses (which would lead to serious bugs, with one script modifying the structures of another). The Zend engine has been fooled by OPCache: it never notices the pointer substitution that took place before the script was executed.

The process of copying from ordinary memory to shared memory (script caching) and back (script loading) is well optimized. Even though it involves a lot of copying and hash lookups, which do not help performance, it is still much faster than running the PHP compiler every time.

Sharing interned strings storage

Interned strings are a good memory optimization introduced in PHP 5.4. The idea is logical: when PHP encounters a string (char *), it stores it in a special buffer and reuses the same pointer every time it encounters the same string again. You can learn more about them from this article.

They work like this: all pointers to the same string refer to a single string instance. But there is one problem: the interned strings buffer is per-process and is managed mainly by the PHP compiler. This means that in a PHP-FPM pool, each PHP worker process keeps its own copy of this buffer.

This wastes a lot of memory, especially when you have many worker processes and when your code contains very large strings (hint: PHP doc comments are strings).

OPCache shares this buffer between all worker processes in the pool.

OPCache uses the shared memory segment to store this shared buffer. Therefore, when you size the segment, you must also account for your interned strings usage. The opcache.interned_strings_buffer INI setting controls how much shared memory is used for this storage. Let me remind you once again: make sure you allocate enough memory. If you run out of space for these strings (the opcache.interned_strings_buffer value is too low), OPCache will not restart. After all, it still has free shared memory; only the interned strings buffer is full, and that does not prevent requests from being processed. You simply lose string interning and sharing for the new strings, which end up using the memory of each PHP worker process instead. It is better to avoid this situation so as not to degrade performance.

Check your logs: when the interned strings buffer runs out of memory, OPCache logs a warning:

if (ZCSG(interned_strings_top) + ZEND_MM_ALIGNED_SIZE(sizeof(Bucket) + nKeyLength) >=
    ZCSG(interned_strings_end)) {
    /* out of memory, return the same non-interned string */
    zend_accel_error(ACCEL_LOG_WARNING, "Interned string buffer overflow");
    return arKey;
}

Interned strings include almost every kind of string the PHP compiler encounters while it works: variable names, “PHP strings”, function names, class names... Comments, which are nowadays called “annotations”, are strings too, and usually huge ones. They take up most of the buffer, so do not forget about them.
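The behavior described in this section, one stored instance per distinct string and a graceful fallback when the buffer is full, can be sketched as follows. This is a toy model, not the OPCache data structure: `intern_pool` and `intern()` are hypothetical names, and the linear scan replaces the hash table a real implementation would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    char  *buf;   /* fixed-size storage for all interned strings */
    size_t size;
    size_t top;   /* first free byte */
} intern_pool;

/* Return the pooled copy of `s`, adding it if it is new. If the pool is
 * full, return `s` itself: the caller keeps its private, non-shared copy. */
static const char *intern(intern_pool *p, const char *s)
{
    size_t len;
    size_t off = 0;

    assert(p != NULL && s != NULL);
    len = strlen(s) + 1;
    while (off < p->top) {            /* linear scan; a real pool would hash */
        if (strcmp(p->buf + off, s) == 0) {
            return p->buf + off;
        }
        off += strlen(p->buf + off) + 1;
    }
    if (len > p->size - p->top) {
        return s;                     /* buffer overflow: fall back */
    }
    memcpy(p->buf + p->top, s, len);
    p->top += len;
    return p->buf + p->top - len;
}
```

Two equal strings interned through the same pool yield the same pointer, while a string that no longer fits is returned unchanged, which mirrors the "return arKey" branch in the excerpt above.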

Locking mechanism

Since we are talking about shared memory, we must also talk about memory locking. The essence is this: every PHP process that wants to write to shared memory blocks all other processes that also want to write to it. So the main difficulties are related to writing, not reading. You can have 150 PHP processes reading from shared memory, but only one can write to it at a time. A write operation does not block reads, only other writes.

So OPCache should not suffer from lock contention unless you warm up your cache abruptly. If you do not throttle traffic to the server after deploying code, scripts will be compiled and cached at a furious pace. And since writing to the cache in shared memory happens under an exclusive lock, all your processes will stall, because one lucky process started writing to memory and blocked everyone else. When it releases the lock, all the other processes waiting in line will find that the file they have just compiled is already stored in shared memory. They will then throw away their compilation result and load the data from shared memory instead. This is an unforgivable waste of resources.

/* exclusive lock */
zend_shared_alloc_lock(TSRMLS_C);
/* Check whether the file still needs to be put into the cache (it may
 * already have been stored by another process). This final check is
 * done under the exclusive lock */
bucket = zend_accel_hash_find_entry(&ZCSG(hash), new_persistent_script->full_path, new_persistent_script->full_path_len + 1);
if (bucket) {
    zend_persistent_script *existing_persistent_script = (zend_persistent_script *)bucket->data;
    if (!existing_persistent_script->corrupted) {
        if (!ZCG(accel_directives).revalidate_path &&
            (!ZCG(accel_directives).validate_timestamps ||
             (new_persistent_script->timestamp == existing_persistent_script->timestamp))) {
            zend_accel_add_key(key, key_length, bucket TSRMLS_CC);
        }
        zend_shared_alloc_unlock(TSRMLS_C);
        return new_persistent_script;
    }
}
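The excerpt above is an instance of the "check again under the lock" pattern: the expensive compilation happens outside the lock, and the cache is re-examined once the lock is held. A self-contained sketch of that pattern follows. Everything in it is hypothetical: the `script_cache` type, the slot array, and the `locked` flag, which only stands in for the real exclusive lock.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 8

typedef struct {
    const char *path;
    long        timestamp;
} cached_script;

typedef struct {
    cached_script slots[CACHE_SLOTS];
    size_t        count;
    int           locked;   /* stands in for the real exclusive lock */
} script_cache;

static void cache_lock(script_cache *c)   { assert(!c->locked); c->locked = 1; }
static void cache_unlock(script_cache *c) { assert(c->locked);  c->locked = 0; }

static cached_script *cache_find(script_cache *c, const char *path)
{
    size_t i;

    for (i = 0; i < c->count; i++) {
        if (strcmp(c->slots[i].path, path) == 0) {
            return &c->slots[i];
        }
    }
    return NULL;
}

/* Store a freshly compiled script. Returns 1 if stored, 0 if another
 * worker got there first (our compilation is discarded), -1 if full. */
static int cache_store(script_cache *c, const char *path, long timestamp)
{
    int result;

    cache_lock(c);
    if (cache_find(c, path) != NULL) {
        result = 0;                       /* final check under the lock */
    } else if (c->count == CACHE_SLOTS) {
        result = -1;
    } else {
        c->slots[c->count].path = path;
        c->slots[c->count].timestamp = timestamp;
        c->count++;
        result = 1;
    }
    cache_unlock(c);
    return result;
}
```

A return value of 0 corresponds to the wasteful case described earlier: the worker compiled the file for nothing because another worker had already stored it.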

You need to take the server out of external traffic, deploy the new code, and hit the heaviest URLs with curl, so that the curl requests gradually fill shared memory. Once most of your scripts are cached, you can send traffic to the server again; heavy reading from shared memory then begins, and reading does not cause locks. Of course, a few small scripts may remain uncompiled, but since there are few of them, they will have little effect on write-lock contention.

Avoid writing PHP files at runtime and then using them. The reason is the same: when you write a new file into the document root of a production server and then use it, thousands of worker processes are likely to try to compile it and cache it in shared memory. And then there will be a lock. Dynamically generated PHP files must be added to the OPCache blacklist using the opcache.blacklist_filename INI setting (the blacklist accepts glob patterns).

Technically, the locking mechanism is not very sophisticated, but it works on many flavors of Unix: it uses the well-known fcntl() call:

void zend_shared_alloc_lock(TSRMLS_D)
{
    while (1) {
        if (fcntl(lock_file, F_SETLKW, &mem_write_lock) == -1) {
            if (errno == EINTR) {
                continue;
            }
            zend_accel_error(ACCEL_LOG_ERROR, "Cannot create lock - %s (%d)", strerror(errno), errno);
        }
        break;
    }
    ZCG(locked) = 1;
    zend_hash_init(&xlat_table, 100, NULL, NULL, 1);
}
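The function above relies on a lock file descriptor and a prepared `struct flock` that are not shown. The following sketch fills in those pieces so the mechanism can be tried in isolation. It is an illustration under stated assumptions: the helper names and the temporary file template are hypothetical, and locking a single byte at offset 0 is a choice made for this example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for mkstemp() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Create an anonymous lock file: make a temp file, then unlink it so that
 * only the descriptor remains. The path template is an assumption. */
static int lock_file_open(void)
{
    char path[] = "/tmp/.lock_sketch.XXXXXX";
    int fd = mkstemp(path);

    if (fd != -1) {
        unlink(path);
    }
    return fd;
}

/* Apply one fcntl() record-lock operation, retrying when interrupted. */
static int lock_op(int fd, short type, int cmd)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 1;   /* one byte is enough: every writer locks the same byte */
    while (fcntl(fd, cmd, &fl) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }
    return 0;
}

/* Block until the exclusive write lock is held. */
static int write_lock(int fd)   { return lock_op(fd, F_WRLCK, F_SETLKW); }
/* Release the lock. */
static int write_unlock(int fd) { return lock_op(fd, F_UNLCK, F_SETLK); }
```

Note that fcntl() record locks are owned by the process, so they only exclude other processes; two threads of the same process would not block each other through this mechanism.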

So far we have talked about memory locks that occur during normal operation: if you make sure that only one process writes to shared memory at a time, you will not have problems with locks.

But there is another kind of lock that must be avoided: the one caused by memory exhaustion. This is the subject of the next chapter.

OPCache memory consumption

As you remember:

When PHP starts (when PHP-FPM starts), OPCache creates a single shared memory segment, used for several purposes.
Within this segment, OPCache never frees memory. The segment fills up as needed.
OPCache locks shared memory while writing to it.
Shared memory is used for:
1. Caching script data structures, including the opcodes.
2. The shared interned strings buffer.
3. The hash table of cached scripts.
4. The global OPCache shared memory state.

If you use script validation, OPCache checks each script's modification time on every access (or less often: tune the opcache.revalidate_freq INI setting) to tell whether the file is still fresh. This check is cached, so it is not as expensive as you might think. OPCache often comes into play after PHP has already called stat() on the file; in that case OPCache reuses this information and does not issue another “expensive” stat() filesystem call for its own needs.

If you use timestamp validation, via opcache.validate_timestamps and opcache.revalidate_freq, and a file has actually changed, OPCache simply considers it invalid and flags all its data in shared memory as “wasted”. OPCache restarts only when it runs out of allocated shared memory AND the proportion of wasted memory reaches the value of the opcache.max_wasted_percentage INI setting. In no other case. Avoid this at all costs.

/* Compute the amount of memory required */
memory_used = zend_accel_script_persist_calc(new_persistent_script, key, key_length TSRMLS_CC);
/* Allocate shared memory */
ZCG(mem) = zend_shared_alloc(memory_used);
if (!ZCG(mem)) {
    zend_accel_schedule_restart_if_necessary(ACCEL_RESTART_OOM TSRMLS_CC);
    zend_shared_alloc_unlock(TSRMLS_C);
    return new_persistent_script;
}
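The restart rule stated above (out of memory AND enough waste) boils down to a small predicate. The sketch below is an illustration only: `should_restart()` and its parameters are hypothetical, and the threshold is expressed as a percentage for readability, which may differ from how the extension stores it internally.

```c
#include <assert.h>
#include <stddef.h>

/* Decide whether a failed allocation should schedule a restart.
 * `out_of_memory` is non-zero when the segment could not satisfy a request;
 * `max_wasted_percentage` is a percentage between 0 and 100. */
static int should_restart(int out_of_memory, size_t wasted,
                          size_t segment_size, double max_wasted_percentage)
{
    assert(segment_size > 0);
    if (!out_of_memory) {
        return 0;   /* waste alone never triggers a restart */
    }
    /* wasted / segment_size * 100 >= threshold, written without division */
    return (double)wasted * 100.0 >= max_wasted_percentage * (double)segment_size;
}
```

With a 100-byte segment and a 5 percent threshold, an out-of-memory event restarts the cache when at least 5 bytes are wasted; with only 4 wasted bytes the cache stays full and nothing new can be stored.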

Consider a shared memory segment after some time, when some of the scripts have changed. The memory of the modified scripts is marked as “wasted”, and OPCache simply ignores it. OPCache also recompiles the modified scripts and reserves a new block of memory to store their information.

When the amount of wasted memory reaches the configured limit, a restart is performed: OPCache locks shared memory, empties it, and unlocks it. This leaves your server in the same state as when it has just started: every worker process tries to compile files and therefore tries to take the memory lock. Because of these locks, the server becomes very slow. The higher the load, the lower the performance; such is the unpleasant rule of locks. And this can go on for many seconds.

Therefore, never allow the exhaustion of shared memory.

In general, you should disable script modification tracking on production servers; then the cache will never restart (this is not entirely true: OPCache may still run out of space for persistent script keys, as we will discuss below). For classic deployments, follow these rules:

Disconnect the server from the load (remove it from the balancer).
Clear OPCache (call opcache_reset()) or shut down FPM directly (this is even better; more on that below).
Deploy the new version of the application in full.
Restart the FPM pool if necessary, and gradually fill the new cache with curl requests to the main entry points of the application.
Send traffic to the server again.

All this can be done with a shell script of about 50 lines. If some heavy requests refuse to finish, the same script can use lsof and kill on them. Remember what Unix can do ;-)

You can also get an idea of what is happening using any of the many GUI frontends for OPCache. They all use the opcache_get_status() function.

But the story is not over. You also need to keep cache keys firmly in mind.

When OPCache saves a cached script to shared memory, it saves it to a hash table so that you can later find this script. To index the hash table, OPCache must select a key. What key? It depends a lot on the configuration and architecture of your application.

Usually OPCache resolves the full path to the script. But be careful, because it uses the realpath_cache, and this can hurt you. If you switch the document root using a symlink, set opcache.revalidate_path to 1 and clear the realpath cache (this can be hard to do, because that cache is bound to the worker process handling the current request).

So, OPCache resolves the full path to the file and uses this realpath string as the cache key for the script. This assumes that the opcache.revalidate_path INI setting is 1. If it is not, OPCache uses the unresolved path as the cache key. This leads to problems if you use symlinks: if you later change the target of a symlink, OPCache will not notice, and will keep using the unresolved path as the key to look up the old target script (this saves a symlink resolution call).

If opcache.use_cwd is set to 1, OPCache prepends the cwd to each key. This matters when files are included via relative paths, as in require_once "./foo.php";. If you use relative paths and also host several applications on the same PHP instance (which you should not do), I suggest always setting opcache.use_cwd to 1. In addition, if you use symlinks, set opcache.revalidate_path to 1 as well. But none of this saves you from problems with the realpath cache: you can change the target of the www symlink and OPCache will not notice, even if you clear the cache with opcache_reset().
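Why the working directory matters for relative includes can be shown with a toy key builder. This is a sketch, not the OPCache key format: `make_cache_key()` is a hypothetical helper, and the "cwd:path" layout and the rule that absolute paths are used unchanged are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the cache key for `path` into `out`. With use_cwd set and a
 * relative path, the key is "cwd:path"; otherwise it is the path itself.
 * Returns the key length, or -1 if `out` is too small. */
static int make_cache_key(char *out, size_t out_size,
                          const char *cwd, const char *path, int use_cwd)
{
    int n;

    assert(out != NULL && out_size > 0);
    if (use_cwd && path[0] != '/') {
        n = snprintf(out, out_size, "%s:%s", cwd, path);
    } else {
        n = snprintf(out, out_size, "%s", path);
    }
    return (n < 0 || (size_t)n >= out_size) ? -1 : n;
}
```

With use_cwd enabled, "./foo.php" included from /var/www/app1 and from /var/www/app2 produces two distinct keys. With it disabled, both produce the key "./foo.php", so the two applications would share one cache entry.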

Because of the realpath cache, you may run into problems when using symlinks to manage the document root for deployments. Set opcache.use_cwd and opcache.revalidate_path to 1, but even then, bad symlink resolutions may occur: when OPCache asks for a realpath resolution, PHP gives it a wrong answer that comes from the realpath_cache mechanism.

If you want truly safe deployments, first of all do not use symlinks to manage the document root. If you cannot avoid them, use two FPM pools and a FastCGI balancer to spread the load between the two pools while deploying. As far as I remember, this feature is enabled by default in Lighttpd and Nginx:

Disconnect the server from the load (remove it from the balancer).
Shut down FPM; this kills PHP (and OPCache with it). This gives you complete safety, especially with regard to the realpath cache, which might otherwise mislead you: it is cleared when FPM shuts down. Watch for worker processes that may be stuck, and kill them if necessary.
Deploy the new version of your application.
Start the FPM pool again. Remember to fill the new cache gradually with curl requests to the main entry points of the application.
Send traffic to the server again.

If you do not want to disconnect the server from the balancer, which can be done later, then follow these steps:

Deploy your new code to a different folder, because the PHP server still has one active FPM pool serving production requests.
Start another FPM pool listening on a different port. The first pool must remain active and keep serving production requests.
Now you have two FPM pools: one is hot, the other is waiting for requests.
Change the target of the documentroot symlink to the new deployment path, and immediately afterwards stop the first FPM pool.

If your web server knows about both pools, it will see that the first one is dying and will rebalance the traffic to the new pool, without interrupting traffic or losing requests. The second pool then starts working: it resolves the new documentroot symlink (because it is fresh and has an empty realpath cache) and serves the new content. This procedure works well, I have used it many times on production servers. A shell script of about 80 lines is enough.

Depending on the settings, OPCache can compute several different keys for one unique script. But the key store is not infinite: it also lives in shared memory and can fill up. When it does, OPCache behaves as if it had run out of memory altogether, even if there is still plenty of space in the shared memory segment: a restart is triggered on the next request.

Therefore, always monitor the number of keys in the store; it should never be completely full.
OPCache gives you this information through opcache_get_status(), the function that the various GUIs rely on, in the num_cached_keys field. A piece of advice: preallocate the number of keys with the INI setting opcache.max_accelerated_files. Despite its name, this setting does not mean the number of files, but the number of keys computed by OPCache. As we saw, several keys can be computed for one file. Monitor this value and size it correctly. Avoid relative paths in require_once statements, otherwise OPCache will generate more keys. I recommend a well-configured autoloader that always issues include_once calls with full paths rather than relative ones.

When launched, OPCache creates a hash table in memory for storing future persistent scripts, and never changes its size. If the hash table is full, it initiates a restart. This is done to improve performance.

This is why num_cached_scripts may differ from num_cached_keys in the OPCache status report. Only num_cached_keys matters: if it reaches max_cached_keys, you will run into restarts.
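A minimal sketch of how these counters could be watched. The helper name keyTableUsage() is made up for the illustration; the array layout is the one returned by opcache_get_status(). The live call is guarded, because the function is missing without the extension and returns false when OPCache is disabled (for example in CLI without opcache.enable_cli=1).

```php
<?php
// Sketch: how full is the key table, in percent, given a status array?
function keyTableUsage(array $status): float
{
    $stats = $status['opcache_statistics'];
    return 100.0 * $stats['num_cached_keys'] / $stats['max_cached_keys'];
}

$status = function_exists('opcache_get_status') ? opcache_get_status(false) : false;

if (is_array($status) && isset($status['opcache_statistics'])) {
    $stats = $status['opcache_statistics'];
    printf(
        "scripts: %d, keys: %d of %d (%.1f%% used), hash restarts: %d\n",
        $stats['num_cached_scripts'],
        $stats['num_cached_keys'],
        $stats['max_cached_keys'],
        keyTableUsage($status),
        $stats['hash_restarts']
    );
} else {
    echo "OPCache statistics are not available in this process\n";
}
```

Run it through FPM rather than the CLI if you want the numbers of the production pool, since each SAPI instance has its own shared memory segment.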

Do not forget that you can see what is happening by raising the OPCache log verbosity (INI setting opcache.log_verbosity_level). The log tells you when memory runs out and which kind of OOM (out of memory) error was raised: whether the shared memory or the hash table is full.

static void zend_accel_add_key(char *key, unsigned int key_length, zend_accel_hash_entry *bucket TSRMLS_DC)
{
    if (!zend_accel_hash_find(&ZCSG(hash), key, key_length + 1)) {
        if (zend_accel_hash_is_full(&ZCSG(hash))) {
            zend_accel_error(ACCEL_LOG_DEBUG, "No more entries in hash table!");
            ZSMMG(memory_exhausted) = 1;
            zend_accel_schedule_restart_if_necessary(ACCEL_RESTART_HASH TSRMLS_CC);
        } else {
            char *new_key = zend_shared_alloc(key_length + 1);
            if (new_key) {
                memcpy(new_key, key, key_length + 1);
                if (zend_accel_hash_update(&ZCSG(hash), new_key, key_length + 1, 1, bucket)) {
                    zend_accel_error(ACCEL_LOG_INFO, "Added key '%s'", new_key);
                }
            } else {
                zend_accel_schedule_restart_if_necessary(ACCEL_RESTART_OOM TSRMLS_CC);
            }
        }
    }
}

To summarize memory usage:

When PHP starts, it starts OPCache, which immediately allocates opcache.memory_consumption megabytes of shared memory (shm). Part of this memory is then used for the shared interned strings storage (opcache.interned_strings_buffer). The hash table for the future persistent scripts and their keys also lives in this memory; the amount it uses depends on opcache.max_accelerated_files.

So part of the shared memory is used by OPCache internals, and you can fill the rest with the data structures of your scripts. The memory segment fills up as scripts change and OPCache recompiles them (if you told it to). The memory gradually becomes “wasted”, unless you tell OPCache not to recompile modified scripts (recommended).

It might look like this:

If the persistent script hash table is full, or free memory runs out, then OPCache will restart (you need to avoid this at all costs).
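The state of the shared memory segment can be read from the memory_usage part of opcache_get_status(). Below is a minimal sketch; the helper name memorySummary() is made up for the illustration, and it assumes that used, free and wasted memory add up to the whole segment.

```php
<?php
// Sketch: summarise the shared memory segment from a status array.
function memorySummary(array $status): array
{
    $mem   = $status['memory_usage'];
    $total = $mem['used_memory'] + $mem['free_memory'] + $mem['wasted_memory'];

    return [
        'total_mb'   => round($total / 1048576, 1),
        'free_pct'   => round(100 * $mem['free_memory'] / $total, 1),
        'wasted_pct' => round(100 * $mem['wasted_memory'] / $total, 1),
    ];
}

$status = function_exists('opcache_get_status') ? opcache_get_status(false) : false;

if (is_array($status) && isset($status['memory_usage'])) {
    print_r(memorySummary($status));
    // Restart counters: a growing value means the segment or the key table is too small.
    printf("oom restarts: %d\n", $status['opcache_statistics']['oom_restarts']);
} else {
    echo "OPCache memory statistics are not available in this process\n";
}
```

A wasted percentage that keeps growing between deployments usually means that scripts are being recompiled in place.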

Configuring OPCache

If you use a framework-based application, for example, Symfony, then I highly recommend:

Turn off the revalidation mechanism in production (set opcache.validate_timestamps to 0).
Deploy the application using a completely new runtime of your scripts. This is the case for Symfony-based applications as well.
Size your buffers correctly:
1. opcache.memory_consumption. This one is the most important.
2. opcache.interned_strings_buffer. Monitor the memory usage and size the buffer accordingly. Be careful if you tell OPCache to save comments, which you will probably do if you use PHP “annotations” (opcache.save_comments = 1): comments are strings too, big ones, and they fill the interned strings buffer quickly.
3. opcache.max_accelerated_files. The number of keys to preallocate. Again, monitor the usage and size accordingly.
Turn off opcache.revalidate_path and opcache.use_cwd. This saves space in the key store.
Turn on opcache.enable_file_override, this speeds up the autoloader.
Fill the opcache.blacklist_filename list with the names of the scripts you are likely to generate at runtime. There should not be too many of them anyway.
Turn off opcache.consistency_checks: it is basically a checksum verification of your scripts, and it eats performance.

After all this, your memory should never become “wasted”, so opcache.max_wasted_percentage is of little use in this setup. You will also need to stop the main FPM instance when deploying. You can play with several FPM pools, as described above, to avoid service downtime.

All of this should be enough.
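To check a running server against the on/off suggestions above, a small audit script can help. This is only a sketch: the function name auditOpcacheSettings() and the report labels are made up for the illustration, and the buffer sizes are left out on purpose because they have to be sized from monitoring. A setting is reported as unavailable when ini_get() returns false, which happens when the extension is not loaded or the PHP version does not know the setting.

```php
<?php
// Suggested values from the list above (true = on, false = off).
$recommended = [
    'opcache.validate_timestamps'  => false,
    'opcache.revalidate_path'      => false,
    'opcache.use_cwd'              => false,
    'opcache.enable_file_override' => true,
    'opcache.consistency_checks'   => false,
];

// $read is injectable so the check can be exercised without touching php.ini.
function auditOpcacheSettings(array $recommended, callable $read): array
{
    $report = [];
    foreach ($recommended as $name => $wanted) {
        $raw = $read($name);
        if ($raw === false) {
            $report[$name] = 'unavailable';   // extension not loaded or setting unknown
            continue;
        }
        $actual = filter_var($raw, FILTER_VALIDATE_BOOLEAN);
        $report[$name] = ($actual === $wanted) ? 'ok' : 'differs';
    }
    return $report;
}

foreach (auditOpcacheSettings($recommended, 'ini_get') as $name => $state) {
    printf("%-30s %s\n", $name, $state);
}
```

As with the status functions, run it through the same SAPI as production, because the CLI may load a different php.ini.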

OPCache Compiler Optimization

Introduction

So, we have discussed caching opcodes in shared memory and loading them back. Just before caching them, OPCache can also run several optimizer passes over them. To understand how this works, you need to know how the Zend virtual machine executor works. If you are new to compilers, you can start by reading articles on the topic, or at least study the must-read Dragon Book. In any case, I will try to keep the explanation clear and not too boring.

Basically, the optimizer receives the whole OPArray structure, which it can scan to find flaws and fix them. But since we analyze opcodes at compile time, we have no clue about anything related to a “PHP variable”. We do not yet know what will be stored in IS_VAR and IS_CV operands; we only know the contents of IS_CONST and sometimes IS_TMP_VAR. As with any compiler in any language, we must produce the structure that is best optimized for runtime, so that execution goes as fast as possible.

The OPCache optimizer can optimize many things involving IS_CONST. We can also replace some opcodes with others that are better optimized for runtime, and with CFG (control flow graph) analysis we can find and remove dead code. But we do not yet analyze loops or move loop-invariant code out of them. We have other possibilities related to PHP internals: the way classes are bound can be changed to optimize that process in some cases. But we have no way to perform cross-file optimization, because OPCache works with the OPArrays produced by file compilation (plus the OPArrays of functions), and these OPArrays are completely isolated from each other. PHP has never been built as a cross-file virtual machine: both the language and the virtual machine are bound to a single file. While compiling a file, we know nothing about the files already compiled or the ones still to come. So we have to optimize file by file, and must not assume, for example, that class A will exist in the future if it does not exist now. This is very different from Java or C++, which compile a whole “project” at once and can perform many cross-file optimizations. PHP cannot do that.

The PHP compiler works within a single file and shares no state across file compilations. It does not compile a project as a whole, but file by file, one after another. So there is simply no room for cross-file optimizations.

OPCache optimization passes can be enabled selectively. The INI setting opcache.optimization_level controls this. It is a bitmask of the desired optimization passes:

/* zend_optimizer.h */
#define ZEND_OPTIMIZER_PASS_1       (1<<0)   /* CSE, STRING construction */
#define ZEND_OPTIMIZER_PASS_2       (1<<1)   /* Constant conversion and jumps */
#define ZEND_OPTIMIZER_PASS_3       (1<<2)   /* ++, +=, series of jumps */
#define ZEND_OPTIMIZER_PASS_4       (1<<3)   /* INIT_FCALL_BY_NAME -> DO_FCALL */
#define ZEND_OPTIMIZER_PASS_5       (1<<4)   /* CFG based optimization */
#define ZEND_OPTIMIZER_PASS_6       (1<<5)
#define ZEND_OPTIMIZER_PASS_7       (1<<6)
#define ZEND_OPTIMIZER_PASS_8       (1<<7)
#define ZEND_OPTIMIZER_PASS_9       (1<<8)   /* TMP VAR usage */
#define ZEND_OPTIMIZER_PASS_10      (1<<9)   /* NOP removal */
#define ZEND_OPTIMIZER_PASS_11      (1<<10)  /* Merge equal constants */
#define ZEND_OPTIMIZER_PASS_12      (1<<11)  /* Adjust used stack */
#define ZEND_OPTIMIZER_PASS_13      (1<<12)
#define ZEND_OPTIMIZER_PASS_14      (1<<13)
#define ZEND_OPTIMIZER_PASS_15      (1<<14)  /* Collect constants */
#define ZEND_OPTIMIZER_ALL_PASSES   0xFFFFFFFF
#define DEFAULT_OPTIMIZATION_LEVEL  "0xFFFFBFFF"
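To read such a mask, it helps to decode it bit by bit. Here is a minimal sketch; the function name enabledPasses() is made up for the illustration, it only looks at the fifteen passes listed above, and it assumes a 64-bit PHP build so that the hexadecimal literal is an integer.

```php
<?php
// Sketch: list which of the passes 1..15 a given optimization level enables.
// Pass N corresponds to bit N-1, as in the defines above.
function enabledPasses(int $level): array
{
    $passes = [];
    for ($n = 1; $n <= 15; $n++) {
        if ($level & (1 << ($n - 1))) {
            $passes[] = $n;
        }
    }
    return $passes;
}

$default = 0xFFFFBFFF;
echo 'default enables passes: ', implode(', ', enabledPasses($default)), "\n";
echo 'level 0x3 enables passes: ', implode(', ', enabledPasses(0x3)), "\n";
```

With the default value shown above, bit 14 is the only cleared bit among these, so pass 15 (constant collection) is the one that is off by default.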

Known constant expressions and branch removal

Note that in PHP 5, many of the constant expressions known at compile time were NOT calculated by the compiler, but by OPCache. But in PHP 7 they are already computed by the compiler.

Example:

if (false) {
    echo "foo";
} else {
   echo "bar";
}

With classical compilation, we get:

Optimized compilation:

As you can see, the dead code in the if(false) branch was removed, and the Zend virtual machine simply has a ZEND_ECHO opcode to run. This saves memory, because we threw away a few opcodes, and probably a few CPU cycles at runtime as well.

Remember that we cannot yet know the contents of any variable, since we are still at compile time (between compilation and execution). With an IS_CV operand instead of IS_CONST, the code could not be optimized:

/* This cannot be optimized: what is in $a? */
if ($a) {
    echo "foo";
} else {
   echo "bar";
}

Take another example to show the difference between PHP 5 and PHP 7:

if (__DIR__ == '/tmp') {
    echo "foo";
} else {
   echo "bar";
}

In PHP 7, the value of __DIR__ is substituted and the equality check is performed by the compiler, without OPCache. However, branch analysis and dead code removal are still done by an OPCache optimizer pass. In PHP 5.6, the value of __DIR__ is substituted as well, but the compiler does not perform the equality check; OPCache does that later.

To summarize: if you run PHP 5.6 and PHP 7 with the OPCache optimizer enabled, you end up with the same optimized opcodes. But without the optimizer, the code compiled by PHP 5.6 is less efficient than that of PHP 7, because the PHP 5.6 compiler performs no evaluation at all, while the PHP 7 compiler computes many things by itself (without involving the OPCache optimizer).

Preliminary Estimation of Constant Functions

OPCache can turn some IS_TMP_VAR operands into IS_CONST. In other words, it can compute some known values by itself at compile time. Some functions can therefore be evaluated during compilation, if their result is constant. Here are some of them:

function_exists() and is_callable(), for internal functions only.
extension_loaded(), if dl() is disabled in userland.
defined() and constant(), for internal constants only.
dirname(), if the argument is a constant.
strlen() and dirname() with constant arguments (PHP 7 only).

Take a look at an example:

if (function_exists('array_merge')) {
    echo 'yes';
}

If you turn off the optimizer, then the compiler will generate a lot of work for runtime:

Optimization is on:

Note that these functions are not evaluated for userland code. For example:

if (function_exists('my_custom_function')) { }

is not optimized, because you have most likely defined my_custom_function in another file. Do not forget that the PHP compiler and the OPCache optimizer only work file by file. Even if you do this:

function my_custom_function() { }
if (function_exists('my_custom_function')) { }

this code will not be optimized: the probability that you write such code is too low, so the function call optimizer only works for internal types (internal functions and constants).
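If you are not sure whether a function counts as internal, reflection can tell you. A minimal sketch; the helper name isInternalFunction() is made up for the illustration, and it says nothing about what the optimizer actually does with a given call.

```php
<?php
// Sketch: tell internal functions from userland ones with reflection.
function my_custom_function() {}

function isInternalFunction(string $name): bool
{
    return function_exists($name) && (new ReflectionFunction($name))->isInternal();
}

var_dump(isInternalFunction('array_merge'));        // bool(true)
var_dump(isInternalFunction('my_custom_function')); // bool(false)
```

Only names in the first category are candidates for compile-time evaluation of function_exists().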

Another example, with dirname() (PHP 7 only):

if (dirname(__FILE__) == '/tmp') {
    echo 'yo';
}

Without optimization:

With optimization:

strlen() is optimized in PHP 7 as well. If we chain the two calls, we clearly see a nice optimization taking place. For instance:

if (strlen(dirname(__FILE__)) == 4) {
    echo "yes";
} else {
    echo "no";
}

Without optimization:

With optimization:

You may have noticed in the previous example that every expression was computed at compile/optimization time, and that the OPCache optimizer then removed the whole “false” branch (assuming, of course, that the “true” part was selected).

Transtyping

The OPCache optimizer can change the type of your IS_CONST operands when it knows that the runtime would have to transtype them. This saves some CPU cycles at runtime:

$a = 8;
$c = $a + "42";
echo $c;

Classic compilation:

Optimized compilation:

Look at the type of the second operand of ZEND_ADD: it was changed from a string to an integer. The optimizer converted the argument of the ADD math operation ahead of time. Had it not done so, the virtual machine would do the conversion at runtime again and again, every time the code is executed. So transtyping saves CPU cycles.

Here is the OPCache optimizer code that does this work:

if (ZEND_OPTIMIZER_PASS_2 & OPTIMIZATION_LEVEL) {
    zend_op *opline;
    zend_op *end = op_array->opcodes + op_array->last;
    opline = op_array->opcodes;
    while (opline < end) {
        switch (opline->opcode) {
            case ZEND_ADD:
            case ZEND_SUB:
            case ZEND_MUL:
            case ZEND_DIV:
                if (ZEND_OP1_TYPE(opline) == IS_CONST) {
                    if (ZEND_OP1_LITERAL(opline).type == IS_STRING) {
                        convert_scalar_to_number(&ZEND_OP1_LITERAL(opline) TSRMLS_CC);
                    }
                }
                /* break missing *intentionally* - assign_op's may only optimize op2 */
            case ZEND_ASSIGN_ADD:
            case ZEND_ASSIGN_SUB:
            case ZEND_ASSIGN_MUL:
            case ZEND_ASSIGN_DIV:
                if (opline->extended_value != 0) {
                    /* object tristate op - don't attempt to optimize it! */
                    break;
                }
                if (ZEND_OP2_TYPE(opline) == IS_CONST) {
                    if (ZEND_OP2_LITERAL(opline).type == IS_STRING) {
                        convert_scalar_to_number(&ZEND_OP2_LITERAL(opline) TSRMLS_CC);
                    }
                }
                break;
    /* ... ... */

Note, however, that this optimization has been moved into the PHP 7 compiler. This means that the PHP 7 compiler performs it even with OPCache disabled (or with its optimizer disabled), along with many other optimizations that the PHP 5 compiler did not perform.
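For reference, here is the conversion in question seen from userland. This is just an illustration of PHP's numeric string semantics, which the optimizer applies ahead of time to literal operands; it shows results, not opcodes.

```php
<?php
// The engine converts numeric string operands before adding them.
$a = 8;
$c = $a + "42";
var_dump($c);          // int(50)
var_dump("42" + 0);    // int(42): an integer-like string becomes an integer
var_dump("4.2" + 0);   // float(4.2): a decimal string becomes a float
var_dump(4 + "33");    // int(37)
```

The type of the converted literal therefore depends on the content of the string, integer or float.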

If you add two expressions IS_CONST , then the result can be calculated during compilation. In PHP 5, the compiler does not do this by default; you need the OPCache optimizer:

$a = 4 + "33";
echo $a;

Classic compilation:

Optimized compilation:

The optimizer computed the result of 4 + 33 and removed the ZEND_ADD operation, replacing it directly with the result. Again this saves CPU cycles at runtime, because the virtual machine executor has less work to do. Once more: in PHP 7 this is done by the compiler, while PHP 5 needs the OPCache optimizer.

Optimized opcode substitution

Let's now take a closer look at opcodes. Occasionally, you can substitute other, optimized ones for some opcodes.

$i = "foo";
$i = $i + 42;
echo $i;

Classical compiling:

Optimized compiling:

Our knowledge of the Zend VM executor lets us substitute a single ZEND_ASSIGN_ADD for the ZEND_ADD and ZEND_ASSIGN pair. This is the opcode normally produced for statements like $i+=3;. ZEND_ASSIGN_ADD is better optimized, and we get one opcode instead of two (which is usually preferable, but not always).

On the same topic:

$j = 4;
$j++;
echo $j;

Classical compiling:

Optimized compiling:

Here the OPCache optimizer substituted ++$j for $j++, because in this piece of code both mean the same thing. ZEND_POST_INC is not a very nice opcode: it must read the value, return it as it is, and increment a temporary value in memory, whereas ZEND_PRE_INC works on the value itself: it reads it, increments it and returns it (this is simply the difference between pre-increment and post-increment). Since the intermediate value returned by ZEND_POST_INC is not used in the script above, the compiler must issue a ZEND_FREE opcode to free it from memory. The OPCache optimizer turns the structure into a ZEND_PRE_INC and removes the useless ZEND_FREE: less work for runtime.
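The substitution is only valid because the result of the expression is thrown away. A short userland illustration of the difference (it shows values, not opcodes):

```php
<?php
$j = 4;
$post = $j++;   // $post receives the old value 4, $j becomes 5
$pre  = ++$j;   // $j becomes 6 first, $pre receives 6
var_dump($post, $pre, $j); // int(4) int(6) int(6)

// When the result is discarded, both forms leave the variable in the same state.
$x = 4;
$x++;
$y = 4;
++$y;
var_dump($x === $y); // bool(true)
```

As soon as the returned value is used, as in the first two assignments, the two forms are no longer interchangeable.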

Constant substitution and precalculation

What about PHP constants? They are much more complicated than you might think. Therefore, some obvious, at first glance, optimizations are not done for a number of reasons. Let's look at an example:

const FOO = "bar";
echo FOO;

Optimized compilation:

This is part of the temporary variable optimizations. As you can see, one opcode was removed: the optimizer computed the result of the constant read at compile time, so there is less work left for runtime.

The ugly define() function can be replaced by a const statement if its argument is constant:

define('FOO', 'bar');
echo FOO;

Non-optimized opcodes from this small script have a terrible effect on performance:

Optimized compilation:

define() is ugly because it declares a constant at runtime, through a function call (define() really is a function). This is very bad. The const keyword leads to a DECLARE_CONST opcode. You can read more about this in my article about the Zend virtual machine.
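The restriction to constant arguments makes sense once you remember that define() can build its constant name at runtime. A small illustration; the helper declareFlag() and the constant names BAZ and FLAG_DEBUG are made up for the example.

```php
<?php
const FOO = 'bar';                 // declared with the const keyword

define('BAZ', 'qux');              // literal name and value: could be written as const

function declareFlag($name)
{
    // The constant name is only known when the function runs.
    define('FLAG_' . strtoupper($name), true);
}

declareFlag('debug');

var_dump(FOO, BAZ, defined('FLAG_DEBUG')); // string(3) "bar" string(3) "qux" bool(true)
```

Only the second form, with literal arguments, has an equivalent const declaration; the call inside declareFlag() has to stay a runtime function call.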

Multiple jump target resolution

This one is a bit harder to explain, but I will try to show it with an example. The optimization deals with jump targets in jump opcodes (there are several kinds of them). Each time the virtual machine has to jump, the jump address is computed by the compiler and stored in the operand of the opcode. A jump is the result of a decision, taken when the VM reaches a decision point. PHP scripts contain many jumps: if, switch, while, try, foreach, ?: are all decision statements. If the decision is true, jump to branch A; otherwise, jump to branch B.

Such structures can be optimized when the target of a jump is itself a jump. The landing jump would make the virtual machine jump again, to the final landing point. Multiple jump target resolution makes the VM jump directly to the end of the “route”.

For instance:

if ($a) {
    goto a;
} else {
    echo "no";
}
a:
echo "a";

In the case of classical compilation, we get the following opcodes:

Translation (just read it): “if the evaluation of $a gives 0, jump to target 3, where we print “no”. Otherwise carry on and jump to 4, where we print “a”.”

That gives something like “jump to 3, and from there jump to 4”. Why not “jump to 4” directly then? This is what the optimization does:

This can be read as “if the evaluation of $a is not zero, jump to 2 and print “a”, otherwise print “no””. Much simpler, right? This optimization is especially effective for very complex scripts with many decision levels: for example a while inside an if, containing a goto that leads to a switch that runs try-catch blocks, and so on. Without optimization, the overall OPArray can contain many opcodes. Many of them will be jumps, and one jump will lead to another. Sometimes the optimization significantly reduces the number of opcodes (depending on the script) and simplifies the path of the virtual machine. This gives a slight performance gain at runtime.
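The idea can be modelled in a few lines. What follows is a toy model, not OPCache's implementation: the op names, the array layout and the functions resolveTarget() and threadJumps() are all made up for the illustration. Every jump whose target is an unconditional JMP is retargeted to the end of the chain.

```php
<?php
// Toy model: an op is an array with an 'op' name and, for jumps, a 'target' index.
// Only the unconditional 'JMP' is followed when resolving a chain.
function resolveTarget(array $ops, int $target): int
{
    $seen = [];
    while (isset($ops[$target])
        && $ops[$target]['op'] === 'JMP'
        && !isset($seen[$target])) {
        $seen[$target] = true;               // guards against jump cycles
        $target = $ops[$target]['target'];
    }
    return $target;
}

function threadJumps(array $ops): array
{
    foreach ($ops as $i => $op) {
        if (isset($op['target'])) {
            $ops[$i]['target'] = resolveTarget($ops, $op['target']);
        }
    }
    return $ops;
}

$ops = [
    0 => ['op' => 'JMPZ', 'target' => 2],   // conditional jump that lands on a JMP
    1 => ['op' => 'ECHO'],
    2 => ['op' => 'JMP', 'target' => 4],    // lands on another JMP
    3 => ['op' => 'ECHO'],
    4 => ['op' => 'JMP', 'target' => 5],
    5 => ['op' => 'ECHO'],
];

foreach (threadJumps($ops) as $i => $op) {
    printf("%d: %s%s\n", $i, $op['op'], isset($op['target']) ? ' -> ' . $op['target'] : '');
}
```

In this model the jumps at 0 and 2 end up pointing straight at 5. A real optimizer would then go on to remove the jumps that nothing reaches any more, which this sketch does not attempt.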

Conclusion

I have not shown all the features of the optimizer. For example, it can also optimize nested loops by issuing “early returns”. The same goes for nested try-catch or switch-break blocks. Calls to PHP functions, which are a heavy process for the engine, are also optimized whenever possible.

The main difficulty for the optimizer is to never change the meaning of the script, and especially its control flow. Bugs of that kind have been found in OPCache in the past, and it is very unpleasant to see PHP behave differently from what you expect when you run a small script you wrote... In such cases the generated opcodes were altered by the optimizer and the engine simply ran wrong code. This is not great.

Today the OPCache optimizer is very stable, but it is still under development for new PHP versions. It had to be deeply patched for PHP 7, which changed the design of many internal structures. In addition, PHP 7 pushes many more optimizations (the most trivial ones) into the compiler than PHP 5 did (the PHP 5 compiler barely optimized anything).

You may wonder why all this is not done directly in the compiler. The reason is that we want to keep the compiler as safe as possible. It generates opcodes that are sometimes, though not in every compilation, not very well optimized. An external optimizer, like the one in OPCache, can then come to the rescue. The same is true of any other compiler: it usually compiles the code in a straightforward way first, and only then can various optimizers be applied. But the code that comes out of the compiler must be as safe as possible (even if it is not the fastest at runtime).

the end

We have seen that OPCache has finally become the officially recommended solution for caching opcodes. We went through how it works, which is not too hard to understand, although errors can still occur. Today OPCache is very stable and gives PHP a significant performance boost by cutting the compilation time of scripts and optimizing the generated opcodes. Shared memory is used by every process of the PHP pool, which lets each of them access structures added by the others. The interned strings buffer also lives in shared memory, which saves even more memory across the worker pool, typically when running the PHP-FPM SAPI.
