Stabilizing PHP in production: what brings the web server down, and why

    You are responsible for the stability of a PHP web project. The load keeps growing, features keep being added, customers are happy. Then one fine day mysterious errors begin to appear ...

    Server Software Errors

    ... which the programmers do not know how to fix, because it is the server software that "breaks", for example the apache-PHP combination, and the client receives a "scheduled maintenance" page in response to a request. Web developers often lack deep knowledge of C programming on unix / linux, and system administrators, unfortunately, often do not go deeper into the system than bash. Real hardcore :-)

    Unstable server scripts

    Often, certain pages of a web project start to misbehave. For example, a page takes 15 minutes to run, and finding out what it is doing is not easy. In a previous post on this topic I described one way to determine what a PHP script on a production server is doing, but it feels like a more powerful tool is needed.

    In practice, I often come across projects that run into this class of server software errors, and the team does not always know what to do. Segmentation fault messages keep appearing in the apache log, clients get an error page, and the web developer and the system administrator rack their brains, try different versions of PHP / apache / precompiler, rebuild PHP from source with different options again and again, and file bug reports, only to be told that the bug is not in PHP but in their own code, and so on indefinitely ...

    In this article I want to show how to quickly and simply find the reason why PHP fell over on a production server and eliminate it, without plunging into the beautiful world of C system programming for unix :-) You will need willingness and one cup of coffee.

    Looking in the web server error log

    If you see something like this in the apache error log, this article is for you:
    [Mon Oct 01 12:32:09 2012] [notice] child pid 27120 exit signal Segmentation fault (11)

    In this case it is useless to look for details in the PHP error log: it was the process that crashed, not the script. If you have not set up a nice maintenance page on nginx in advance, clients will see a bare "50*" error. You want to punch someone in the face, but whom? :-)

    To distract ourselves from destructive decisions, let's recall the theory. What is a signal? It is, you could say, a tool the operating system uses to tell a process that it has, for example, done something wrong :-) Say the process defies the laws of mathematics and divides by ... 0, or overflows the stack through its own reckless actions. In this case we see signal number 11, named "SIGSEGV". The list of signals can be viewed by running "kill -l":
    11) SIGSEGV

    A process that does not handle SIGSEGV is killed by the kernel mercilessly and without trial, and that is what happens to your apache-PHP process. Strictly speaking, SIGSEGV can be intercepted (unlike SIGKILL or SIGSTOP, which cannot), but for that you would have to dig into the source :-)
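
    To see what "killed by signal 11" looks like from the outside, here is a minimal sketch that needs no web server. It starts a throwaway "sleep" process as a stand-in for the crashing worker, sends it SIGSEGV, and reads the status the shell reports. The stand-in process and the variable names are assumptions made for illustration; the 128 + signal number encoding is what bash and dash use.

```shell
# Start a disposable background process to play the role of the crashing worker.
sleep 30 &
pid=$!

# Send it signal 11 (SIGSEGV), the same signal seen in the apache log.
kill -s SEGV "$pid"

# wait returns the child's status; bash and dash report a signal death
# as 128 + signal number.
status=0
wait "$pid" || status=$?

signal_number=$((status - 128))
echo "exit status: $status, signal: $signal_number"
```

    On Linux with bash or dash this should report exit status 139, that is 128 + 11, which matches the "(11)" in the apache log line.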

    And why was it killed?

    Now let's find out why the apache-PHP process was killed. To do that, configure a dump of the process memory at the moment of the kill :-), also known as a coredump. Yes, a term some 50 years old is still in use; it dates back to data stored on magnetic cores. The next time the operating system kills the process, the kernel will create a file whose location and name can be configured. If you are at a console, just type "man 5 core".

    For example, you can have the files placed in a directory like this:
    echo "/tmp/httpd-core.%p" > /proc/sys/kernel/core_pattern

    If nothing is set, the system creates a file named "core.#process_number#" in the working directory of the process.

    Just make sure the apache-PHP process has write permissions there.

    That's not all. Coredump generation is most likely disabled by default on your system. You can enable it by inserting this line at the beginning of the web server startup script:
    ulimit -c unlimited

    or, to make the setting permanent, edit the file "/etc/security/limits.conf". You can insert there:
    apache - core -1

    Details on the file format are in "man limits.conf". However, nothing worked until I also set a directory for apache coredump files ("/etc/httpd/conf/httpd.conf"):
    CoreDumpDirectory /tmp

    Now restart apache:
    service httpd restart

    Let's test. Kill a process manually:
    ps aux | grep httpd
    ...
    kill -11 12345

    Look in "/var/log/httpd/error_log":
    [Mon Oct 01 16:12:08 2012] [notice] child pid 22596 exit signal Segmentation fault (11), possible coredump in /tmp

    In "/tmp" we now find a file with a name like "/tmp/httpd-core.22596"

    You have learned how to get a memory dump of a killed process. Now we wait for the process to be killed naturally.
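
    Before waiting for a real crash, it may help to confirm the three preconditions described above in one go. The following is only a sketch: the variable names are made up for illustration, "/tmp" is the dump directory used in this article, and the values reflect the shell that runs the script, not an already running httpd (for a live process on Linux, "/proc/<pid>/limits" shows its actual limits).

```shell
# Directory where this article stores dumps; change it to match your core_pattern.
dump_dir=/tmp

# Soft limit on core file size for this shell: 0 means dumps are disabled.
core_limit=$(ulimit -c)

# Kernel-wide naming pattern for core files (Linux only).
if [ -r /proc/sys/kernel/core_pattern ]; then
    core_pattern=$(cat /proc/sys/kernel/core_pattern)
else
    core_pattern="unavailable"
fi

# The crashing process must be able to write into the dump directory.
if [ -d "$dump_dir" ] && [ -w "$dump_dir" ]; then
    dir_status="writable"
else
    dir_status="not writable"
fi

echo "core size limit: $core_limit"
echo "core_pattern:    $core_pattern"
echo "$dump_dir is $dir_status for user $(id -un)"
```

    Run it as the user the web server runs under; a core size limit of 0 or a non-writable directory explains a missing dump file.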

    At the crime scene: interpreting the coredump

    It is important to know that if PHP was built without debugging symbols (the --enable-debug configure option, or the -g flag for gcc at compile time), we lose a lot of useful information. However, if you compiled PHP from source even without this option, and the sources are at hand, that may be enough for analysis.
    There is still a very common misconception that debugging symbols hurt performance and increase the memory consumed by the process. They do not; only the size of the executable file grows. Note that this applies to symbols added with -g; PHP's --enable-debug additionally enables internal runtime checks, so it is not a pure symbols-only build. Therefore, if you cannot work out the cause of the error without a debug build, ask the system administrator to build the PHP module with debugging symbols.

    How do you open a coredump? With the good old utility gdb, of course, originally written by the supreme apostle of the free software movement, Richard Stallman.
    Understanding how the debugger works does not take long. One of the more entertaining tutorials can be read in a couple of hours, or you can ask the system administrator to do it ;-)

    A coredump is usually opened like this:
    gdb path_to_web_server_executable path_to_coredump
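
    For a quick first look, gdb can also print the backtrace and exit without an interactive session. The wrapper below is a sketch only: the function name "core_backtrace" is made up for illustration, and the httpd path in the example call is an assumption about a typical installation, not something taken from your system.

```shell
# Print the C-level backtrace from a core file without an interactive session.
# Usage: core_backtrace <path_to_executable> <path_to_coredump>
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -r "$core" ]; then
        echo "core file not readable: $core" >&2
        return 1
    fi
    # -batch makes gdb exit after running its commands; -ex runs one gdb command.
    gdb -batch -ex "bt" "$exe" "$core"
}

# Example call (assumed paths):
# core_backtrace /usr/sbin/httpd /tmp/httpd-core.22596
```

    The output is handy for attaching to a bug report or for deciding whether a full interactive session is worth the time.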

    Every self-respecting C developer on unix certainly knows how to use this debugger and probably does so every day, but unfortunately there may not be one on your team. And there is one more unpleasant BUT ...

    PHP debugging in gdb - black magic

    The thing is that a PHP script compiled to bytecode is ... not quite a C program ;-) You need to understand the internals of the Zend engine, if only a little, and you will pick it up fairly quickly. Namely, you need to find the last call to the execute function in the backtrace, switch to that stack frame and examine its local variables (op_array), and also look at the global variables of the Zend engine:
    (gdb) frame 3
    #3  0x080f1cc4 in execute (op_array=0x816c670) at ./zend_execute.c:1605
    (gdb) print (char *)(executor_globals.function_state_ptr->function)->common.function_name
    $14 = 0x80fa6fa "pg_result_error"
    (gdb) print (char *)executor_globals.active_op_array->function_name
    $15 = 0x816cfc4 "result_error"
    (gdb) print (char *)executor_globals.active_op_array->filename
    $16 = 0x816afbc "/home/yohgaki/php/DEV/segfault.php"

    It is easy to get lost in op_array, so a command that prints the layout of this structure comes in handy:
    (gdb) ptype op_array
    type = struct _zend_op_array {
        zend_uchar type;
        char *function_name;
        zend_class_entry *scope;
        zend_uint fn_flags;
        union _zend_function *prototype;
        zend_uint num_args;
        zend_uint required_num_args;
        zend_arg_info *arg_info;
        zend_bool pass_rest_by_reference;
        unsigned char return_reference;
        zend_uint *refcount;
        zend_op *opcodes;
        zend_uint last;
        zend_uint size;
        zend_compiled_variable *vars;
        int last_var;
        int size_var;
        zend_uint T;
        zend_brk_cont_element *brk_cont_array;
        zend_uint last_brk_cont;
        zend_uint current_brk_cont;
        zend_try_catch_element *try_catch_array;
        int last_try_catch;
        HashTable *static_variables;
        zend_op *start_op;
        int backpatch_count;
        zend_bool done_pass_two;
        zend_bool uses_this;
        char *filename;
        zend_uint line_start;
        zend_uint line_end;
        char *doc_comment;
        zend_uint doc_comment_len;
        void *reserved[4];
    } *

    The debugging process consists of walking between stack frames ("frame N"), switching to each call of the "execute" function and examining its local variables and arguments ("print name", "ptype name"). The lower the frame number, the deeper you are. Sometimes it is useful to step into a frame inside a PHP extension and see where the error occurred and why (or at least try to understand the reason).

    (gdb) frame #number#
    (gdb) print op_array.function_name
    $1 = 0x2aaab7ca0c10 "myFunction"
    (gdb) print op_array.filename
    $2 = 0x2aaab7ca0c20 "/var/www/file.php"

    And so on ...

    If you have choked on your coffee :-), just remember this: when switching between frames of the call stack with the "frame #N#" command, you can inspect specific fields of this structure and determine exactly in which PHP file which PHP function was called, which function it called in turn, and so on, until you reach the cause of the "Segmentation Fault" or whatever error killed the process. Then explain the cause to the programmers and they will fix it! Quickly and, one must be optimistic, for good.

    Common Causes of Errors

    Start looking through coredump files (or delegate this to the system administrator) and you will quickly learn to sort errors into groups:
    1) Problems in PHP extensions. In this case either disable the extension or try adjusting its settings. Once you know for certain that the problem is there, the rest is easy.
    2) Problems with recursion and the stack. You may hit an error where a library function, for example in pcre, recurses and calls itself twenty thousand times. You can either tune the library's parameters or, if you are feeling lazy, give the process a larger stack ("/etc/init.d/httpd"):

    ulimit -s #larger_value#

    The current value can be viewed with the command "ulimit -a" (man ulimit, then look for "ulimit").
    3) Problems in the PHP core. Here you need to write to the PHP developers :-)
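
    For the stack case in item 2, the limit can be inspected and a larger value tried out safely before touching the init script. This is a sketch under assumptions: 16384 KB is an arbitrary example value, not a recommendation, and the variable names are made up for illustration.

```shell
# Current soft and hard stack limits (in kilobytes, or "unlimited").
soft_stack=$(ulimit -S -s)
hard_stack=$(ulimit -H -s)
echo "stack limit: soft=$soft_stack hard=$hard_stack"

# Try a different soft limit inside a subshell so the calling shell is not affected.
# 16384 KB is an arbitrary example value.
new_stack=$(
    if ulimit -S -s 16384 2>/dev/null; then
        ulimit -S -s
    else
        echo "unchanged"
    fi
)
echo "stack limit inside subshell: $new_stack"
```

    The soft limit cannot be raised above the hard limit by an unprivileged user, which is why the attempt is guarded. In the init script the "ulimit -s" line has to come before the web server is started, because limits are inherited by child processes.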

    In general, the range of possible causes narrows considerably. Which is exactly what we need.

    Debugging a running process

    That's not all. If you cannot get a coredump, you can attach to the running process and walk around inside it. While you are inside the process, its execution is paused ("ps aux | grep apache | grep 'T'" will show it in the tracing state). When you detach, it resumes. You can attach like this:
    gdb -p process_id


    In this article we learned how to deal with server software errors properly: make debug builds of apache-PHP, produce coredump files, and interpret them correctly with a symbolic debugger. We also learned that a coredump file lets you find the specific PHP file and function that caused the error.

    Now you can draw up a checklist for a manager facing mysterious server errors that neither the web developers nor the system administrators can figure out:

    1. Enable the collection of coredump files on the server (sysadmin)
    2. If necessary, rebuild apache-PHP with debugging symbols (sysadmin)
    3. Using gdb (set aside a weekend to learn it), investigate the cause of the error (system administrator together with a web developer)
    4. Take measures to eliminate it or reduce the frequency of occurrence: change the settings, update software, write to the bugtracker, disable the PHP extension, etc.

    In conclusion, I invite everyone to our Bitrix24 cloud service, where we make effective use of all the technologies described in this article.

    Good luck to everyone, and may your web projects run stably!
