Watchdog is watching you (hosting monitoring)

    Attention! This article is for web programmers: it contains source code and technical details.

    Have you ever had a simple question plunge you into a stupor and deep thought? This happens to me every time clients or friends ask me:

    - Andrey, what hosting would you recommend for our site?


    And I have nothing to answer, because all (all!) of our measurements are not in favor of the hosting companies. I will not name anyone specifically; you can check it yourself by following the recommendations in this article.

    So what is the catch? For us, hosting is a sore spot that gets stepped on often, because we are SEO optimizers. We work hard to bring our sites and our clients' sites to the top, and poor, unstable hosting scares off Yandex, Google and other search engines. Hosting often goes down during the day, especially at peak load around 18:00, when the evening crowd floods onto the Internet.

    Here is the simplest thing that happens: the site works fine during the day, while the hoster's support is watching. At night it regularly lies knocked out, for example because the hoster's scripts are making backups and everything is overloaded. The clients are asleep, their customers are asleep, the site is asleep. Everyone is happy except the search spiders.

    The first thing we did was buy our own expensive server and place it at Caravan (thanks to the guys for the excellent colocation quality). But our server is not made of rubber, and we do not offer hosting as a service. So we cannot put all of our clients on it.

    To keep the situation under some control, I wrote a hosting stability monitor a couple of years ago. Now that we have plenty of other competitive advantages, we are ready to publish the source code and the algorithm for the Habr community, which is what this post is about.

    So, to the point


    The script is embarrassingly simple, and the source is raw and unpolished, despite the help of our Habr friend Grox. Please do not kick us too hard for it.

    So, here is the archive with the sources (PHP + MySQL), an 11 KB zip. Everything is open, with no obfuscation.

    Quick start

    1. Unpack the archive locally and upload everything to your server, for example to mySite.ru/watchdog/
    2. Create a database for the dog (preferably a separate one) and execute the SQL file sql.txt (or import it through phpMyAdmin)
    3. Enter the database credentials in the dbconnect.php file
    4. Register yourself as a user in grox_config_users.php (username, password)
    5. Go to mySite.ru/watchdog/ and add the first site, the reference* site
    6. Set up cron to call the nip.php script every 5 to 20 minutes, for example: */10 * * * * /usr/bin/wget -q "yoursite.ru/watchdog/nip.php"
    7. Now you can add other sites to monitor
    8. A week later, collect the first results
    9. ?????
    10. PROFIT

    Naturally, this approach can effectively monitor only other servers, not the one the script itself runs on.

    * The reference (calibration) site is someone else's site with obviously high-quality hosting (for example, one with many servers in different data centers). I will deliberately not name the one we use, so as not to jinx it :) It is needed to eliminate errors caused by slowdowns of our own checking server. How does it work? Read on.

    Database structure


    bones: the “bones”, the results of biting (polling) the sites. The most important table.
    bones_back: the archive log (where would we be without backups). Everything older than three months is moved here.
    samples: bite synchronization, which makes it possible to compare delays with the calibration site.
    watch_dogs: the list of sites that the watchdogs bite to check.
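
    As a rough illustration of how these four tables might fit together, here is a minimal sketch in Python with SQLite (not the PHP + MySQL of the archive). Only the table names and the sample_id field of bones come from this article; every other column is a hypothetical guess, and the real layout is whatever sql.txt defines.

```python
import sqlite3

# Table names and bones.sample_id come from the article.
# All other columns are hypothetical, chosen only for illustration.
SCHEMA = """
CREATE TABLE watch_dogs (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL,
    is_reference INTEGER NOT NULL DEFAULT 0  -- 1 for the calibration site
);
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,                  -- one row per cron run
    started_at TEXT NOT NULL
);
CREATE TABLE bones (
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES samples(id),
    dog_id INTEGER NOT NULL REFERENCES watch_dogs(id),
    ok INTEGER NOT NULL,                     -- 1 if the site answered
    delay_ms INTEGER
);
CREATE TABLE bones_back (                    -- archive of old bones
    id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL,
    dog_id INTEGER NOT NULL,
    ok INTEGER NOT NULL,
    delay_ms INTEGER
);
"""


def create_schema(conn):
    """Create the four (hypothetical) tables on an open connection."""
    conn.executescript(SCHEMA)


def table_names(conn):
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

    Calling create_schema(sqlite3.connect(":memory:")) is enough to experiment with the structure without touching a real database.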

    How does it work?


    Each cron run that checks the hosting gets a unique number. Sites are polled in random order. This keeps the script working even when one individual site hangs. If the "calibration" site fails at the same time as the site being checked, we conclude that the fault lay not with the site being checked but with our own server, and we reject this sample. Samples are matched by the sample_id field in the bones table.
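
    The polling and rejection logic described above can be sketched roughly as follows. This is a Python illustration, not the code of nip.php: the function names, the dictionary fields other than sample_id, and the injected fetch callable are all assumptions made for the example.

```python
import random
import time


def poll_round(sites, fetch, sample_id, rng=random):
    """Poll every site once, in random order, and return one 'bone' per site.

    fetch is a hypothetical callable: it takes a URL and returns a truthy
    value when the site answered correctly. Exceptions count as failures.
    """
    order = list(sites)
    rng.shuffle(order)  # random order, as the article describes
    bones = []
    for url in order:
        started = time.monotonic()
        try:
            ok = bool(fetch(url))
        except Exception:
            ok = False
        bones.append({
            "sample_id": sample_id,
            "url": url,
            "ok": ok,
            "delay": time.monotonic() - started,
        })
    return bones


def site_failures(bones, reference_url):
    """Return failures of monitored sites, dropping rejected samples.

    A sample is rejected when the reference site did not answer in it:
    the fault is then attributed to the checking server itself.
    """
    trusted = {
        b["sample_id"] for b in bones
        if b["url"] == reference_url and b["ok"]
    }
    return [
        b for b in bones
        if not b["ok"] and b["url"] != reference_url
        and b["sample_id"] in trusted
    ]
```

    Passing fetch in as a parameter keeps the sketch independent of any HTTP library; in practice it would wrap a real request with a timeout.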

    This is what the list of “dogs” looks like.

    Bite

    Here is an example of failures over the last 7 days. The crispo.ru statistics are good enough; the morning delays are just the start of a backup.

    What can be improved / added?

    1. Don't scold us for register_globals: this script is meant for local access (not exposed to the outside)
    2. The user interface could be better
    3. A view that puts a site's samples side by side with the calibration site's samples is missing
    4. Tracking of a key phrase on the page could be added, to confirm that the database of the checked site is working normally
    5. Failure notifications
    6. A more “correct” script for stripping tags from the meaningful part of the page
    7. A button to suspend the watchdog script altogether when the load on the main server must be reduced urgently
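
    As one possible shape for items 4 and 6, here is a small Python sketch that strips tags with the standard html.parser module and then looks for a key phrase. It is not part of the archive; the function names and the rule "status 200 plus phrase present" are assumptions for illustration.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text found between tags.

    Note: this also collects the contents of script and style elements.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def page_is_healthy(status, html, key_phrase):
    """Hypothetical health rule: HTTP 200 and the key phrase is present.

    The phrase should be one that only appears when the site's database
    answered, for example the title of a block rendered from a query.
    """
    return status == 200 and key_phrase.lower() in visible_text(html).lower()
```

    A page that returns 200 but shows only a database error message would then be counted as a failure, which a plain availability check would miss.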

    There are similar services, some even checking mail and FTP availability, but they are often hosted on Western servers and are rather complicated. We have successfully integrated our own script into the client account area and into our monthly internal reports, and on the whole we are happy with its simplicity and straightforwardness.

    We give the script to everyone; if you write a decent continuation, we ask you to share it with everyone as well. You can do so through us (by sending the source code to support@webprojects.ru).
    If it proves useful to you, remember us: do not delete the mention of the authors from signatures in comments and footers.


    P.S. The monitoring module is called a watchdog in honor of a remarkable feature of some AMR controllers. Originally it was an interrupt with an internal counter that steadily decrements the value of a special register. As soon as the value reaches zero, the controller restarts and the program starts from scratch. The main, reliable part of the program keeps resetting the control register to a high value. If the software of your washing machine or TV suddenly freezes because of some error, the watchdog will reboot it.
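
    The countdown behaviour of such a hardware watchdog can be imitated in a few lines of Python. This is only a toy model under assumed names (WatchdogTimer, kick, tick); a real controller does this in hardware, and the reload value here is arbitrary.

```python
class WatchdogTimer:
    """Toy model of a hardware watchdog counter."""

    def __init__(self, reload_value):
        self.reload_value = reload_value
        self.counter = reload_value
        self.resets = 0  # how many times the "controller" was restarted

    def kick(self):
        """Called by the healthy main loop: reload the counter."""
        self.counter = self.reload_value

    def tick(self):
        """One timer step: decrement, and restart when zero is reached.

        Returns True if this tick triggered a restart.
        """
        self.counter -= 1
        if self.counter <= 0:
            self.resets += 1
            self.counter = self.reload_value  # program starts from scratch
            return True
        return False
```

    As long as the main loop calls kick() more often than every reload_value ticks, no restart happens; once the loop hangs and stops kicking, the counter runs down and tick() reports a restart.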
