Site availability monitoring from the inside: eliminating false positives

    Today we’ll talk about how we solve the following tasks:

    • Detecting outages;
    • Eliminating false positives;
    • Calculating uptime: optimistic and pessimistic scenarios.


    Detecting outages and eliminating false positives
    After a user adds a site for monitoring, the system begins polling it at the specified interval, which can range from one minute to one hour.

    Checks are carried out from geographically distributed monitoring points. These are independent servers located around the world; there are currently more than 20 of them.

    For each check, an agent is selected at random from the pool of agents that are currently working. If that agent returns an error, a recheck from 5-7 independent agents is started. After the recheck, the site is considered down if most of the agents confirm the problem. Otherwise, the problem is assumed to be local to the agent that recorded the initial error.



    The same algorithm is used to detect that a site has come back up.

    This algorithm reduces false positives to almost zero.
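
    The recheck logic described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the function name site_is_down, the check callback, and the choice to exclude the agent that reported the initial error from the recheck are all assumptions made for the example.

```python
import random
from typing import Callable, Optional, Sequence


def site_is_down(
    agents: Sequence[str],
    check: Callable[[str], bool],
    recheck_count: int = 5,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True only if a majority of recheck agents confirm the failure.

    `check(agent)` is a hypothetical probe: True means the site answered
    correctly from that agent, False means the agent saw an error.
    """
    rng = rng or random.Random()

    # Routine check from one randomly chosen working agent.
    first = rng.choice(list(agents))
    if check(first):
        return False  # the routine check passed, nothing to confirm

    # Assumption: the recheck uses agents other than the one that failed.
    others = [a for a in agents if a != first]
    recheckers = rng.sample(others, min(recheck_count, len(others)))

    # The site counts as down only if most recheck agents also see an error.
    failures = sum(1 for a in recheckers if not check(a))
    return failures > len(recheckers) / 2
```

    With a pool of 20 agents where only one agent has a local network problem, this sketch never reports the site as down: either the broken agent is not picked for the routine check, or it is picked and every recheck agent reports success.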

    Statistics calculation

    We judge a site's unavailability only from checks made at the configured interval. It is impossible to say with certainty what the site was doing between checks. However, between two consecutive failed checks the site was very likely down. If a failed check is followed by a successful one, the site may have been either down or up during that interval. For this reason we calculate both a pessimistic and an optimistic uptime. The figure illustrates the difference.
    Statistics are calculated using the optimistic uptime, while the downtime reported to users in alerts follows the pessimistic scenario.
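
    As a rough sketch of the two scenarios, the calculation could look like this. The function name uptime_bounds and the data layout are assumptions for the example. It also assumes that an interval starting with a successful check counts as uptime in both scenarios, which the description above does not state explicitly.

```python
from typing import List, Tuple


def uptime_bounds(checks: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (optimistic, pessimistic) uptime as fractions of the period.

    `checks` is a time-ordered list of (timestamp_seconds, ok) pairs.
    """
    if len(checks) < 2:
        raise ValueError("need at least two checks")
    total = checks[-1][0] - checks[0][0]
    if total <= 0:
        raise ValueError("checks must span a positive amount of time")

    certain_down = 0.0  # failed check followed by another failed check
    uncertain = 0.0     # failed check followed by a successful one

    for (t0, ok0), (t1, ok1) in zip(checks, checks[1:]):
        span = t1 - t0
        if not ok0 and not ok1:
            certain_down += span
        elif not ok0 and ok1:
            uncertain += span

    # Optimistic: the uncertain intervals count as uptime.
    optimistic = (total - certain_down) / total
    # Pessimistic: the uncertain intervals count as downtime.
    pessimistic = (total - certain_down - uncertain) / total
    return optimistic, pessimistic
```

    For example, with checks every 60 seconds and the results ok, ok, error, error, ok, ok, the observed period is 300 seconds. The 60 seconds between the two errors are counted as downtime in both scenarios, and the 60 seconds between the last error and the recovery are counted as downtime only in the pessimistic one, so the optimistic uptime is 80% and the pessimistic uptime is 60%.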

    May Uptime be with you!

    As a reminder, to improve your uptime you can use our site availability monitoring service, as well as check a site's health and speed online. In addition, our service lets you learn about problems with your website or server quickly via SMS or Gtalk.
