Seven "NOTs" of IT Infrastructure Monitoring

    Over the course of my work, I have repeatedly seen monitoring rollouts that did not deliver the expected results: the monitoring worked poorly or did not work at all. When I analyzed these cases, I found that the causes were almost always the same. They are all fairly obvious, yet I kept running into them, so I decided to collect them in one place. Forewarned is forearmed.

    1. DO NOT just implement a monitoring tool.

    If IT departments are simply handed a monitoring tool, effective and widely used monitoring will not emerge on its own. Instead of deploying a monitoring system, approach IT infrastructure monitoring as the organization of a process.

    What do I mean by a monitoring process? It is the set of people, technical tools, and organizational measures aimed at solving the tasks the company sets for monitoring in the course of operating its IT infrastructure. I want to stress that the people and the rules in this definition matter no less than the technical tools, and arguably more.

    Here are a few typical names of monitoring projects I have taken part in. In most cases the project name accurately reflects the result the customer wanted:

    • Introduction of a system for managing and operating a corporate telecommunications network
    • Creation of an IT infrastructure monitoring system
    • Development of the management and operation of the Internet circuit
    • Introduction of a monitoring system for the switching equipment of OJSC
    • Creation of an information technology management system
    • Creation of a hardware and software complex for an IT infrastructure management and monitoring system

    Companies concentrate on the technical tools and most often sacrifice the other two components: people and organizational measures. As a result, the tool exists, but there is no clear understanding of who should use it and how.

    Of course, any monitoring implementation project includes, to some degree, a role model and accompanying documentation. As a rule, though, this documentation is a formality and does not help IT departments answer the questions that come up when they work with the system.

    2. The integrator will NOT do all the work for you.

    For monitoring projects, large and medium-sized companies typically bring in specialists from integrator companies. When surveying the infrastructure, integrators rely on their own expertise and their own view of which problems need solving. That is far from guaranteed to be what the company needs. No one knows the intricacies of operating the IT infrastructure better than the company's own specialists.

    Therefore, I recommend that the company independently identify the problems it wants to solve with monitoring before engaging a third-party contractor. For example:

    • uneven load distribution across the virtual infrastructure;
    • a high number of failures in the IT infrastructure;
    • highly qualified specialists overloaded with simple tasks;
    • low availability of corporate services;
    • a large number of calls to the first line of support;
    • a long delay between a failure occurring and its detection;
    • the need to optimize the work of system administrators;
    • low IT performance;
    • lack of reliable data on IT infrastructure resources;
    • lack of tools for preventing failures.

    It is also extremely useful, in the first stages of organizing monitoring, to define metrics that express these problems in quantitative terms and to start collecting statistics on them. This gives us a picture of how the IT infrastructure was functioning before the monitoring process was organized. After monitoring is introduced, we can then track how it has changed these indicators. Examples of such metrics:

    • the average number of incidents recorded during the reporting period;
    • average downtime of key services;
    • average availability of the IT infrastructure, in percent;
    • average utilization of the infrastructure, in percent;
    • the number of calls to the first line of support during the reporting period;
    • the average time from the moment an incident occurs to its detection.
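
    As a rough illustration, two of the metrics listed above (average time to detection and availability in percent) could be calculated from incident records along the following lines. This is only a sketch: the Incident record, its field names, and the sample data are assumptions made for the example, and the availability formula assumes that incidents do not overlap and fall entirely inside the reporting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    """One recorded incident. All field names are hypothetical."""
    service: str
    occurred_at: datetime   # when the failure actually started
    detected_at: datetime   # when it was noticed
    resolved_at: datetime   # when the service was restored


def mean_time_to_detect(incidents):
    """Average time from the moment an incident occurs to its detection."""
    delays = [(i.detected_at - i.occurred_at).total_seconds() for i in incidents]
    return timedelta(seconds=mean(delays))


def availability_percent(incidents, period_start, period_end):
    """Percentage of the reporting period in which the service was up.

    Assumes the incidents do not overlap and lie entirely within the period.
    """
    period = (period_end - period_start).total_seconds()
    downtime = sum((i.resolved_at - i.occurred_at).total_seconds() for i in incidents)
    return 100.0 * (period - downtime) / period


# Made-up sample data: two incidents in a ten-day reporting period.
incidents = [
    Incident("mail", datetime(2024, 1, 2, 10, 0), datetime(2024, 1, 2, 10, 30),
             datetime(2024, 1, 2, 12, 0)),
    Incident("mail", datetime(2024, 1, 5, 8, 0), datetime(2024, 1, 5, 8, 10),
             datetime(2024, 1, 5, 9, 0)),
]
period_start = datetime(2024, 1, 1)
period_end = datetime(2024, 1, 11)

print(mean_time_to_detect(incidents))                             # 0:20:00
print(availability_percent(incidents, period_start, period_end))  # 98.75
```

    Where the incident records come from (a service desk, a spreadsheet, log files) depends on the company; what matters is that the same calculation is applied before and after monitoring is introduced.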

    The better developed the metrics that characterize the main problems of operating the IT infrastructure, the greater the improvement that can be achieved through monitoring. Calculating these metrics continuously should be an integral part of the process. The formulas used to calculate them need to be revised periodically, so that you can respond to changes in the IT infrastructure and in its qualitative and quantitative composition. It is advisable to use these metrics as KPIs when evaluating the performance of the IT infrastructure support departments.
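
    To make the before-and-after comparison concrete, here is a minimal sketch that reports how each metric has changed relative to its baseline. The metric names and all numbers are invented for the example; real values would come from the statistics collected before and after monitoring was introduced.

```python
def relative_change(before, after):
    """Percent change relative to the baseline; negative means the value fell."""
    return 100.0 * (after - before) / before


# Hypothetical baseline (before monitoring) and current values.
baseline = {
    "incidents_per_month": 40,
    "mean_minutes_to_detect": 50,
    "first_line_calls_per_month": 400,
}
current = {
    "incidents_per_month": 30,
    "mean_minutes_to_detect": 20,
    "first_line_calls_per_month": 300,
}

changes = {name: relative_change(baseline[name], current[name]) for name in baseline}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
# incidents_per_month: -25.0%
# mean_minutes_to_detect: -60.0%
# first_line_calls_per_month: -25.0%
```

    For all three sample metrics a decrease is an improvement; for a metric such as availability the sign would be read the other way.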

    3. DO NOT confuse monitoring and administration of IT infrastructure.

    Qualified specialists are one of the three key components of an effective monitoring process. Sometimes, to save money, especially during implementation, companies try to hand the maintenance and development of the monitoring system to the system administrators who operate the IT infrastructure. But if you set up a separate unit (or specialist) to support monitoring, the quality of the service improves significantly. For these employees monitoring is the main activity, not a side one, so they are far more interested in its success and relevance.

    The duties of the monitoring unit (or specialist) should include, among others, the following:

    • administration of a complex of monitoring systems;
    • creating new monitoring metrics;
    • adjustment of monitoring thresholds;
    • development of new monitoring tools;
    • execution of user requests;
    • reporting;
    • development of the IT monitoring process as a whole.

    Even if several IT departments are jointly involved in configuring monitoring, you still need a separate unit that serves as the center of expertise for everything related to monitoring. This helps prevent duplicated work and quickly resolve the conflicts that joint administration will sooner or later produce.

    4. DO NOT expect your subordinates to use monitoring if you do not do it yourself.

    It happens that the manager who initiated monitoring in the company gives the necessary orders, allocates funds, signs a contract with the integrator, and considers his own involvement with monitoring finished. That is a mistake: for monitoring to be effective, it must be in demand, in one form or another, at every level of the corporate hierarchy. As soon as a manager starts using the monitoring service, it effectively becomes mandatory for all of his subordinates.

    5. DO NOT force employees to work with the monitoring system.

    Sooner or later the implementation of the monitoring system is completed and its operation begins. Sometimes this is accompanied by an order that all IT departments start working with the system. As a rule, direct coercion does not bring positive results. The most it can achieve is formal compliance at the bare minimum.

    Monitoring will be received positively if it helps each unit solve its own problems. If IT operations units do not start using monitoring, this may indicate that the goals of the implementation were set incorrectly, or that they do not coincide with the goals of the IT departments.

    Motivate the company's divisions to use the results of monitoring rather than obliging them to. A good way to do this is to create, for each unit, key indicators based on the metrics that describe the problems of operating the IT infrastructure in quantitative terms. I gave examples of such metrics above.

    6. DO NOT concentrate on checking the functionality of the monitoring system during its testing.

    Once the monitoring system has been developed, it goes through testing and then trial operation. This is where our next “not” becomes relevant. I have run into this problem, to one degree or another, in every project.

    If the monitoring tool is implemented by a third-party organization, it is important that the company's specialists take an active part in shaping the test methodology at the acceptance testing and trial operation stages. During the tests, concentrate specifically on whether the tool really helps achieve the goals and objectives set for monitoring.

    Here are examples of using metrics in the final tests:

    • Optimization of IT infrastructure utilization. Check that monitoring system reports make it possible to take unambiguous, well-grounded decisions about optimal utilization and a more rational distribution of IT infrastructure resources.
    • Reducing the number of failures in the IT infrastructure. Monitoring should produce as many correct signals and as few false signals as possible. You can check this by collecting statistics on the percentage of signals from the monitoring system that actually reported an emergency state of an IT infrastructure component and led to a response that eliminated the cause of the failure.
    • Reducing the load on highly skilled specialists performing simple tasks. Check the completeness and detail of the role model built into the system and how completely it is filled with information about the company's structural divisions, and analyze the alert escalation rules if the system provides for them. It is also useful to determine the percentage of signals that reach their target recipients and compare this value with the targets.
    • Improving the availability of corporate services. Compare the availability figures for corporate services reported by the monitoring system with the actual availability figures for the reporting period, determined by alternative methods. This also includes checking the completeness and detail of the list of metrics used to determine the availability of corporate services, the threshold values of these metrics, and the alert settings for the target support groups of corporate services.
    • Checking the quality of IT services provided by an external contractor. Verify that the monitoring metrics for the services cover as many parameters as possible from the SLA signed with the external contractor, so that their data allows you to state unambiguously whether the contractor is meeting the SLA conditions.
    • IT infrastructure inventory. Check the completeness of the inventory information collected by the monitoring system and its compliance with the requirements and objectives of the inventory; check the quality and ease of use of the inventory reports produced by the monitoring system.
    • Proactive failure prevention. Compare the number of failures in a reporting period before the monitoring system came into use with the number after it was put into trial operation; compare these values with the targets.
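
    Two of the checks in this list come down to simple percentages: the share of signals that reported a real emergency, and the share of signals that reached their target recipients. A minimal sketch of both calculations follows. The Alert record, its fields, and the sample data are assumptions made for the example; in practice each signal would have to be classified by the people who handled it.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """One signal from the monitoring system. Field names are hypothetical."""
    component: str
    real_failure: bool   # the signal reported a real emergency and led to a response
    delivered: bool      # the signal reached its target recipient


def percent(part, whole):
    """Share of `part` in `whole`, in percent; 0.0 when there is no data."""
    return 100.0 * part / whole if whole else 0.0


def correct_signal_percent(alerts):
    """Percentage of signals that corresponded to a real emergency."""
    return percent(sum(a.real_failure for a in alerts), len(alerts))


def delivery_percent(alerts):
    """Percentage of signals that reached their target recipients."""
    return percent(sum(a.delivered for a in alerts), len(alerts))


# Made-up sample: 8 signals, 6 of them real, 7 of them delivered.
alerts = (
    [Alert("core-switch", real_failure=True, delivered=True)] * 5
    + [Alert("core-switch", real_failure=True, delivered=False)]
    + [Alert("web-server", real_failure=False, delivered=True)] * 2
)

print(correct_signal_percent(alerts))  # 75.0
print(delivery_percent(alerts))        # 87.5
```

    These figures only cover signals that were actually raised. Failures that the monitoring system missed entirely have to be counted separately, for example from service desk records.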

    On the one hand, this is quite difficult to verify: you must first determine how to calculate these metrics and then accumulate statistics on them. On the other hand, these metrics can be used not only when testing monitoring but also, later, as part of the motivation system for the IT departments involved in operation.

    Testing the basic functionality, for example checking that an alarm appears in the monitoring system when a switch loses power, does not in itself show that the system will cope with the tasks assigned to it. Such a check only shows that the system works in principle.

    7. Monitoring will NOT begin to bring benefits until you start working with it and adapt it to your needs.

    This “not” concerns the operation of the system after implementation is complete. It is extremely important to understand that without a properly built monitoring process that keeps the system up to date, the data in it will begin to go stale as soon as the implementation ends.

    By the time of commissioning, all organizational issues related to maintaining the system and running the monitoring, the rules for its use, and the division of responsibility and support should be resolved as far as possible. The rules and procedure for resolving problems that arise during operation should also be defined. The absence of these rules is the main reason why monitoring stops working and begins to degrade once the integrator's active phase is over.

    Finally, by the time the monitoring system goes into production operation, regulations should be prepared that set out the basic rules for working within the monitoring process:

    • who and how will work with the system;
    • who is responsible for keeping the system up to date;
    • who has the right to adjust metric thresholds;
    • how new metrics are created;
    • in which cases new metrics should be created;
    • how long it takes to create new metrics;
    • what should happen if the monitoring system records a failure;
    • who should react to this failure and how;
    • who is responsible for the operation of the monitoring system;
    • how conflicts related to the appearance of false signals or the absence of correct signals will be resolved.


    I sincerely hope that, after reading this article, you did not recognize your own company in these situations. If monitoring is only beginning to take shape in your company, knowing these seven mistakes, the major ones in my opinion, will help you create an effective process that brings stability to your IT infrastructure.
