IBM Tivoli Netcool: How do you draw and animate a model of IT services? And what is visual RCA?
It is now hard to say who first came up with the idea of displaying service models in the interfaces of IT management systems. A list of current alarms and a network map are clearly inadequate for showing how the parts of a heterogeneous infrastructure influence one another, and this pushed vendors to look for other ways of presenting the picture. Since the proper operation of each object affects the quality of the service to some extent, it is logical to present the model as a hierarchical inverted tree: at the top is a symbol representing the service itself, and below it, in increasing detail, are the groups and components that support it. The objects below can be very different: network and server hardware, operating systems, databases, application programs; in short, everything that can send alarm messages. When a server malfunctions and a critical alarm arrives, the corresponding object in the service model turns red. How strongly the status of a given model element affects the service as a whole is set by the creator of the model, according to their own understanding of that influence. The idea appealed to customers, and above all to heads of operations departments, since their area of responsibility, as a rule, covers a broader range of heterogeneous technologies than that of individual specialists.

Figure 1. TBSM interface in the Tivoli Integrated Portal
Over the years since the first implementations, the functions of service management systems have expanded significantly. From auxiliary pictures highlighted by alarms, they have turned into real service management platforms, in some cases able to create service models automatically and monitor SLA compliance in real time. Naturally, different vendors have achieved different degrees of success and market recognition.
Tivoli Business Service Manager (TBSM) is a good, even exemplary, case of such rapid functional development (see Fig. 1).
Without going into a technical description of TBSM, we list its main features:
- Relying on Netcool OMNIbus as a comprehensive system for collecting and processing alarm messages (a fault management system), TBSM can connect to and display in its interface the behavior of objects from every technology domain without exception, at both the hardware and the software level. Taking states into account in the service model this way answers the question of what exactly is happening inside the working components; it gives the operator on duty accurate data on the malfunction and is the key to resolving it as quickly as possible. This is technical-level information.
- Practically any data that is stored in a DBMS and changes over time can take part in determining the state of a service model element. For example, TBSM can read a database table in which specialized business-level software continuously updates the number of transactions in the last minute. It compares this value with the configured thresholds and changes the status of the service object in the model. As a rule, information at this level is business in nature.
- For any object of the service model in TBSM, you can enable SLA compliance monitoring, and this is monitoring in real time, not just after the fact. There are three types of SLA: by the duration of a single outage, by the number of failures (or of duration-based SLA violations) per time window, and, finally, by the total time of all service outages over the reporting period. All three types can be used together, in any combination. The service symbols carry visual indicators not only of the current state of the service but also, separately, of compliance with each type of SLA. In addition, the price of non-compliance with the SLA can be set directly in rubles. Operators and managers see in the TBSM interface, in real time, how many minutes remain before an SLA violation, what the performance figure will be if the problem is fixed right now, and how penalties accrue after a violation. This is convenient for prioritizing IT incidents. Naturally, in addition to the indicators, there is a historical reporting function that supports a detailed "debriefing" after an SLA violation. This capability was well reflected in the product's very first name: SLAM, or SLA Manager.
- After that, the product was called RAD, or Realtime Active Dashboards; the name emphasized the ability to build personal dashboards showing the situation in real time. These views can include attractive service models, alarm windows in the context of the selected object, summary charts, a convenient service navigation tree with dynamic status indication and numerical values, a charting tool for tracking and comparing service states over time (Timewindow Analyzer) and, finally, a library of historical reports. On custom canvases, dashboards can use "gauges" such as speedometers and thermometer bars (Fig. 2). Service elements can also be shown as blocks with the numerical values of the parameters of interest. This is the presentation aspect of using TBSM.

Figure 2. TBSM dashboard auxiliary indicators
- Building the service model can be automated, and its updates can be tied to changes in external data. The core of OMNIbus, called ObjectServer, is essentially a database in which alarms are stored and processed online. TBSM works with ObjectServer and with records in external databases in a similar way. If an alarm message or a row in an external database table contains all the information needed to create the object correctly and place it in the service model, you can configure automatic filling (autopopulation) of the model. Imagine that an alarm arrives from an object unknown to TBSM: based on the fields of the message, TBSM can decide which template (object type) it belongs to, compose the name of the new object and determine its parent in the model. Similarly, a new object can appear when a record about it is added to an external database. In effect, the model can be synchronized with external databases. This capability is often used to link a service model to an inventory database or CCMDB. Without it, models of changing IT systems with a large number of elements cannot be kept up to date.
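The autopopulation idea from the last item can be illustrated with a small Python sketch. This is not TBSM's configuration or API: the classes `ServiceNode` and `ServiceModel`, the mapping `TEMPLATE_BY_GROUP`, the naming rule for groups, and the event fields used here (`Node`, `AlertGroup`, `Location`) are all assumptions made for the example. It only shows the three decisions described above: choosing a template, composing a name, and finding a parent.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceNode:
    name: str
    template: str
    parent: "ServiceNode | None" = None
    children: list = field(default_factory=list)


# Hypothetical mapping from an alarm field value to a model template.
TEMPLATE_BY_GROUP = {"router": "NetworkDevice", "database": "Database"}


class ServiceModel:
    def __init__(self, root_name):
        self.root = ServiceNode(root_name, "Service")
        self.nodes = {root_name: self.root}

    def ensure(self, name, template, parent_name):
        """Return the node called `name`, creating it under `parent_name` if needed."""
        if name in self.nodes:
            return self.nodes[name]
        parent = self.nodes[parent_name]
        node = ServiceNode(name, template, parent)
        parent.children.append(node)
        self.nodes[name] = node
        return node

    def autopopulate(self, event):
        """Place the object named in `event`; return None if the event lacks data."""
        template = TEMPLATE_BY_GROUP.get(event.get("AlertGroup"))
        node_name = event.get("Node")
        location = event.get("Location")
        if not (template and node_name and location):
            return None  # not enough information to place the object
        # Assumed naming rule: one group per location and template.
        group = self.ensure(f"{location} {template}s", "Group", self.root.name)
        return self.ensure(node_name, template, group.name)


model = ServiceModel("Online Banking")
created = model.autopopulate(
    {"Node": "rtr-msk-01", "AlertGroup": "router", "Location": "Moscow"}
)
print(created.name, "->", created.parent.name, "->", created.parent.parent.name)
# prints: rtr-msk-01 -> Moscow NetworkDevices -> Online Banking
```

An event that lacks any of the three fields is simply skipped, which mirrors the condition in the text: the object is created only when the message carries everything needed to place it.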
We mentioned above the need for a wide range of "sensors" that assess the objects involved in delivering a service. No less important and useful is objective monitoring of service quality from the user's point of view: a kind of generator of synthetic requests to the service that assesses its quality. In Tivoli, this is TCAM (Tivoli Composite Application Manager). The logic of using these two types of "sensors" together is very simple. For example, TCAM registers an unsatisfactory service response time, or even a service failure; it reports this to OMNIbus as a critical alarm in which the object is not a device or a server but the service itself.
In TBSM, such alarms are tied directly to the services at the top of the models. Meanwhile, "field sensors" detect malfunctions in infrastructure objects and also send messages. TBSM links these messages to the underlying elements of the model and calculates (and displays) how the impact propagates through the model topology. The service model clearly shows the problem situation and, most importantly, its source. Moving down the tree from the "reddened" service and following the color indication, the specialist quickly arrives at the most probable cause. Notably, TBSM itself marks with asterisks the objects that are the logical causes of the problem. The result is a kind of visual root cause analysis (RCA).
In conclusion, we note that since TBSM belongs to the Netcool family, it fully meets the requirements for carrier-class software and can be used in business-critical OSS systems. It supports fault tolerance schemes, load balancing and scaling, external authorization and single sign-on, and work within a single portal alongside the OMNIbus Web GUI and Tivoli Network Manager interfaces. Tivoli Integrated Portal handles context-aware interaction between these products, which makes it possible to build convenient tool environments with fast transitions between monitoring contexts: network and telecommunication equipment, servers and their operating systems, storage and databases, web servers and application servers and, finally, services as objects of monitoring.
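The walk "down the tree, following the color" can be shown with a minimal Python sketch. It assumes the simplest possible propagation rule, where a node is as bad as its worst child; in a real model the degree of influence is set by the model's author, and TBSM's own algorithm for marking causes is not described here. The names `Node`, `status` and `root_cause_candidates`, and the three status levels, are invented for the example.

```python
from dataclasses import dataclass, field

GREEN, YELLOW, RED = 0, 1, 2


@dataclass
class Node:
    name: str
    own_status: int = GREEN  # status from alarms bound directly to this node
    children: list = field(default_factory=list)

    def status(self):
        """Effective status: the worst of the node's own status and its children's."""
        return max([self.own_status] + [c.status() for c in self.children])


def root_cause_candidates(node):
    """Follow red branches downward and return the names of the deepest red nodes."""
    if node.status() != RED:
        return []
    deeper = [name for c in node.children for name in root_cause_candidates(c)]
    return deeper or [node.name]


db = Node("orders-db", own_status=RED)            # alarm from a "field sensor"
app = Node("app-server", children=[db])
web = Node("web-server", own_status=YELLOW)
service = Node("Online Shop", own_status=RED,     # alarm from the synthetic-request monitor
               children=[app, web])

print(service.status())                # 2 (RED)
print(root_cause_candidates(service))  # ['orders-db']
```

The service is red both because of its own alarm and because of the database below it; the walk ignores the yellow branch and stops at the deepest red object, which is what an operator does by eye in the model view.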
