Icinga in action. Large Hadron Collider Monitoring at CERN, Switzerland / France

Original author: Amanda Mailer
  • Translation
CERN and Icinga

CERN is the European Organization for Nuclear Research. It is also the place where particle bunches cross at a rate of 40 MHz while circling the collider about 11,000 times per second. CERN's Large Hadron Collider (LHC) is the largest and most powerful particle accelerator in the world. Icinga is a free, open source enterprise monitoring system. For its part, Icinga supports the stable operation of the LHC equipment at three of the four detector sites. This equipment searches for the differences between matter and antimatter, seeks further confirmation of the existence of the Higgs boson, and tests the models of modern physics as we know them today.


CERN is one of the largest and most respected research centers in the world. It is engaged in fundamental physics, searching for the basic principles of the Universe and the laws that govern it. At CERN, the largest and most complex scientific instruments are used to study the basic constituents of matter. Particle accelerators boost beams of particles to high energies before they collide with each other or with stationary targets. Detectors observe and record the results of these collisions. Founded in 1954, the CERN laboratory sits on the Franco-Swiss border near Geneva. It was one of the first European joint ventures, and 20 states are currently involved in it.

For more information on CERN's activities and experimental equipment, see "MgrinCERN - what is an organization for 900 million dollars".

At a depth of 100 m under the Franco-Swiss border lies a 27-kilometer ring, better known as the Large Hadron Collider (LHC), which collides subatomic particles at an energy of 14 TeV. Detectors at four sites, with a total mass of up to 12,000 tons, record the experimental data used to probe, among other things, the origins of matter and antimatter, the existence of the Higgs boson, and additional dimensions of space. To keep these processes orderly and observable, Icinga monitors three of these sites: LHCb, CMS and ATLAS (Fig. 1).
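The beam's revolution rate follows directly from the ring length. As a rough sketch, assuming the particles travel at essentially the speed of light and using the rounded 27 km figure from the text (the real ring is slightly shorter, so the true rate is slightly higher), the estimate can be computed as:

```python
# Rough estimate of how many times per second a particle circles the ring.
# Assumption: the particle moves at (almost exactly) the speed of light.
SPEED_OF_LIGHT_M_S = 299_792_458  # exact, by definition of the metre
RING_LENGTH_M = 27_000            # rounded ring length quoted in the text


def revolutions_per_second(ring_length_m: float,
                           speed_m_s: float = SPEED_OF_LIGHT_M_S) -> float:
    """Return the number of laps per second around a ring of the given length."""
    return speed_m_s / ring_length_m


if __name__ == "__main__":
    rate = revolutions_per_second(RING_LENGTH_M)
    print(f"about {rate:,.0f} revolutions per second")
```

The result is on the order of 11,000 revolutions per second.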



Matter vs. Antimatter: Monitoring

The LHCb (Large Hadron Collider beauty) experiment's equipment is 21 meters long, 13 meters wide and 10 meters high. It produces a data stream of 60 Gb/s carrying information about the origin of matter and antimatter. The control systems and data acquisition chains form the experiment's information backbone; they run on Windows and Linux machines as well as on embedded processors.

Initially, monitoring was handled by a single Nagios instance. However, as the CERN IT team tried to scale the solution, problems began to surface: the average service check latency of 328 seconds was too long. A new solution was needed, and the administrators turned to Icinga and its active community.

Thanks to configuration compatibility, migrating from Nagios was relatively straightforward. However, to make the solution easier to maintain in the future, the configuration files were reorganized to make full use of host groups and inheritance between hosts. As a result, adding a new monitored object to an existing category (a DBMS server, compute node, storage system, and so on) requires a change to only one configuration file.
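To illustrate why inheritance keeps such changes small, here is a minimal sketch that renders Nagios-compatible object definitions. The directive names (`define host`, `name`, `use`, `register`, `hostgroups`, `host_name`, `address`) belong to the Nagios-compatible configuration format; every concrete value (`dbms-server`, `db-servers`, `db01`, the address, the check command) is hypothetical and not taken from CERN's setup, and the template omits other directives a real configuration would need.

```python
from __future__ import annotations


def render_object(kind: str, directives: dict[str, str]) -> str:
    """Render one `define <kind> { ... }` block in Nagios object syntax."""
    width = max(len(key) for key in directives)
    lines = [f"define {kind} {{"]
    for key, value in directives.items():
        lines.append(f"    {key.ljust(width)}  {value}")
    lines.append("}")
    return "\n".join(lines)


def render_template(name: str, hostgroup: str, check_command: str) -> str:
    """A host template: shared settings live here; `register 0` marks it abstract."""
    return render_object("host", {
        "name": name,
        "hostgroups": hostgroup,
        "check_command": check_command,
        "register": "0",
    })


def render_host(host_name: str, address: str, template: str) -> str:
    """A concrete host: it only names itself and the template it inherits from."""
    return render_object("host", {
        "use": template,
        "host_name": host_name,
        "address": address,
    })


if __name__ == "__main__":
    # Hypothetical category and host, for illustration only.
    print(render_template("dbms-server", "db-servers", "check-host-alive"))
    print()
    print(render_host("db01", "192.0.2.10", "dbms-server"))
```

With this layout, a new database server is a single short `define host` block; group membership and check settings come from the template.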

The LHCb experiment is currently monitored by a single Icinga instance installed in failover mode. It works together with Mod-Gearman workers and the NRPE and NSClient++ remote agents. Besides SNMP checks and specialized performance measurements, several custom checks have been added, such as GPFS and file system checks.

The central Icinga server is responsible for scheduling checks. The 60 distributed Mod-Gearman workers pull these checks from their queues, execute them, and put the results into another queue (Fig. 2). In the new installation, a single instance of the Icinga monitoring system is able to track a vast environment of more than 2,000 hosts and 40,000 services. The service check latency has dropped from 328 seconds to less than one second.
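The scheduler, job queue, workers and result queue can be sketched in a few lines. This is only an in-process analogy built on Python's standard `queue` and `threading` modules; it does not use Mod-Gearman's or Gearman's actual interfaces, and the function name `run_checks` and the check names are hypothetical.

```python
import queue
import threading


def run_checks(checks, worker_count=4):
    """Run `checks` (a dict of name -> callable) on a pool of workers.

    The caller plays the scheduler: it fills the job queue, the workers
    pull jobs from it, and every outcome lands in a separate result queue.
    """
    jobs = queue.Queue()     # scheduler -> workers
    results = queue.Queue()  # workers -> scheduler

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this worker
                return
            name, check = job
            try:
                results.put((name, check()))
            except Exception as exc:  # a failing check must not kill the worker
                results.put((name, f"UNKNOWN - {exc}"))

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for item in checks.items():
        jobs.put(item)
    for _ in threads:
        jobs.put(None)       # one sentinel per worker
    for thread in threads:
        thread.join()

    collected = {}
    while not results.empty():  # safe: all producers have finished
        name, output = results.get()
        collected[name] = output
    return collected


if __name__ == "__main__":
    demo = {
        "db01/ping": lambda: "OK - host is up",
        "db01/disk": lambda: "WARNING - 15% free",
        "db01/broken": lambda: 1 / 0,
    }
    for name, output in sorted(run_checks(demo).items()):
        print(name, "->", output)
```

Because workers only talk to the queues, adding capacity means starting more workers; the scheduler itself does not change.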


Checking for the Higgs boson


The second and third sites house the detectors of the CMS (Compact Muon Solenoid) and ATLAS (A Toroidal LHC ApparatuS) experiments. With their help, physicists are trying to detect the Higgs boson and to find additional dimensions of space and dark matter.

In the CMS experiment, Icinga monitors the status of 3,000 hosts and 70 switches from a single central monitoring server. It runs with one Mod-Gearman worker, NRPE and check_multi. With their help, Icinga processes the results of 90,000 checks every 2 minutes. The checks are very diverse: network utilization, error counts and free disk space, as well as RAID array status, equipment temperature and other special services. Icinga thus looks after the entire range of installed equipment.
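Each of these checks is carried out by a small plugin program. As a sketch of what a free disk space check might look like, the example below follows the Nagios-compatible plugin convention that Icinga uses: one line of status text, optional performance data after a `|`, and a return code of 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The thresholds and function names are assumptions made for illustration, not CERN's actual checks.

```python
import shutil

# Return codes defined by the Nagios-compatible plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_free_space(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Map a free-space percentage to a plugin status (thresholds are assumed)."""
    if free_pct < crit_pct:
        return CRITICAL
    if free_pct < warn_pct:
        return WARNING
    return OK


def check_disk(path="/", warn_pct=20.0, crit_pct=10.0):
    """Return (status, message) for the file system that contains `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    free_pct = 100.0 * usage.free / usage.total
    status = evaluate_free_space(free_pct, warn_pct, crit_pct)
    # Text before "|" is for humans; text after it is performance data.
    message = (f"DISK {LABELS[status]} - {free_pct:.1f}% free on {path}"
               f" | free_pct={free_pct:.1f}%")
    return status, message


def main():
    """Print the status line and return the code a plugin would exit with."""
    status, message = check_disk()
    print(message)
    return status
```

To use this as a real plugin, a script would finish with `sys.exit(main())` so that the monitoring system receives the status as the process exit code.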

The ATLAS experiment has deployed two Icinga instances that run on virtual machines and work side by side with Nagios. Out of a total of 3,000 hosts, the Icinga servers monitor 90 critical nodes on both networks. Monitoring helps ATLAS make the most of its beam time at the collider and collect as much data as possible for physicists.

Future extensions


There are already plans to migrate the ATLAS monitoring system completely to Icinga, Mod-Gearman and Ganglia, which will make it possible to monitor 3,000 hosts and perform 100,000 checks at a time. The checks will include hardware monitoring via IPMI and will most likely run on the same kind of central monitoring setup with a Mod-Gearman worker that the other Icinga installations use.

Icinga monitoring of CMS is also being extended. The plan is to create more dedicated checks for the software on which the experiment is based and which is currently being added. As Icinga's monitoring scope grows, the CERN IT team can count on efficient monitoring of the LHC, leaving the experiments free to do real science. Interestingly, Icinga monitoring already played a behind-the-scenes role when the Higgs boson was discovered. And as the LHC and its equipment continue to collide particles and collect data without interruption, Icinga will continue to work for science and the discoveries to come.
