Examples of calculating the “availability ratio” for sets of network equipment
I described the theory and the main points of the method for calculating the “availability ratio” in an earlier article.
In this publication, we calculate the “availability ratio” for two sets of carrier-grade network equipment, each installed in a telecommunications cabinet, and compare the results with the calculation for a set of equipment without duplicated elements.
Why calculate the “availability ratio” for different equipment layouts?
The final result of an “availability ratio” calculation may be incorrect: too perfect, too high, or too low. Whether an error has crept in or everything was calculated correctly becomes clear only when you can see all the elements of the system together, along with how they are used and where they are located.
An example of an “ideal” calculation of “availability ratio”.
The main components of network equipment set No. 1:
- Cisco ASR 9010 - 2 pcs.;
- Cisco ASR 9000v - 2 pcs.;
- “48V” power supply switchboard SCHRZ-10-2K - 2 pcs.
The configuration of the Cisco ASR 9010 equipment:
The layout of the cabinet with set No. 1 installed looks like this:
Calculation of the availability ratio for equipment set No. 1:
(*) The initial MTBF data are estimates for the manufacturer's equipment items or their equivalents.
The Cisco ASR 9000 Series Routers are designed to deliver a very high level of performance, minimize outages or downtime, and maximize availability. The MTBF is calculated for the Ground Benign condition and can be adjusted for different router usage.
The final calculated data for set No. 1:
- probability of system equipment failure during the year: 0.0008023;
- system MTBF (years): 1246 (10918609 hours);
- average repair time (hours): 24;
- system availability ratio (%): 99.99978;
- average downtime per year (hours): 0.019 (1.15 minutes).
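The figures in this list can be cross-checked from just two inputs: the yearly failure probability and the repair time. The following is a minimal sketch, not the original spreadsheet. It assumes the yearly failure probability is used as a failure rate (so MTBF in years is its reciprocal) and that availability is MTBF / (MTBF + repair time). The function and variable names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def availability_summary(failures_per_year, repair_hours):
    """Derive MTBF, availability and yearly downtime from a yearly failure figure.

    Assumption: the yearly failure probability is treated as a failure rate,
    so MTBF in years is simply its reciprocal.
    """
    mtbf_years = 1.0 / failures_per_year
    mtbf_hours = mtbf_years * HOURS_PER_YEAR
    availability = mtbf_hours / (mtbf_hours + repair_hours)
    downtime_hours = (1.0 - availability) * HOURS_PER_YEAR
    return {
        "mtbf_years": mtbf_years,
        "mtbf_hours": mtbf_hours,
        "availability_pct": availability * 100.0,
        "downtime_hours": downtime_hours,
        "downtime_minutes": downtime_hours * 60.0,
    }


# Inputs taken from the list above: set No. 1, 24-hour repair time.
set_1 = availability_summary(failures_per_year=0.0008023, repair_hours=24)
for name, value in set_1.items():
    print(f"{name}: {value:.5f}")
```

Under these assumptions the sketch gives about 1246 years of MTBF, 99.99978% availability and roughly 1.15 minutes of downtime per year, which matches the published figures.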
What is wrong with this calculation?
To calculate availability, you need to understand how and where the equipment is installed, what it does, whether its elements are duplicated and can be hot-swapped, and how difficult it is to install and replace components without shutting down the main systems of the complex.
In the ideal calculation, all elements were duplicated (which is rarely the case in practice), and it was assumed that spare parts are on hand and that the work can be carried out without any problems next to live equipment that remains switched on.
And if the physical layout is at odds with the logical scheme of the system, then separate parts of the system can no longer duplicate each other.
In the “ideal” case, the complex consists of two halves that duplicate each other. If there is no such logical duplication, we move away from the “ideal” calculation toward a more correct one and get a plausible result.
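To see why duplication changes the result so much, here is a minimal sketch using the textbook formulas for independent failures. This is an assumption: the method from the earlier article may combine elements differently. The per-half value `p_half` is a hypothetical placeholder, not a figure from the tables of this article.

```python
def series_failure(probabilities):
    """Failure probability when ANY single element failing stops the system."""
    all_ok = 1.0
    for p in probabilities:
        all_ok *= (1.0 - p)
    return 1.0 - all_ok


def parallel_failure(probabilities):
    """Failure probability when ALL duplicated elements must fail together.

    Assumes the failures are independent of each other.
    """
    all_failed = 1.0
    for p in probabilities:
        all_failed *= p
    return all_failed


# Hypothetical yearly failure probability of one half of the complex.
p_half = 0.25

duplicated = parallel_failure([p_half, p_half])    # halves back each other up
not_duplicated = series_failure([p_half, p_half])  # both halves are required

print(f"duplicated:     {duplicated:.4f}")
print(f"not duplicated: {not_duplicated:.4f}")
```

With the same two halves, the result differs several times over depending only on whether they are treated as backups for each other or as parts that are both required.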
And let's be realistic and add 60 minutes per year for the restart/shutdown procedure. That should be enough time to boot a new chassis, configure it, and bring it into normal operation, counting from the moment the power switch on the case is pressed. With a 24-hour repair time, 60 minutes of downtime per year corresponds to a yearly failure probability of 0.04167 (1 hour / 24 hours). This will be the bottom row in the calculations that follow.
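The value 0.04167 appears to be one hour of downtime divided by the 24-hour repair time. A small sketch of that arithmetic follows. The function name is illustrative, and the formula is an inference from the numbers rather than something stated in the text.

```python
def equivalent_failures_per_year(downtime_hours_per_year, repair_hours):
    """Yearly failure figure that produces the given downtime.

    Assumption: yearly downtime is roughly (failures per year) * (repair time),
    which holds when the repair time is much shorter than the MTBF.
    """
    return downtime_hours_per_year / repair_hours


# 60 minutes of restart/shutdown time per year, 24-hour repair time.
restart_rate = equivalent_failures_per_year(downtime_hours_per_year=1.0, repair_hours=24.0)
print(round(restart_rate, 5))  # 0.04167
```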
An example of a “real” calculation of “availability ratio”.
Calculation of the availability ratio for equipment set No. 1 without duplication:
The final calculated data for set No. 1 without duplication:
- probability of system equipment failure during the year: 0.5001666;
- system MTBF (years): 1.99 (17514 hours);
- average repair time (hours): 24;
- system availability ratio (%): 99.86;
- average downtime per year (hours): 11.98 (719 minutes).
The difference between the two calculations above is huge, and this must always be kept in mind and analyzed.
Even if the system has duplicated elements, we must not count on them as a replacement when these elements contain other components. For example, we have two chassis and two power supply switchboards. These components are duplicated, but they contain other elements that can stop functioning when the “parent” component fails.
This is critical for the chassis but less of a problem for the switchboard: its simple electronics are used only for testing and for displaying the current load, so even if that board fails, the switchboard keeps working as usual.
An example of a “standard” calculation of “availability ratio”.
The main components of network equipment set No. 2:
- Cisco ASR 9006 - 2 pcs.;
- Cisco ASR 9000v - 2 pcs.;
- “48V” power supply switchboard SCHRZ-48-5 - 2 pcs.
The configuration of the Cisco ASR 9006 equipment:
The layout of the cabinet with set No. 2 installed looks like this:
Calculation of the availability ratio for equipment set No. 2, given that the chassis and power supply switchboards are not duplicated:
The final calculated data for set No. 2:
- probability of system equipment failure during the year: 0.2167769;
- system MTBF (years): 4.6 (40410 hours);
- average repair time (hours): 24;
- system availability ratio (%): 99.94;
- average downtime per year (hours): 5.2 (311 minutes).
It turns out that when calculating the availability ratio, you need to understand which is the largest element in the system that can still be replaced within 24 hours, and how much replacing it will affect the functioning of the other components.
For example, replacing a chassis means removing the entire set of boards and adapters from it, which may take more than 2-3 hours. Dismantling elements in a rack while the equipment is powered on also carries a high risk of causing an additional emergency.
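The effect of a longer repair on yearly downtime can be estimated with a short sketch. It uses the failure figure of set No. 2 from this article. It assumes the extra chassis work adds 3 hours (the upper end of "2-3 hours") to the 24-hour repair time, and that every failure needs this extra work, so the result is an upper bound rather than a forecast. All names are illustrative.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def downtime_hours_per_year(failures_per_year, repair_hours):
    """Yearly downtime, with availability taken as MTBF / (MTBF + repair time)."""
    mtbf_hours = HOURS_PER_YEAR / failures_per_year
    availability = mtbf_hours / (mtbf_hours + repair_hours)
    return (1.0 - availability) * HOURS_PER_YEAR


SET_2_FAILURES_PER_YEAR = 0.2167769  # from the final data for set No. 2
BASE_REPAIR_HOURS = 24               # repair time used throughout the article
EXTRA_CHASSIS_HOURS = 3              # assumption: upper end of "2-3 hours"

baseline = downtime_hours_per_year(SET_2_FAILURES_PER_YEAR, BASE_REPAIR_HOURS)
with_chassis_swap = downtime_hours_per_year(
    SET_2_FAILURES_PER_YEAR, BASE_REPAIR_HOURS + EXTRA_CHASSIS_HOURS
)

print(f"baseline downtime:      {baseline:.2f} h/year")
print(f"with longer repair:     {with_chassis_swap:.2f} h/year")
print(f"additional downtime:    {(with_chassis_swap - baseline) * 60:.0f} min/year")
```

Under these assumptions, three extra repair hours per failure raise the yearly downtime of set No. 2 from about 5.2 to about 5.85 hours.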
The ideal option is two cabinets of equipment, each with two chassis: one in operation and one empty, ready for quick activation by moving the elements over from the failed chassis. But this situation is too ideal.