How the MTS Mobile Network Operations Center works: answers to your questions

    Dear readers, thank you for making our first post, prepared by Andrei Seregin, Director of the Department for the Operation of Converged Networks and Services at MTS, the most widely read post of the day. We are very pleased that you found the topic interesting. Thank you also for your many questions and for the topics you suggested for future posts.

    When we began preparing the answers together with Andrei Vyacheslavovich, we realized they were detailed enough for a whole new post. That is why we decided to publish them as a separate post.

    As a reminder, last time we talked about our Mobile Network Operational Management Center in Krasnodar, which we opened in 2012.

    So, here are our answers to your questions:

    What does the person with 12 monitors do?

    The person sitting behind 12 monitors is, of course, not doing online monitoring, since nobody can watch 12 monitors at once. He handles expert tasks and searches for a solution across several systems at the same time. When a problem is detected and handed over to him, he starts testing hypotheses, moving from one system to another. You could give him a single screen and have him switch between minimized windows, but he would waste a lot of time just finding the right window. Twelve screens are much more convenient.

    First-line operators who work in the umbrella monitoring system also have several monitors. One monitor shows alarm messages from the main vendors of the radio subsystem, switching, VAS platforms and so on: this is the umbrella monitoring system screen. A second screen is used for working with incidents, and e-mail can be kept open on a third.
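    To illustrate what an "umbrella" system does with alarms from different vendors, here is a minimal sketch that maps each feed's own severity labels onto one common scale and merges everything into a single list, most severe first. This is not the MTS system: the feed names, field names and severity mappings are hypothetical and chosen only for illustration.

```python
"""Hypothetical sketch of an umbrella monitoring feed.

Feed names, field names and severity labels are illustrative assumptions,
not the real formats used by MTS or by any vendor.
"""
from dataclasses import dataclass

# Assumed mapping from each feed's native severity labels to one common
# scale (1 = most severe, 4 = least severe).
SEVERITY_MAPS = {
    "radio_vendor_a": {"CRITICAL": 1, "MAJOR": 2, "MINOR": 3, "WARNING": 4},
    "switching_vendor_b": {"A1": 1, "A2": 2, "A3": 3, "O1": 4},
    "vas_platform": {"error": 2, "warn": 4},
}


@dataclass(frozen=True)
class Alarm:
    source: str    # which vendor/subsystem feed the alarm came from
    element: str   # network element id, e.g. a base station name
    text: str      # human-readable alarm text
    severity: int  # normalized severity, 1 = most severe


def normalize(source: str, raw: dict) -> Alarm:
    """Convert one vendor-specific raw alarm into the common Alarm format."""
    # Unknown severity labels fall back to the lowest severity (4).
    sev = SEVERITY_MAPS[source].get(raw["severity"], 4)
    return Alarm(source=source, element=raw["element"], text=raw["text"], severity=sev)


def umbrella_view(feeds: dict) -> list:
    """Merge all feeds into one list sorted by severity, then source and element."""
    alarms = [normalize(src, raw) for src, raws in feeds.items() for raw in raws]
    return sorted(alarms, key=lambda a: (a.severity, a.source, a.element))


if __name__ == "__main__":
    feeds = {
        "radio_vendor_a": [{"element": "BS-0412", "severity": "MAJOR", "text": "Mains power failure"}],
        "switching_vendor_b": [{"element": "MSC-1", "severity": "A1", "text": "Link set down"}],
        "vas_platform": [{"element": "SMSC-2", "severity": "warn", "text": "Queue high"}],
    }
    for alarm in umbrella_view(feeds):
        print(alarm.severity, alarm.element, alarm.text)
```

    With this sample input, the switch alarm (severity 1) comes first, then the base station power alarm, then the VAS warning. A real umbrella system would also deduplicate alarms and link them to incidents, which this sketch leaves out.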

    How many engineers do you have per shift?

    Shift engineers perform different functions. For example, the first line that monitors the radio subsystem has 3 people per shift. We also have an expert team that works around the clock. There are shifts for the switching subsystem, several shifts by equipment type for the transport network, shifts of the main operational duty service, and so on. So every shift has enough engineers.

    What issues does Roskomnadzor contact you about?

    Roskomnadzor mainly raises network issues during emergencies and exercises, and during the preparation and running of large events of national importance (forums, summits, etc.). Another topic is opening roaming between operators' networks. Here is how it works: when one operator's network fails, that operator has the right to ask Roskomnadzor to request that the other operators open roaming. We have worked out a procedure for this, and exercises are held once a quarter. So if one operator's network goes down, its subscribers will not notice anything. They will keep calling as if on their own network, but will actually be using another one, for example ours.

    As you know, the Far East is now being hit by heavy rains and floods. To keep communication problems in the affected regions to a minimum, roaming between all mobile operators has been opened by order of Roskomnadzor. I am proud to report that at emergency headquarters meetings our network has been noted as the most stable and as the one taking on the largest number of other operators' subscribers.

    It would be very interesting to hear about monitoring the network for fake base stations (BSs) and suspicious activity in general. What new developments do you have in this area?

    I have never encountered a “fake” BS in my practice. In principle, a fake BS cannot appear in the MTS network. I do not see what practical purpose such a BS would serve in the network from the point of view of providing communication services to subscribers.
    Potential fraudsters might find a use for one, but fraud is handled by separate units.

    By the way, there are now mini cellular networks for emergency situations. Each includes a switch, a controller, a base station and a telescopic mast with antennas, all packed into three suitcases. You arrive in an emergency zone with no cellular coverage, deploy your network, create your own SIM cards and register a group of subscribers, who can then call each other. With a little extra work, you can also give them access to the outside world by running a link to the public networks. But here the purpose of such a network is clear, and the state makes sure it is deployed legally.

    Around 800-900 incidents are recorded per 12-hour shift. How many of them are real emergencies, and how many turn out to be false alarms?

    All 800-900 are real incidents. Most of them are caused by loss of external power at base stations. First-category incidents are resolved within 4 hours at most. Overall, no more than 40 working hours are allotted to resolving even the most minor incident.
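    At 800-900 incidents per 12-hour shift, that is roughly 67 to 75 incidents an hour. The two deadlines mentioned above (4 hours for first-category incidents, 40 working hours for any incident) can be expressed as a small check. This is a minimal sketch, not MTS tooling: it assumes the 4-hour limit is measured in calendar hours and that working hours run 09:00-18:00, Monday to Friday. The function names are hypothetical.

```python
from datetime import datetime, time, timedelta

# Limits taken from the text above.
CATEGORY_1_LIMIT = timedelta(hours=4)  # first-category incidents (assumed calendar hours)
WORKING_HOURS_LIMIT = 40               # any incident, in working hours

# Assumed working schedule; the real one is not stated in the post.
WORKDAY_START = time(9, 0)
WORKDAY_END = time(18, 0)


def working_hours_between(start: datetime, end: datetime) -> float:
    """Hours between start and end that fall inside the assumed Mon-Fri schedule."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            lo = max(start, datetime.combine(day, WORKDAY_START))
            hi = min(end, datetime.combine(day, WORKDAY_END))
            if hi > lo:
                total += hi - lo
        day += timedelta(days=1)
    return total.total_seconds() / 3600


def sla_breached(category: int, opened: datetime, now: datetime) -> bool:
    """True if an incident opened at `opened` is past its deadline at `now`."""
    if category == 1:
        return now - opened > CATEGORY_1_LIMIT
    return working_hours_between(opened, now) > WORKING_HOURS_LIMIT
```

    For example, an incident opened on Friday 16 August 2013 at 17:00 has used only 2 working hours by Monday 10:00 under this schedule (one hour on Friday, one on Monday), because the weekend is not counted.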

    What monitoring tools do you use? I see Zabbix, but it has probably been customized a bit to fit your needs. Why was it chosen over Nagios or Cacti?

    Core equipment is monitored by the umbrella system. However, some equipment (mainly in the transport network) is neither economically nor technically worth connecting to the "umbrella". If such a link fails, it is easy to identify from the monitoring of the surrounding equipment. Even so, we also use alternative monitoring systems, and they are useful.
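    One way to see how an unmonitored link can be located from the surrounding equipment is topology-based correlation. Find the boundary where a reachable node's downstream neighbor becomes unreachable, then treat everything below that point as collateral. The sketch below assumes a tree topology rooted at the core, in which each node has a single uplink, and a simple reachable/unreachable status per monitored node. Node names are hypothetical. It cannot tell a failed link from a failed far-end node; it only narrows the search.

```python
"""Hypothetical sketch: locating a failed, unmonitored link from the status
of the monitored equipment around it (tree topology assumed)."""


def suspect_links(uplink: dict, reachable: dict) -> list:
    """Return (parent, child) links where the parent is reachable but the child is not.

    `uplink` maps each node to its parent toward the core. In a tree, nodes
    below the failure point are also unreachable, but their parents are too,
    so only the boundary link (or the child node itself) is reported.
    """
    return sorted(
        (parent, child)
        for child, parent in uplink.items()
        if reachable.get(parent, False) and not reachable.get(child, False)
    )


def affected_nodes(uplink: dict, link: tuple) -> list:
    """All nodes whose path to the core passes through the child end of `link`."""
    _, cut_child = link
    affected = set()
    for node in uplink:
        n = node
        while True:
            if n == cut_child:
                affected.add(node)
                break
            if n not in uplink:  # reached the core (root), path is unaffected
                break
            n = uplink[n]
    return sorted(affected)


if __name__ == "__main__":
    uplink = {"AGG-1": "CORE", "BS-1": "AGG-1", "BS-2": "AGG-1", "AGG-2": "CORE", "BS-3": "AGG-2"}
    reachable = {"CORE": True, "AGG-1": True, "BS-1": True, "BS-2": True, "AGG-2": False, "BS-3": False}
    for link in suspect_links(uplink, reachable):
        print("suspect link:", link, "affects:", affected_nodes(uplink, link))
```

    In the sample topology, two alarms (AGG-2 and BS-3 unreachable) collapse into a single suspect, the CORE to AGG-2 link.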

    Why Zabbix? Publicly available monitoring tools are similar in functionality, so it comes down to preference. A particular engineer liked Zabbix: he had worked with it before and knew it well. So it is really a matter of taste.

    Does the data flow into one data center or several? If it is one, aren't you afraid of going "blind" if the data center hosting the monitoring goes offline?

    As you know, our Mobile Network Management Center is in Krasnodar, and that is where the monitoring specialists work. Their workplaces are virtualized: the servers running the monitoring and incident management software are not physically in Krasnodar at all and are spread across two geographically separate sites. If for some reason the operators cannot get to the center in Krasnodar, they can sit in any other room and work over the Internet.

    For such cases we have a special DRP (disaster recovery plan). Operators can move to any hotel or even stay at home, load a virtual workplace and carry on working. We have even run exercises in which people travel to a neighboring region during an emergency, sit in one of our branch's training rooms and work there until they can return to their workplaces in Krasnodar.

    We also have a control center in Nizhny Novgorod. If the Krasnodar center stops working, Nizhny Novgorod will partially take over monitoring. Although the Nizhny Novgorod specialists work on fixed-line networks, we have trained them to monitor the main elements of the mobile network and major failures on it. In addition, the regions themselves have the competence to watch the main network elements. So in any case, the switches, controllers and base stations will remain under control. We will not be blind.
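    The fallback described above can be summarized as a rule for who watches a given alarm. This is a minimal sketch based only on that description: the element-type names, the `major` flag and the function name are hypothetical. "Krasnodar available" means the Krasnodar team is working, whether from the center itself or remotely under the DRP. In this reading, alarms that fall outside both fallback scopes have no watcher until Krasnodar is back, which is what "partially" implies.

```python
"""Hypothetical sketch of the monitoring fallback described above."""

# Main mobile network elements that both fallback teams can watch (assumed names).
MAIN_ELEMENT_TYPES = {"switch", "controller", "base_station"}


def observers(alarm: dict, krasnodar_available: bool) -> set:
    """Return the set of teams expected to see this alarm."""
    if krasnodar_available:
        return {"Krasnodar"}  # normal operation: full monitoring scope
    watchers = set()
    is_main = alarm["element_type"] in MAIN_ELEMENT_TYPES
    if is_main or alarm["major"]:
        watchers.add("Nizhny Novgorod")  # main elements and major failures
    if is_main:
        watchers.add("regional team")    # regions watch the main elements
    return watchers
```

    For example, a minor alarm on a base station during a Krasnodar outage would be visible to both Nizhny Novgorod and the regional team. A minor alarm on a VAS platform would fall outside both fallback scopes.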