Today Yandex services were unavailable for several hours. The cause was a routing problem inside the Yandex network. It was not related to data center operations, DDoS attacks, fires, or any other external factors. The main consequences of the problem have now been eliminated, and no user data was lost.
We apologize to all our users.
For those who are interested, here is a more detailed description:
The problem was caused by a software error on a router in our new data center in Amsterdam. Yandex uses two routing protocols: OSPF internally and BGP externally. Because of the error, information about all external routes appeared in the internal routing tables. That is about three orders of magnitude more routes than usual, and OSPF is not designed for that volume. As a result, all routers ran out of memory and stopped working. The network fell apart, and within a few minutes Yandex became completely inaccessible.
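To make the failure mode concrete, here is a minimal toy sketch in Python of a guard that refuses to import external routes into an internal table once a route limit would be exceeded. It is not Yandex's configuration or any router vendor's API: the `InternalTable` class, the `max_routes` limit, and all route counts are hypothetical and chosen only to mirror the "three orders of magnitude" ratio described above.

```python
# Toy model only: class names, limits, and route counts are illustrative
# assumptions, not data from the incident or a real router API.
from dataclasses import dataclass, field


class RouteLimitExceeded(Exception):
    """Raised when redistribution would exceed the configured route limit."""


@dataclass
class InternalTable:
    max_routes: int                       # hypothetical safety limit
    routes: set = field(default_factory=set)

    def redistribute(self, external_routes):
        """Import external routes only if the table stays within its limit."""
        new_routes = set(external_routes) - self.routes
        if len(self.routes) + len(new_routes) > self.max_routes:
            raise RouteLimitExceeded(
                f"refusing {len(new_routes)} routes: limit is {self.max_routes}"
            )
        self.routes |= new_routes


internal = InternalTable(max_routes=5_000)
# A small set of internal routes (assumed size).
internal.routes.update(f"10.0.{i}.0/24" for i in range(200))

# Simulate a leak of about 1000x more routes than the table normally holds.
leaked = [f"external-route-{i}" for i in range(200_000)]
try:
    internal.redistribute(leaked)
    leak_blocked = False
except RouteLimitExceeded:
    leak_blocked = True
```

In this sketch the leak is rejected as a whole and the internal table keeps its original routes, instead of growing until memory runs out.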
The internal network was down as well, so our specialists had to spend a lot of time tracing the problem along the chain back to its source.
Administrators fixed the error on the router. Then, to take the extra load off the remaining routers, of which we have more than a hundred, our specialists had to split the network into several parts. The amount of traffic decreased, the routers freed up memory, and they were able to restore network connectivity on their own.
The network came back up gradually. After some time, Yandex services became available to most users.
doing sh ip bgp summary