A detailed answer to a comment, and a little about the life of ISPs in the Russian Federation

What moved me to write this post was this comment.

I'll quote it here:
kaleman today at 18:53

Today the provider made my day. Along with an update to its site-blocking system, the mail.ru mail server ended up under the ban. I've been pestering tech support since morning, and they can't do anything. The provider is small, and apparently the upstream providers are doing the blocking. I've also noticed that all sites have become slower to open - maybe they've hung some crooked DLP in the path? There were no access problems before. The destruction of Runet is happening right before my eyes...
The thing is, it seems we are that very provider :(

And indeed, kaleman almost guessed the cause of the problems with mail.ru (even though for a long time we refused to believe it).

The rest of this post is divided into two parts:

  1. the causes of today's problems with mail.ru, and the exciting quest to find them
  2. life as an ISP in today's realities, and the stability of the sovereign Runet.

Problems with the availability of mail.ru


Oh, that's a pretty long story.

The fact is that in order to implement the requirements of the state (in more detail in the second part), we purchased, configured, installed some equipment - both for filtering prohibited resources and for carrying out NAT broadcasts of subscribers.

Some time ago, we finally rebuilt the core of the network so that all subscriber traffic passed through this equipment in the right direction.

A few days ago we turned on filtering of forbidden content on it (while leaving the old system running) - everything seemed to go well.

Then we gradually began enabling NAT on this equipment for different groups of subscribers. Outwardly, everything seemed to be going well there too.

But today, after enabling NAT on this equipment for the next group of subscribers, we were met in the morning with a fair number of complaints that mail.ru and other Mail Ru Group resources were unavailable or only partially available.

We started checking: something, somewhere, occasionally sends a TCP RST in response to requests destined exclusively for mail.ru networks. Moreover, it sends an incorrectly formed (no ACK), obviously artificial TCP RST. It looked something like this:

[Screenshots: packet captures showing the spoofed TCP RST segments]
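For the curious, a minimal sketch (assuming scapy and a hypothetical dump.pcap taken on the test PC) of how one could sift a capture for exactly these segments - RST bit set, ACK bit missing - which is what made the injected packets stand out:

```python
# Hypothetical helper: list TCP RST segments that carry no ACK bit,
# the signature of the artificial resets described above.
from scapy.all import rdpcap, IP, TCP

RST = 0x04
ACK = 0x10

for pkt in rdpcap("dump.pcap"):  # assumed capture file from the test PC
    if IP in pkt and TCP in pkt:
        flags = int(pkt[TCP].flags)
        if flags & RST and not flags & ACK:
            print(f"{pkt[IP].src}:{pkt[TCP].sport} -> "
                  f"{pkt[IP].dst}:{pkt[TCP].dport} "
                  f"seq={pkt[TCP].seq} ttl={pkt[IP].ttl}")
```

The same thing can of course be done with a display filter in Wireshark; the point is simply that the anomalous resets are easy to pick out once you know what to look for.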

Naturally, our first thoughts were about the new equipment: the dreaded DPI, we had no confidence in it, and you never know what it might do - after all, TCP RST is a fairly common technique among blocking tools.

We also considered kaleman's guess that someone "higher up" was doing the filtering - but discarded it immediately.

Firstly, our uplinks are sane enough that we shouldn't be suffering like this :)

Secondly, we are connected to several IXs in Moscow, and traffic to mail.ru goes through them - and they have neither any obligation nor any other motive to filter traffic.

The next half of the day was spent on what is usually called shamanism, together with the equipment vendor (thanks to them for not giving up on us):

  • filtering was completely disabled
  • NAT via the new scheme was switched off
  • the test PC was moved to a separate isolated pool
  • the IP addressing was changed

In the afternoon we allocated a virtual machine that went out to the network the same way an ordinary user would, and gave the vendor's representatives access to it and to the equipment. The shamanism continued :)

In the end, the vendor's representative confidently stated that the box had absolutely nothing to do with it: the RSTs were coming from somewhere higher up.

Note
At this point someone may ask: wouldn't it have been much easier to take the dump not from the test PC, but from the trunk above the DPI?

No, unfortunately, capturing a dump of (or even just mirroring) 40+ Gbps is not at all trivial.


After that, in the evening, there was nothing left to do but return to the assumption of some strange filtering somewhere above us.

I checked which IX the traffic to the Mail Ru Group networks was currently going through and simply shut down the BGP sessions to it. And - lo and behold! - everything immediately returned to normal :(

On the one hand, it's a shame the whole day was spent searching for a problem that was then fixed in five minutes.

On the other hand:

- as far as I can remember, this is unprecedented. As I wrote above, it really makes no sense for an IX to filter transit traffic; they usually carry hundreds of gigabits or terabits per second, and I simply could not seriously imagine such a thing.

- an incredible coincidence of circumstances: new, complex hardware that we don't particularly trust and don't know what to expect from - built precisely for blocking resources, including via TCP RST.

At the moment, the NOC of this internet exchange is looking into the problem. According to them (and I believe them), they have no purpose-built filtering system. But, thank heavens, the rest of this quest is no longer our problem :)

This was a small attempt to justify ourselves - please understand and forgive :)

P.S.: I deliberately do not name either the DPI/NAT manufacturer or the IX (in fact, I don't even have any particular claims against them - the main thing is to understand what happened).

Today's reality (as well as yesterday's and the day before's) from the point of view of an Internet provider


I've spent the last few weeks substantially rebuilding the network core, performing a pile of manipulations on live traffic, at the risk of seriously affecting real users. Given the goals, results and consequences of all this, it is morally quite hard - especially while listening, yet again, to lofty speeches about protecting the stability of Runet, sovereignty, and so on.

In this section I will try to describe the "evolution" of a typical Internet provider's core network over the past ten years or so.

Ten years ago.

In those blessed times, a provider's network core could be as simple and reliable as a cork:

[Diagram: the simple provider core of ten years ago]

In this very, very simplified picture there are no trunk links, rings, or IP/MPLS routing.

Its essence is that user traffic ultimately arrived at the core-level switching, from where it went to the BNG, from where, as a rule, it went back to the core switching and then "out" - through one or more border routers to the Internet.

Such a scheme is very, very easy to make redundant, both at L3 (dynamic routing) and at L2 (MPLS).

You can put in N+1 of anything - access servers, switches, border routers - and back them up one way or another for automatic failover.

A few years later it became clear to everyone in Russia that it was impossible to go on living like that: children urgently needed to be protected from the harmful influence of the network.

There was an urgent need to find ways to filter user traffic.

There are different approaches.

In the not-so-good case, something is placed "inline", between the user traffic and the Internet. The traffic passing through this "something" is analyzed and, for example, a fake redirect packet is injected toward the subscriber.

In a slightly better case - if the volume of traffic allows - you can pull a small trick: divert to the filter only the outgoing user traffic destined for the addresses that need filtering (to do this, you can either take the IP addresses listed in the registry, or additionally resolve the domains present in the registry).
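To illustrate the idea (this is not our actual tooling - the file name, the next-hop address and the Linux iproute2 approach are all assumptions for the example), here is a rough sketch: read a registry dump, resolve any domains to their current A records, and install host routes pointing at the filtering box, so that only this small slice of outgoing traffic is ever diverted through it.

```python
# Toy illustration of "side filtering": point /32 routes for registry
# addresses at the filtering box, so only that traffic is diverted.
import socket
import subprocess

FILTER_NEXT_HOP = "192.0.2.10"   # assumed address of the filtering box
REGISTRY_FILE = "registry.txt"   # assumed format: one IPv4 address or domain per line

def collect_targets(path):
    targets = set()
    with open(path) as f:
        for line in f:
            entry = line.strip()
            if not entry or entry.startswith("#"):
                continue
            try:
                socket.inet_aton(entry)          # already an IPv4 address
                targets.add(entry)
            except OSError:
                try:                             # a domain: resolve its current A records
                    for info in socket.getaddrinfo(entry, None, socket.AF_INET):
                        targets.add(info[4][0])
                except socket.gaierror:
                    pass                         # unresolvable entries are skipped
    return targets

if __name__ == "__main__":
    for ip in sorted(collect_targets(REGISTRY_FILE)):
        # install or refresh a host route via the filter (Linux iproute2)
        subprocess.run(["ip", "route", "replace", f"{ip}/32",
                        "via", FILTER_NEXT_HOP], check=False)
```

The weakness of the approach is also visible here: every registry address becomes a separate route, which is exactly what later becomes a problem on older hardware (see below).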

At the time, I wrote a simple mini-DPI for these purposes - though I can hardly bring myself to call it that. It is very simple and not very performant; nevertheless, it allowed us, and dozens (if not hundreds) of other providers, to avoid spending millions on industrial DPI systems right away, and bought us several extra years.

By the way, about the then and current DPI
Incidentally, many of those who bought the DPI systems available on the market at the time have already thrown them out: they simply weren't built for this sort of thing - hundreds of thousands of addresses, tens of thousands of URLs.

At the same time, domestic manufacturers have grown very strong on the back of this market. I'm not talking about the hardware component - that part is clear to everyone - but the software, the main thing in a DPI, is today perhaps, if not the most advanced in the world, then certainly (a) developing by leaps and bounds and (b) priced, as a boxed product, simply incomparably lower than its foreign competitors.

I'd like to be proud, but it's a little sad =)

Now everything looked like this:

[Diagram: the network core with side filtering added]

A couple of years later, everyone already had the regulator's monitoring probes installed, and there were more and more resources in the registry. For some older equipment (for example, the Cisco 7600), the "side filtering" scheme simply became unusable: the number of routes on the 76xx platform is limited to roughly nine hundred thousand, while the number of IPv4 routes alone is already approaching 800 thousand today. And if you add IPv6... And also... how many is it now? 900,000 individual addresses in the RKN ban list? =)

Some switched to a scheme of mirroring all trunk traffic to a filtering server, which has to analyze the entire flow and, upon seeing something bad, send an RST to both sides (sender and receiver).

However, the more traffic there is, the less workable this scheme becomes: at the slightest processing delay, the mirrored traffic simply flies past unnoticed, and the provider gets written up for a fine.
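A very condensed sketch of this mirroring variant (scapy again; the interface name and the blocked-address set are made up for the example): watch the mirrored flow, and when a client talks to a blocked address, inject RSTs toward both ends, taking sequence numbers from the observed segment so the stacks will accept them. It also makes the weakness plain: if the injector lags behind, the real packets have long since gone through.

```python
# Toy illustration of the "mirror + RST to both sides" scheme; not production code.
from scapy.all import sniff, send, IP, TCP

BLOCKED = {"203.0.113.5"}     # assumed set of registry addresses
MIRROR_IFACE = "eth1"         # assumed SPAN/mirror port

def kill(pkt):
    if IP not in pkt or TCP not in pkt:
        return
    ip, tcp = pkt[IP], pkt[TCP]
    if ip.dst not in BLOCKED or not tcp.flags & 0x10:   # react only to ACK-bearing client segments
        return
    payload_len = len(tcp.payload)
    # RST toward the server, spoofed as the client
    send(IP(src=ip.src, dst=ip.dst) /
         TCP(sport=tcp.sport, dport=tcp.dport,
             seq=tcp.seq + payload_len, flags="R"),
         verbose=False)
    # RST toward the client, spoofed as the server
    send(IP(src=ip.dst, dst=ip.src) /
         TCP(sport=tcp.dport, dport=tcp.sport,
             seq=tcp.ack, flags="R"),
         verbose=False)

sniff(iface=MIRROR_IFACE, filter="tcp", prn=kill, store=False)
```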

More and more providers have been forced to put DPI systems of varying degrees of reliability inline on their trunk links.

A year or two ago, according to rumors, almost all FSB offices began demanding the actual installation of SORM equipment (previously, most providers got by with a SORM plan agreed with the authorities - a plan of operational measures in case something needed to be found somewhere).

Besides money (not absolutely sky-high, but still millions), SORM required many of us to perform yet more manipulations on the network:

  • SORM needs to see users' "gray" addresses, before NAT translation
  • SORM has a limited number of network interfaces

Therefore, in particular, we had to substantially rebuild a chunk of the core, simply in order to gather the user traffic heading to the access servers in one place, so that it could be mirrored into SORM over a handful of links.

That is, very simplified, it was (on the left) vs became (on the right):

[Diagram: the network core before (left) and after (right) the SORM rebuild]

Now most providers are also required to implement SORM-3 - which includes, among other things, logging of NAT translations.

For these purposes, we had to add separate NAT equipment to the scheme above (the very equipment discussed in the first part), and add it in a specific order: since SORM must "see" traffic before address translation, traffic has to flow strictly as follows: users -> core switching -> access servers -> SORM -> NAT -> core switching -> Internet. To do this, we literally had to "turn around" the traffic flows, which was also quite difficult.

Bottom line: over a decade or so, a provider's core network scheme has become several times more complex, and the number of additional points of failure (both in the form of equipment and in the form of single switching links) has grown significantly. In fact, the very requirement to "see everything" implies funneling that "everything" through a single point.

It seems to me that this extrapolates quite transparently to the current initiatives on the sovereignty of Runet, its protection, stabilization and improvement :)

And Yarovaya is still ahead.
