Network speed measurements: what the makers of bandwidth meters keep quiet about

    This may be useful to fellow system administrators and network engineers. I needed to measure the characteristics of a loaded channel from our provider, to understand where the problem lies and, if it really is in the channel, to have objective data for further conversations with the provider.

    The channel is gigabit. Peak load, according to the router, is about 480 Mbit/s and 70,000 packets/s. Users complain that it "slows down" and that the various speed tests available online regularly give all sorts of terrifying results.

    I ran a bunch of tests with different online bandwidth meters and assorted utilities. The first thing that caught my eye was the completely implausible scatter of the results. Not only did each tool produce its own "unique" numbers, but repeated runs of the same tool within a few minutes gave radically different results. The only conclusion I could draw from these measurements: they all lie, and not by a little, but by hundreds of percent in either direction.

    The next step, since the available tools lie, was to quickly knock together something of my own that could send all kinds of packets and combinations of them between two points (one on each side of the channel) and measure packet arrival times as accurately as possible, to gather statistics for analysis.

    And here, it seems, I found the "root of evil": the process scheduler in the operating system. In most operating systems a process cannot use the processor for as long as it wants, because it is not alone in the system and others need the processor too. So processor time is handed out in slices, to each process in turn (simplifying a little), with gaps in between. And the more loaded the system, the longer those gaps.

    As I understood from the documentation for the nanosleep() function (on Linux), if the requested interval is less than 2 ms and the process runs with sufficient privileges, the delay is performed by a busy loop inside the call, without handing control back to the system, because otherwise control would not come back in time. If the privileges are insufficient, the call asks the system to "wake it up on time", but you should not count on that: the actual interval will most likely be not shorter but, as a rule, much longer than 2 ms.
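
    A quick way to see how far sleeps overshoot on a particular machine is to request a short delay many times and record how late each wake-up is. The sketch below is only an illustration in Python, not the tool described in this post; the function name measure_sleep_overshoot and the 0.5 ms request are assumptions made for the example, and the numbers it prints depend entirely on the system and its load.

```python
import statistics
import time


def measure_sleep_overshoot(requested_s: float, samples: int = 200) -> list[float]:
    """Return, for each sample, how much longer the sleep took than requested (seconds)."""
    overshoots = []
    for _ in range(samples):
        start_ns = time.perf_counter_ns()
        time.sleep(requested_s)  # the OS decides when we actually wake up
        elapsed_s = (time.perf_counter_ns() - start_ns) / 1e9
        overshoots.append(elapsed_s - requested_s)
    return overshoots


if __name__ == "__main__":
    # Hypothetical request of 0.5 ms; results vary by OS, kernel, and load.
    results = measure_sleep_overshoot(0.0005)
    print(f"median overshoot: {statistics.median(results) * 1e6:.0f} us")
    print(f"worst overshoot:  {max(results) * 1e6:.0f} us")
```

    Running it on an idle machine and again under load gives a rough feel for how the wake-up delay grows with system load.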

    Based on this, one can assume that an ordinary user application, which is not part of the kernel and therefore cannot "stop the world" while it is busy, can measure time intervals with an accuracy of no better than about 2 ms.
    Now a little math: in 2 ms a 1 Gbit/s link can deliver at least about 160 packets of 1,500 bytes (1 Gbit/s / 12,000 bits per packet / 1,000 ms * 2 ms ≈ 166), and far more if the packets are small.
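
    The same arithmetic as a small Python sketch, for trying other link speeds and packet sizes. The function name packets_per_interval is made up for this example, and the calculation is a simplification: it counts only the packet bytes and ignores Ethernet framing overhead.

```python
def packets_per_interval(link_bps: float, packet_bytes: int, interval_s: float) -> float:
    """How many packets of a given size fit on the link during the interval."""
    bits_per_packet = packet_bytes * 8
    packets_per_second = link_bps / bits_per_packet
    return packets_per_second * interval_s


# 1 Gbit/s link, 2 ms of "blindness"
print(packets_per_interval(1e9, 1500, 0.002))  # about 166.7 full-size packets
print(packets_per_interval(1e9, 64, 0.002))    # 3906.25 small (64-byte) packets
```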

    That is, between the moment the system preempts a program that is trying to record packet arrival times and the moment it gets a chance to continue, approximately 160 packets can accumulate in the buffer. From the program's point of view, they all arrived SIMULTANEOUSLY.
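
    This effect can be illustrated with a toy simulation. It assumes an idealized reader that wakes up exactly every 2,000 µs and stamps every buffered packet with its wake-up time, while packets really arrive one every 12 µs (a 1,500-byte packet at 1 Gbit/s). The names and the perfectly regular wake-ups are assumptions for the sketch; a real scheduler is far less regular.

```python
from collections import Counter


def observed_timestamps(arrival_times_us: list[int], wake_interval_us: int) -> list[int]:
    """Map each true arrival time to the first reader wake-up at or after it."""
    observed = []
    for t in arrival_times_us:
        # Ceiling division on integers: round the arrival up to the next wake-up.
        wake = -(-t // wake_interval_us) * wake_interval_us
        observed.append(wake)
    return observed


# True arrivals: 500 packets, one every 12 us (12, 24, ..., 6000).
arrivals = [i * 12 for i in range(1, 501)]
seen = observed_timestamps(arrivals, 2000)

# Packets that share a timestamp look "simultaneous" to the program.
bursts = Counter(seen)
for wake_time, count in sorted(bursts.items()):
    print(f"t={wake_time} us: {count} packets")
```

    Evenly spaced arrivals collapse into three bursts of 166 to 167 packets each, so any per-packet timing information is gone.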

    In this situation, measurements based on packet arrival times, or on differences in their transit times, are useless, to put it mildly. They make some sense only at speeds on the order of 1 megabyte per second or less.

    At higher speeds, since we cannot reliably track packet timing, we can measure only two things:
    - how quickly a relatively large block of data (a file) gets through the channel, large enough to make the timing error insignificant; that is, the free bandwidth at a given moment
    - what percentage of that block is lost along the way; that is, the packet loss at a given moment.
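
    Both quantities come down to simple arithmetic once the transfer is large enough. Below is a minimal sketch with made-up function names and purely hypothetical figures (a 600 MB transfer taking 10 s, 400,000 packets sent); it also shows why a timing error of about 2 ms stops mattering over a long transfer.

```python
def throughput_mbit_s(bytes_received: int, elapsed_s: float) -> float:
    """Average throughput of a bulk transfer in Mbit/s."""
    return bytes_received * 8 / elapsed_s / 1e6


def loss_percent(packets_sent: int, packets_received: int) -> float:
    """Share of packets that never arrived, in percent."""
    return (packets_sent - packets_received) / packets_sent * 100


def timing_error_percent(timer_error_s: float, elapsed_s: float) -> float:
    """Relative error that a fixed timer inaccuracy adds to the measurement."""
    return timer_error_s / elapsed_s * 100


# Hypothetical numbers, for illustration only.
print(throughput_mbit_s(600_000_000, 10.0))   # 600 MB in 10 s -> 480 Mbit/s
print(loss_percent(400_000, 399_200))         # 800 lost packets -> about 0.2 %
print(timing_error_percent(0.002, 10.0))      # 2 ms over 10 s -> about 0.02 %
```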

    Neither, I think, can be called an objective assessment of the quality of the channel as such if the channel is also carrying other users' traffic at that moment: both parameters are affected too strongly by the load from other users.

    For now, I am digging further. If anyone is interested, I will gladly share the results of further investigation.
