Making Wi-Fi Faster on Linux

    - It’s better to have a small one, but Feuerbach’s ...
    Victor Pelevin, “Generation P”

    The recent release of the Linux 4.9 kernel is a good occasion to talk about the upcoming Wi-Fi speedup. A caveat up front: this post is not about increasing the coverage area or changing regulatory domains. You don’t need to do anything of the sort; just update the kernel once the patches by Dave Täht, the bufferbloat fighter, land in a stable branch.



    The significant speedup comes from reducing latency[1] and excess buffering[2] in the network. To get there, the developers had to dig through mac80211, removing something at the top and adding something at the bottom, after which network delays dropped by an order of magnitude. The price? A patch of about 200 lines. Details under the cut.


    That Very Bufferbloat


    Bufferbloat is excessive buffering in the provider's network equipment, which leads to unwanted delays in data transmission. On a sufficiently loaded link, every connection loses milliseconds, which add up to seconds and sometimes minutes of waiting. With a network delay of 1 second, slashdot.org takes as long as 4 minutes to load!


    # flent -l 300 -H server --streams=12 tcp_ndown &
    # wget -E -H -k -K -p https://www.slashdot.org
    ...
    FINISHED --2016-10-22 13:43:35--
    Total wall clock time: 232.8s
    Downloaded: 62 files, 1.8M in 0.9s (77KB/s)

    The first command runs flent, a Python wrapper for netperf; it is a powerful tool for benchmarking[3] network connections.


    -l 300                  # the test runs for 5 minutes
    -H server               # connect to the host named server
    --streams=12 tcp_ndown  # 12 TCP download streams

    Flent loads the link so that connections are established with a delay of one second. Connection setup took 99.6% of the run time, and as a result the real speed dropped to a miserable 77 KB/s. With zero delay, the same page loads in 8 seconds. So round trip time[4] and latency matter more than bandwidth.
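
    The 99.6% figure follows directly from the wget output above. A minimal sketch of the arithmetic in Python, using only the two timings wget printed (the variable names are mine):

```python
# Timings taken from the wget output above.
total_wall_clock_s = 232.8   # "Total wall clock time"
transfer_time_s = 0.9        # time wget reports for receiving the data

# Everything outside the data transfer itself was spent waiting.
waiting_s = total_wall_clock_s - transfer_time_s
waiting_share = waiting_s / total_wall_clock_s

print(f"waiting: {waiting_s:.1f} s ({waiting_share:.1%} of the run)")
```

    In other words, less than one second out of almost four minutes was spent moving data.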


    On the provider’s side excess buffering is epidemic, but there is plenty of it on user equipment too. For a long time every network driver was designed around unrealistically high buffering needs, because developers optimized the packet scheduler for the highest speeds. In real life, however, such speeds are rarely reached over Wi-Fi. That is why cat pictures load slowly and video calls turn into torture. Check your own excess buffering, no SMS or registration required.


    The trouble is that most of the bufferbloat sits on the provider side; fixing it there gives you a faster connection for free. Speedtest results for the ISPs Xfinity and Google Fiber.


    It did not stop at complaining, though. Starting with Linux 3.3, a whole series of fixes and optimizations aimed at eliminating excess buffering has been released.


    • Linux 3.3: Byte Queue Limits
    • Linux 3.4: RED bug fixes, IW10 added, SFQRED
    • Linux 3.5: Fair / Flow Queuing packet scheduling (fq_codel, codel)
    • Linux 3.7: TCP small queues (TSQ)
    • Linux 3.12: TSO / GSO improvements
    • Linux 3.13: Host FQ + Pacing (sch_fq)
    • Linux 3.15: Change to microseconds from milliseconds throughout networking kernel
    • Linux 3.17: Network Batching API
    • Linux 4.9: BBR (Bottleneck Bandwidth and RTT)

    The latest in this series of fixes is the BBR algorithm. From the news item on opennet.ru:


    The kernel now includes an implementation of the TCP congestion control algorithm BBR (Bottleneck Bandwidth and RTT), proposed by Google, which has been used successfully to increase throughput and reduce transmission delays for traffic from google.com and YouTube. BBR requires changes only on the sender side; the network infrastructure and the receiving side remain unchanged. Instead of treating packet loss as the indicator of congestion, BBR models the link: it predicts the available bandwidth through successive probes and estimates of the round-trip time (RTT), without driving the link into packet loss or transmission delays. At the initial stage of the connection, BBR estimates the channel bandwidth ceiling,
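
    To make the modeling idea concrete: as described in Google's publications on BBR, the sender tracks two quantities, the highest delivery rate it has recently observed and the lowest RTT it has recently observed, and their product (the bandwidth-delay product, BDP) is the amount of data that can be in flight without building a queue. The Python sketch below illustrates only that estimate. The class name, the window size and the sample values are invented for illustration, and this is not the kernel's implementation.

```python
from collections import deque


class PathModel:
    """Toy model of a network path: windowed max bandwidth, windowed min RTT."""

    def __init__(self, window=10):
        # Keep only the most recent `window` samples (hypothetical size).
        self.rates = deque(maxlen=window)  # delivery rate samples, bytes/s
        self.rtts = deque(maxlen=window)   # round-trip time samples, seconds

    def on_ack(self, delivered_bytes, interval_s, rtt_s):
        """Record one measurement taken when an ACK arrives."""
        self.rates.append(delivered_bytes / interval_s)
        self.rtts.append(rtt_s)

    def bottleneck_bw(self):
        return max(self.rates)   # best rate the path has shown recently

    def min_rtt(self):
        return min(self.rtts)    # RTT sample with the least queuing delay

    def bdp_bytes(self):
        # Bandwidth-delay product: data that fits "in the pipe" with no queue.
        return self.bottleneck_bw() * self.min_rtt()


model = PathModel()
# (delivered bytes, measurement interval in s, RTT in s): invented samples.
for delivered, interval, rtt in [(125_000, 0.1, 0.040),
                                 (250_000, 0.1, 0.025),
                                 (200_000, 0.1, 0.060)]:
    model.on_ack(delivered, interval, rtt)

print(f"bottleneck bandwidth: {model.bottleneck_bw():.0f} B/s")
print(f"min RTT: {model.min_rtt() * 1000:.0f} ms")
print(f"BDP: {model.bdp_bytes():.0f} bytes")
```

    A sender that keeps roughly one BDP in flight fills the link without filling the buffers behind it, which is the opposite of waiting for a buffer to overflow and drop a packet.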


    These changes touched almost every network protocol but bypassed Wi-Fi and LTE. That could not last long, and Wi-Fi finally got serious attention. The Make WiFi Fast project brought together hundreds of participants led by a team of kernel networking developers.


    Terminology


    • QDisc, or queuing discipline: in the simplest case an ordinary FIFO scheduler that sits between the IP stack and the driver.




    • The fq_codel scheduler is not so simple. It has already been covered on Habr, so I will not repeat it here.

    fq_codel is one of the most efficient modern algorithms that use AQM.

    • TXOP: transmit opportunity, a window in which a station may send.

    How the patches speed up Wi-Fi


    Dave Täht, who has been saving the Internet for the last six years, attacked the problem with new and better benchmarks, which he had to develop himself. iperf3, quite popular in the scientific community and beyond, turned out to be completely unsuitable, since by default it implies an unrealistic 100 ms of excess buffering.


    while (testing)
        sleep 100ms
        while (total_bytes_sent / total_elapsed_time < target_rate)
            transmit buffer of data
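
    Why the 100 ms sleep matters: a sender that wakes up only every 100 ms has to push a whole interval's worth of data at once to hold its target rate, and that burst has to sit in a buffer somewhere. A rough Python calculation, assuming the loop above and a few illustrative target rates of my own choosing:

```python
SLEEP_S = 0.1  # the 100 ms pause from the pseudocode above


def burst_bytes(target_rate_bit_s, sleep_s=SLEEP_S):
    """Bytes that must be sent after each pause to keep the average rate."""
    return target_rate_bit_s * sleep_s / 8


# Illustrative target rates, not measurements.
for mbit in (10, 100, 1000):
    burst = burst_bytes(mbit * 1_000_000)
    print(f"{mbit:>4} Mbit/s -> {burst / 1000:.0f} kB per wake-up")
```

    At 100 Mbit/s that is already more than a megabyte per wake-up, far more than a Wi-Fi station can send in one go.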

    This is how things looked before the patch. Note the huge delays (>10) at the upper and lower levels of the Wi-Fi stack.





    After the patch:

    • The QDisc is removed completely. Queues are now kept per station and serviced in a cycle, i.e. round-robin fair queuing.
    • Buffering has moved into the intermediate mac80211 scheduler, which is managed by fq_codel. It holds no more than 2 TXOPs.
    • Buffering in the driver is minimal, at most 2 TXOPs (1.2-10 ms): one ready-made aggregated frame for retrying and one more standing by.
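
    The per-station queues serviced round-robin can be illustrated with a toy Python model. This is only a sketch of the scheduling idea: the station names and packet labels are invented, each station sends one packet per turn, and the real mac80211 code additionally applies fq_codel and works with aggregated frames.

```python
from collections import OrderedDict, deque


def round_robin(station_queues):
    """Yield (station, packet) pairs, one packet per station per round."""
    # Copy into deques so the caller's lists are left untouched;
    # stations with nothing queued are skipped from the start.
    queues = OrderedDict(
        (sta, deque(pkts)) for sta, pkts in station_queues.items() if pkts
    )
    while queues:
        for sta in list(queues):
            yield sta, queues[sta].popleft()
            if not queues[sta]:
                del queues[sta]  # station has nothing more to send


# Hypothetical backlog: one busy station and two light ones.
backlog = {
    "sta_a": ["a1", "a2", "a3", "a4"],
    "sta_b": ["b1"],
    "sta_c": ["c1", "c2"],
}

order = [pkt for _, pkt in round_robin(backlog)]
print(order)  # ['a1', 'b1', 'c1', 'a2', 'c2', 'a3', 'a4']
```

    In a single shared FIFO, b1 and c1 would wait behind all four packets of sta_a; here they go out in the first round.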


    mac80211 no longer stores packets at the lower level, in the driver. Instead it hands them to the intermediate scheduler and notifies the driver, and the driver picks them up as they arrive. Thanks to this, mac80211 knows more about when data is actually transmitted. As a result, buffering delays came down to just 2-12 ms.


    What was achieved


    Excess buffering was cut back so far that delays fell from peaks of 1-2 seconds to 40 ms. The clearest illustration is the picture showing Wi-Fi sessions on 100 workstations before and after the patch.


    Before the patch, only 5 stations started successfully, with monstrous lags of more than 15 seconds.





    After the patch, all stations started successfully. Delays are an acceptable 150-300 ms.



    Now for the fly in the ointment. So far only the ath9k driver fully supports all these innovations; ath10k is almost ready. The rest will have to wait, but I'm sure the other drivers will also be actively developed once the patches reach the stable branch.


    References and useful links





    1. Latency
    2. Bufferbloat
    3. Benchmark
    4. Round Trip Time
