Channel Balancing - Two Providers, AS, BGP, NAT

Habr has given me a lot of useful information, so I think it's time to "repay the debt."
I want to describe an algorithm that has been balancing channels on my gateway for more than a year (gigabit traffic, 8k clients, 2 providers, an AS with 1k addresses, most clients behind NAT). Perhaps it will come in handy for someone. In any case, I have not seen anything like it, and when I looked for it specifically, I found nothing. So it is entirely my own brainchild.
Everything I came across on the Internet only allowed using one of the channels as a backup. There are also many descriptions of how to regulate outgoing traffic. But I never came across a way to regulate incoming traffic (i.e. to load several channels evenly).
Of course, this algorithm cannot be considered universal; it only works under the right conditions.

So, the initial conditions:
- A gateway on Linux (Debian 6) with the quagga package (formerly zebra).
- Two providers (let them be TTK and RTK). Each provides a channel of a certain bandwidth and cuts off the "excess".
- An AS with 1k addresses (let it be 1.1.144.0/22). AS0000.
- Most of the clients have gray (private) addresses (let it be 192.168.0.0/16); the “client” networks are 192.168.1-99.0/24, and they are NATed on the gateway.
- A small part of the clients have white (public) addresses from my AS's address space.

Objective:
To ensure uniform loading of the TTK and RTK channels with incoming traffic to avoid channel congestion.


Simplifications.
I will not talk about shaper settings here. We assume that the shaper is already configured and working. It shapes the combined channel, without distinguishing between TTK and RTK.
We will not balance outgoing traffic. In most cases this is not an issue (outgoing traffic is much smaller), and it is solved quite simply.

Theory.
1. BGP allows you to control preference, i.e. to specify a preferred inbound route for a specific network. This is done by artificially “lengthening” the route (AS-path prepending): when announcing your route to a neighbor, you can repeat your AS number several times. The route can be “lengthened” differently for each neighbor. A shorter route is preferred.
2. BGP allows you to announce individual parts of the AS's address space. I.e. if our AS is described in RIPE as 1.1.144.0/22, nothing prevents us from additionally announcing 1.1.144.0/24, 1.1.145.0/24, 1.1.146.0/24 and 1.1.147.0/24 in BGP.
Tip: do not remove the announcement of the whole block (1.1.144.0/22). Some gateways do not accept routes with a /24 mask. It is better to lengthen the general route a little.
3. Let me remind you how BGP routing chooses a route from several available ones.
- The route with the longer mask (the more specific prefix) is selected. If that does not decide, continue.
- The shorter route (fewer intermediate ASs) is chosen. If that does not decide, continue.
- The route announced earlier is selected (it is considered more reliable). If that does not decide, continue.
- A deterministic pseudo-random tie-break.

A fly in the ointment.
Unfortunately, “preference management” does not mean “probability management” at all. In practice, almost all traffic starts to go along the preferred route. I.e. you cannot smoothly regulate flows by lengthening routes. It is more of a “switch” than a “regulator.”

Idea.
Most of our clients have gray IPs and are therefore NATed on our gateway. This is the bulk of the traffic. And how to NAT them (i.e. which external IP to assign) is in our power.
Our entire address block can be divided into 4 parts (1.1.144.0/24, 1.1.145.0/24, 1.1.146.0/24 and 1.1.147.0/24), which can be used in different ways. For example, the first two are for clients with white IPs, the third prefers RTK and the fourth prefers TTK. That's exactly what I did.
The choice of addresses for NAT is made at the iptables level.
If the client address is in 192.168.1-N.0/24, then 1.1.146.0/24 is used for NAT. Otherwise, 1.1.147.0/24.
Thus, by changing N, you can smoothly balance the incoming traffic between the two channels.
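
The selection rule can be sketched as a small shell function. This is only an illustration of the idea, not part of the running setup: the name nat_pool is made up for this example, and the strict comparison (subnets below N go to the RTK pool) is an assumption that mirrors the loop in the create_rtk_set script shown later.

```shell
#!/bin/sh
# Hypothetical helper (not from the original setup): choose the NAT pool
# for a client subnet 192.168.<net>.0/24, given the current boundary N.
nat_pool() {
    net=$1   # third octet of the client subnet
    n=$2     # current value of N
    if [ "$net" -lt "$n" ]; then
        echo "1.1.146.0/24"   # pool whose incoming traffic prefers RTK
    else
        echo "1.1.147.0/24"   # pool whose incoming traffic prefers TTK
    fi
}

# With N = 50: 192.168.10.0/24 lands in the RTK pool,
# 192.168.70.0/24 in the TTK pool.
nat_pool 10 50
nat_pool 70 50
```

Raising N moves more client subnets to the first pool, lowering it moves them back, and that is the whole "regulator".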

Implementation.
Please treat this implementation simply as an example. Not everything here is optimal; some will find it more convenient to do this in Perl/Python, others to organize it as a single daemon. The main goal of this example is to show that the idea can be implemented. And that it works.

1. To check whether the client IP belongs to 192.168.1-N.0/24, iptables uses the ipset module; in the rules below it is the “rtk” set.
My "NAT_AS" chain:
:NAT_AS - [0:0]
# For old connections, so as not to “drop” them from one external address to another
-A NAT_AS -s 192.168.0.0/16 -m state --state ESTABLISHED,RELATED -j SNAT --to-source 1.1.146.0-1.1.147.255 --persistent
# For new connections, choose how to NAT.
# RTK
-A NAT_AS -m state --state NEW -m set --set rtk src -j SNAT --to-source 1.1.146.0-1.1.146.255 --persistent
# TTK
-A NAT_AS -m state --state NEW -m set ! --set rtk src -j SNAT --to-source 1.1.147.0-1.1.147.255 --persistent

Note that -j SNAT is used with the --persistent option. This ensures that a client always gets the same external IP. Without it, the client may have problems with many services on the Internet.
And somewhere in the nat table's POSTROUTING chain:
# eth1, eth3 - interfaces that “look” at TTK and RTK
-A POSTROUTING -s 192.168.0.0/16 -o eth1 -j NAT_AS
-A POSTROUTING -s 192.168.0.0/16 -o eth3 -j NAT_AS


2. Prepare the ipset “rtk” set.
The file in which I store all the parameters (it is used by several scripts):
cat param_rtk_set:
# Client subnets that we will manage
export rtk_start=1
export rtk_min=1
export rtk_max=99

# Maximum traffic (what the provider gives, or rather, what you can get from it without loss). Tuned empirically.
# RTK
export shp_rtk_max=547
# TTK
export shp_ttk_max=535

# Relative control accuracy
export scale=50

# File where the current value of N is stored. Preferably somewhere on tmpfs.
export f_set_end=/lib/init/rw/rtk_set_end

Creating and updating the rtk set itself. You can (and should) run this regularly.
cat create_rtk_set:
#!/bin/sh

# Create the rtk set if it does not exist yet
/usr/sbin/ipset -N rtk nethash -q
# Create a temporary set, so as not to modify the working set in place
/usr/sbin/ipset -N temp_rtk nethash -q
/usr/sbin/ipset -F temp_rtk -q

. ./param_rtk_set

# In some versions of sh, a variable must be declared before a loop or condition in order to be used afterwards.
rtk_set_end=0

# On the first run there is no old value of N: take the average of min and max and save it.
if [ -f "$f_set_end" ]; then
    read rtk_set_end < "$f_set_end"
else
    rtk_set_end=$(($rtk_min + $rtk_max))
    rtk_set_end=$(($rtk_set_end / 2))
    echo $rtk_set_end > "$f_set_end"
fi

# Fill the temporary set
net=$rtk_start
while [ $net -lt $rtk_set_end ]; do
    /usr/sbin/ipset -A temp_rtk 192.168.${net}.0/24 -q
    net=$(($net + 1))
done

# Swap the temporary set with the working one
/usr/sbin/ipset -W temp_rtk rtk
# Delete the temporary set
/usr/sbin/ipset -X temp_rtk -q

OK, we now have a “control action”. I.e. by changing N (in my case the value is stored in /lib/init/rw/rtk_set_end), you can smoothly change the ratio of incoming traffic between TTK and RTK. Now it remains to set up the automation.

Automation.
cat rtk-ttk:
# Get the current values of the counters on the interfaces
ttk=$(/sbin/ifconfig eth1 | grep -Eo "RX bytes:[0-9]*" | grep -Eo "[0-9]*")
if [ "$ttk" = "" ]; then
    echo "No TTK ifconfig"
    exit
fi
rtk=$(/sbin/ifconfig eth3 | grep -Eo "RX bytes:[0-9]*" | grep -Eo "[0-9]*")
if [ "$rtk" = "" ]; then
    echo "No RTK ifconfig"
    exit
fi

# Directory where we will store past values
work_dir="/lib/init/rw/"

# Find the difference, save the current value; if the current value is less than the past one, exit.
read ttk_old < ${work_dir}shp_ttk_old
read rtk_old < ${work_dir}shp_rtk_old
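# Save the current counters for the next run (the targets are assumed to be the same files that are read just above)
echo $ttk > ${work_dir}shp_ttk_old
echo $rtk > ${work_dir}shp_rtk_old
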
if [ $ttk -le $ttk_old ]; then
    echo "TTK RX small"
    exit
fi

if [ $rtk -le $rtk_old ]; then
    echo "RTK RX small"
    exit
fi

ttk_cur=$(($ttk - $ttk_old + 1))
rtk_cur=$(($rtk - $rtk_old + 1))

# Read the parameters
. ./param_rtk_set

# Maximum change of N in one iteration. Tuned empirically.
max_delta=5

# Find the deviation.
p=$(echo "scale=10; $scale * ($shp_rtk_max / $shp_ttk_max) / ($rtk_cur / $ttk_cur) + 100.5" | /usr/bin/bc)
p=${p%%.*}
p=$(($p - 100))

# Increase N
n_for=1
while [ $scale -lt $p ]; do
    #echo "$p add"
    p=$(($p - 1))
    n_for=$(($n_for + 1))
    if [ $max_delta -lt $n_for ]; then
        break
    fi
    ./add_rtk
done

# Decrease N
n_for=1
while [ $p -lt $scale ]; do
    #echo "$p del"
    p=$((1 + $p))
    n_for=$(($n_for + 1))
    if [ $max_delta -lt $n_for ]; then
        break
    fi
    ./del_rtk
done

# Apply the new value of N
./create_rtk_set
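
The deviation formula from the script can be tried out by hand on made-up counter values. The sketch below is an illustration under assumptions: the function name deviation is invented here, and it uses shell integer arithmetic with rounding instead of the bc call from the script, which gives the same rounded result for inputs like these.

```shell
#!/bin/sh
# Sketch: the deviation p from rtk-ttk, computed with shell integer
# arithmetic instead of bc (an assumption made for this example).
# p = round( scale * (shp_rtk_max / shp_ttk_max) / (rtk_cur / ttk_cur) )
deviation() {
    scale=$1; shp_rtk_max=$2; shp_ttk_max=$3; rtk_cur=$4; ttk_cur=$5
    num=$((scale * shp_rtk_max * ttk_cur))
    den=$((shp_ttk_max * rtk_cur))
    # Adding den before dividing by 2*den rounds to the nearest integer.
    echo $(( (2 * num + den) / (2 * den) ))
}

# Traffic is already split 547:535, so p equals scale: nothing to change.
deviation 50 547 535 547 535    # prints 50
# RTK received less than its share, so p is above scale: N should grow.
deviation 50 547 535 500 600    # prints 61
```

When p comes out equal to scale, neither loop in rtk-ttk runs and N stays where it is.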


It remains to write the scripts add_rtk to increase N and del_rtk to decrease it. These scripts should read the current N from /lib/init/rw/rtk_set_end, decrease/increase it, check that it stays within the interval [min, max] and save it. I will not list them here; they are simple.
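
For completeness, here is a rough sketch of what such scripts could look like. It is not the code from my gateway: change_n is a hypothetical helper name, and the demonstration works on a temporary file rather than on the real rtk_set_end.

```shell
#!/bin/sh
# Sketch of the logic behind add_rtk / del_rtk (the originals are not
# shown in the article). change_n is a hypothetical helper.
# Usage: change_n <file with N> <delta> <min> <max>
change_n() {
    file=$1; delta=$2; min=$3; max=$4
    read n < "$file" || return 1
    n=$((n + delta))
    # Keep N inside the interval [min, max]
    if [ "$n" -lt "$min" ]; then n=$min; fi
    if [ "$n" -gt "$max" ]; then n=$max; fi
    echo "$n" > "$file"
}

# Demonstration on a temporary file instead of the real rtk_set_end
demo=$(mktemp)
echo 98 > "$demo"
change_n "$demo" 1 1 99    # like add_rtk: 98 -> 99
change_n "$demo" 1 1 99    # like add_rtk again: stays at 99
cat "$demo"                # prints 99
rm -f "$demo"
```

In this form add_rtk would boil down to sourcing param_rtk_set and calling change_n "$f_set_end" 1 "$rtk_min" "$rtk_max", and del_rtk to the same call with -1.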

Configure BGP.
For all of the above to actually control incoming traffic, BGP has to be prepared.

An example of my bgp.conf (naturally, the real IPs and numbers have been replaced with the values from the initial conditions):
!
hostname AS0000
password ****
enable password ****
log file /var/log/quagga/bgpd.log
!
router bgp 0000
no synchronization
bgp router-id [any of our external IPs]
network 1.1.144.0/22
network 1.1.144.0/24
network 1.1.145.0/24
network 1.1.146.0/24
network 1.1.147.0/24
!
neighbor [RTK gateway IP] remote-as [AS RTK (only number, for example 12345)]
neighbor [RTK gateway IP] update-source [our external RTK gateway]
neighbor [RTK gateway IP] route-map MY-OUT-RTK out
neighbor [RTK gateway IP] route-map INTER_NET in
!
neighbor [TTK gateway IP] remote-as [AS TTK (only number, for example 12345)]
neighbor [TTK gateway IP] update-source [our external TTK gateway]
neighbor [TTK gateway IP] route-map MY-OUT-TTK out
neighbor [TTK gateway IP] route-map INTER_NET in
!
ip prefix-list upstream-out seq 10 permit 1.1.144.0/22
!
ip prefix-list up144 seq 10 permit 1.1.144.0/24
!
ip prefix-list up145 seq 10 permit 1.1.145.0/24
!
ip prefix-list up146 seq 10 permit 1.1.146.0/24
!
ip prefix-list up147 seq 10 permit 1.1.147.0/24
!
! ==========================
! --- MY-OUT-TTK
route-map MY-OUT-TTK permit 10
match ip address prefix-list up144
! set as-path prepend 0000 0000
!
route-map MY-OUT-TTK permit 20
match ip address prefix-list up145
! set as-path prepend 0000 0000
!
route-map MY-OUT-TTK permit 30
match ip address prefix-list up146
set as-path prepend 0000
!
route-map MY-OUT-TTK permit 40
! match ip address prefix-list up147
! set as-path prepend 0000 0000
!
! route-map MY-OUT-TTK deny 200
! ==========================
!
route-map MY-OUT-TTK permit 100
match ip address prefix-list upstream-out
! set as-path prepend 0000 0000 0000
!
route-map MY-OUT-TTK deny 200
!
! --- end of MY-OUT-TTK
! ==========================
! --- MY-OUT-RTK
route-map MY-OUT-RTK permit 10
match ip address prefix-list up144
set as-path prepend 0000
!
route-map MY-OUT-RTK permit 20
match ip address prefix-list up145
set as-path prepend 0000
!
route-map MY-OUT-RTK permit 30
match ip address prefix-list up146
! set as-path prepend 0000 0000 0000
!
route-map MY-OUT-RTK permit 40
match ip address prefix-list up147
set as-path prepend 0000
!
! ==========================
!
route-map MY-OUT-RTK permit 100
match ip address prefix-list upstream-out
set as-path prepend 0000 0000
!
route-map MY-OUT-RTK deny 200
!
! --- end of MY-OUT-RTK
! ==========================
! ---- Local nets
ip prefix-list local_ seq 15 permit 192.168.0.0/16
ip prefix-list local_ seq 18 permit 0.0.0.0/8
ip prefix-list local_ seq 19 permit 127.0.0.0/8
ip prefix-list local_ seq 20 permit 10.0.0.0/8
ip prefix-list local_ seq 21 permit 172.16.0.0/12
ip prefix-list local_ seq 22 permit 169.254.0.0/16
ip prefix-list local_ seq 23 permit 224.0.0.0/4
ip prefix-list local_ seq 24 permit 240.0.0.0/4
! Here you need to add your own “white” networks
!
route-map INTER_NET deny 10
match ip address prefix-list local_
!
!
route-map INTER_NET permit 200
set local-preference 500
!
line vty
!

Let me remind you that here “0000” is my AS number and “1.1.” is the beginning of my IP addresses. Everything else is labeled.

Tuning.
First, you need to adjust all the “set as-path prepend” lines in bgp.conf.
The goal: if all outgoing traffic were NATed to 1.1.146.0/24 (hypothetically, of course), incoming traffic should be strongly skewed towards RTK.
And vice versa: if all outgoing traffic were NATed to 1.1.147.0/24, it should be strongly skewed towards TTK.

Second, you need to specify the maximum available traffic from TTK and RTK in the param_rtk_set file (shp_rtk_max and shp_ttk_max). I do not recommend using the value “from the contract”. Use the maximum that you have ever actually received. Keep in mind that these values set the desired ratio of incoming traffic.

Third, specify the desired control accuracy (the “scale” value in the parameters). The larger it is, the smaller the “dead zone” and the stronger the control action (N changes more). Too high a value can cause oscillation (“beats”, resonance).

Fourth, specify the maximum change of N in one iteration. Too high a value can cause “beats”, i.e. resonance. This happens because the reaction to the control action does not appear immediately but with a certain delay. Too small a value, on the other hand, will not let the regulator keep up with reality.
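
To get a feel for how scale and max_delta interact, the two loops from rtk-ttk can be replayed without touching ipset. The sketch below is only an illustration: n_steps is a made-up helper name, and it simply counts how many times add_rtk (positive result) or del_rtk (negative result) would be called for a given deviation p.

```shell
#!/bin/sh
# Sketch: replay the "Increase N" / "Decrease N" loops from rtk-ttk and
# print the resulting change of N as a signed number.
# Usage: n_steps <p> <scale> <max_delta>
n_steps() {
    p=$1; scale=$2; max_delta=$3
    steps=0
    n_for=1
    while [ "$scale" -lt "$p" ]; do
        p=$((p - 1)); n_for=$((n_for + 1))
        if [ "$max_delta" -lt "$n_for" ]; then break; fi
        steps=$((steps + 1))     # here rtk-ttk calls ./add_rtk
    done
    n_for=1
    while [ "$p" -lt "$scale" ]; do
        p=$((p + 1)); n_for=$((n_for + 1))
        if [ "$max_delta" -lt "$n_for" ]; then break; fi
        steps=$((steps - 1))     # here rtk-ttk calls ./del_rtk
    done
    echo "$steps"
}

n_steps 52 50 5    # small deviation: prints 2
n_steps 61 50 5    # large deviation, limited by max_delta: prints 4
n_steps 43 50 5    # deviation the other way: prints -4
```

Note that with the loops written this way, max_delta=5 gives at most 4 steps of N per run, so the limit should be chosen with that in mind.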

That's it. It remains to add a cron job that runs rtk-ttk every minute. Or write some kind of daemon that runs rtk-ttk periodically.

I will add that this algorithm has been working for me for more than a year. Sometimes I have to intervene and adjust the BGP settings (set as-path prepend). Something on the Internet changes, and you have to react.

I welcome any comments or advice and will answer in the comments.
