Mikrotik: script to switch to a backup Internet channel
I want to share my script for switching to a backup Internet channel when the main one goes down, and for returning to the main one as soon as it works again. To be clear up front: only one channel carries traffic at a time; there is no load balancing here. Both channels are PPP connections (in my case one is wired and the other is a 3G USB modem). I wrote the script to be as flexible a monitoring tool as possible, because the other options, check-gateway in particular, do not work correctly for my setup.
The basic principle is simple: a VPN link being up does not mean the Internet works through it. I check by pinging several external addresses. There are cases where pings are not a reliable indicator, but I leave those aside; you can substitute any other check in the script that suits your situation. Other features: the backup channel is a mobile network and connects only when the main channel is down; the rest of the time its interface is disabled. When returning to the main channel, its operability is verified properly, using a technique other than ping with an interface argument. Finally, the route distances of the two interfaces change dynamically and are never equal, which lets both channels be connected at the same time while traffic goes through only one of them.
Once you understand how it works, you can easily adapt the script if one or both providers give you a static address.
I will first describe the setup the script needs, then walk through its main parts piece by piece. The whole script is at the end.
Suppose there are two PPP connections: ISP1, the main one, and ISP2, the backup. Both are configured and each works on its own. Set dial-on-demand=no and add-default-route=yes on both, then set default-route-distance on ISP2 to one more than on ISP1. Configure the usual things: NAT, marking of packets and connections so that replies leave through the interface the request arrived on, and routes for marked packets:
Preparation
/ip firewall mangle
add action=mark-connection chain=forward connection-mark=no-mark \
in-interface=ISP1 new-connection-mark=ISP1 passthrough=no
add action=mark-routing chain=prerouting connection-mark=ISP1 in-interface=\
bridge-local new-routing-mark=to_ISP1 passthrough=no
add action=mark-connection chain=forward connection-mark=no-mark \
in-interface=ISP2 new-connection-mark=ISP2 passthrough=no
add action=mark-routing chain=prerouting connection-mark=ISP2 in-interface=\
bridge-local new-routing-mark=to_ISP2 passthrough=no
/ip firewall nat
add action=masquerade chain=srcnat out-interface=ISP1
add action=masquerade chain=srcnat out-interface=ISP2
/ip route
add distance=1 gateway=ISP1 routing-mark=to_ISP1
add distance=1 gateway=ISP2 routing-mark=to_ISP2
We also assume that the router's local address is 192.168.xx.yy and the subnet is 192.168.xx.0/24. Change these values, like the interface names, to your own. This is not the whole setup yet; the rest comes in order.
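As a rough sketch, the interface settings described above could be applied like this. It assumes, as later in the article, that ISP1 is an l2tp-client and ISP2 a ppp-client; the distance values 1 and 2 are only one example of "one more than ISP1".

```routeros
# Assumption: ISP1 is an l2tp-client, ISP2 a ppp-client; adjust the menu to your connection types
/interface l2tp-client set ISP1 dial-on-demand=no add-default-route=yes default-route-distance=1
/interface ppp-client set ISP2 dial-on-demand=no add-default-route=yes default-route-distance=2
```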
Variables
global FailoverTimes;
global FailoverLastTime;
global FailoverLastBackTime;
local ifMain "ISP1";
local ifRes "ISP2";
local scriptName "Failover";
local state 0;
local pingNum 0;
local pingRes;
local routeDist;
local routeDist2;
local tmp;
local ip { x.x.x.x; y.y.y.y; z.z.z.z };
local pingSrcAddr 192.168.xx.yy;
We define the variables: the interface names go in ifMain and ifRes, the router's local address in pingSrcAddr (why it is needed will become clear later), and the three external addresses to ping when checking the channel go in the ip array.
Single instance
if ( [len [/system script job find where script=$scriptName]] > 1) do= { error "single instance" };
delay 15;
Only one copy of the script is allowed to run. The delay covers the case where the script is launched at RouterOS startup and gives the connections time to come up.
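To see this guard from the outside, the running jobs can be listed from the terminal. This is a minimal sketch, assuming the script is stored under the name Failover as in the variables above. The running script counts itself as one job, which is why the check compares against 1 rather than 0.

```routeros
# List running script jobs; a healthy setup shows exactly one Failover entry
/system script job print
# Stop the running copy by hand, for example before editing the script
/system script job remove [find where script="Failover"]
```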
Let's skip ahead to the main part. The script runs indefinitely, or rather until it is stopped or hits an error. In an infinite loop it examines the current state, held in the state variable, and performs the corresponding actions. Let's go through them.
State 0
if ($state = 0) do= {
    do {
        if ($pingNum >= 3) do= { set $pingNum 0; }
        if ([ping ($ip->$pingNum) count=1] = 0) do= {
            set $pingRes [ping ($ip->0) count=2];
            set $pingRes ($pingRes+[ping ($ip->1) count=2]);
            set $pingRes ($pingRes+[ping ($ip->2) count=2]);
            if ($pingRes = 0) do= {
                set $FailoverLastTime "$[/system clock get date] $[/system clock get time]";
                set $FailoverTimes ([tonum $FailoverTimes] + 1)
                set $state 1;
                log info "$scriptName: state changed 0->1";
            }
        }
        set $pingNum ($pingNum + 1);
        if ($state = 0) do= { delay 15 };
    } while ($state = 0);
}
State 0: the main channel is working. Every 15 seconds we ping one of the three addresses in turn; if it does not answer, we ping all three. If none of them answers, the switch to the backup channel begins. The code hard-codes three addresses in the array; if you use a different number, you will have to adjust it.
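If you want a different number of test addresses, the hard-coded 3 could be replaced by the array length. The fragment below is only a sketch of that idea, not the author's code: ipCount is a hypothetical extra variable, and the commands are written with the ':' prefix that the author's listings leave out.

```routeros
# Placeholders as in the article; put your own test addresses here
:local ip { x.x.x.x; y.y.y.y; z.z.z.z }
# ipCount is a hypothetical helper variable, not part of the original script
:local ipCount [:len $ip]
:local pingNum 0
:local pingRes 0

# Wrap the index by the real array length instead of a literal 3
:if ($pingNum >= $ipCount) do={ :set pingNum 0 }
:if ([/ping ($ip->$pingNum) count=1] = 0) do={
    :set pingRes 0
    # Ping every address twice and add up the replies
    :foreach addr in=$ip do={
        :set pingRes ($pingRes + [/ping $addr count=2])
    }
    # $pingRes = 0 at this point means no address answered
}
```

With this change the array can hold any number of addresses, at the cost of a longer full check when the list grows.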
State 1
if ($state = 1) do= {
    if ( [/interface l2tp-client get $ifMain default-route-distance] > 10) do= {
        /interface ppp-client set $ifRes default-route-distance=1;
    }
    /interface enable $ifRes;
    beep frequency=2000 length=250ms;
    delay 500ms;
    beep frequency=2000 length=250ms;
    delay 500ms;
    delay 6;
    /interface disable $ifMain;
    set $routeDist ([/interface ppp-client get $ifRes default-route-distance] + 1);
    /interface l2tp-client set $ifMain default-route-distance=$routeDist;
    /interface enable $ifMain;
    set $state 2;
    log info "$scriptName: state changed 1->2";
}
State 1: channel switching. It matters which kinds of PPP connections are used. In the example ISP1 is an l2tp-client and ISP2 is a ppp-client. If yours differ, correct the lines that deal with default-route-distance.
After enabling the backup channel we wait 7 seconds, which in my case is enough for the 3G connection to come up. During this time current and new connections hang in timeouts while the main VPN is still up, which minimizes the destination-unreachable replies sent by the router.
The sound signal is a matter of taste, and it may go off at night. Remove it if you do not need it.
Next the main channel is disabled, its default-route-distance is set to one more than the backup's, and it is enabled again. This lets us wait for the main channel to come back without disturbing the Internet traffic going through the backup.
Looking ahead: when we switch back to the main channel and disable the backup, the backup's default-route-distance is in turn set to one more than the main's. So with every switch the route distances of the PPP connections keep growing. To keep them from growing too far, the current value is checked here and reset to 1 once it exceeds 10 (the exact figure does not matter and is only an example; the theoretical maximum is about 250).
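The growth and reset of the distances can be followed by hand. The trace below assumes the starting values ISP1 = 1 and ISP2 = 2 and simply applies the State 1 and State 3 assignments from the script; the two :put lines show how the current values could be read on a live router.

```routeros
# step                ISP1  ISP2
# start                 1     2
# State 1 (failover)    3     2    main = backup + 1
# State 3 (return)      3     4    backup = main + 1
# State 1               5     4
# State 3               5     6
# ...
# State 1              11    10
# State 3              11    12
# State 1               2     1    main was > 10, so backup is reset to 1 first
:put [/interface l2tp-client get ISP1 default-route-distance]
:put [/interface ppp-client get ISP2 default-route-distance]
```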
State 2
if ($state = 2) do= {
    do {
        if ( [len [interface find where name=$ifMain and running] ] = 1) do= {
            set $pingRes [ping ($ip->0) src-address=$pingSrcAddr count=2];
            set $pingRes ($pingRes+[ping ($ip->1) src-address=$pingSrcAddr count=2]);
            set $pingRes ($pingRes+[ping ($ip->2) src-address=$pingSrcAddr count=2]);
            if ($pingRes > 0) do= {
                set $state 3;
                log info "$scriptName: state changed 2->3";
            }
        }
        if ($state = 2) do= { delay 15 };
    } while ($state = 2);
}
State 2: waiting for the main channel to recover. Note that the state of the backup is of no interest here. If it failed to connect, there is nothing more we can do: everything it needs has been set up, and we really only care about the main channel.
Here we wait for the main channel's VPN to come up, and then try to ping the external addresses through it while the backup is active. This is done in a roundabout but correct way. If you write ping xx.xx.xx.xx interface=$ifMain then, according to the developers, it may or may not work. Instead the ping is sent from the router's local address, which is assumed to always exist (otherwise why have a router). I did not use the main channel's external address because the provider assigns it dynamically. The following tells the router to send such pings through the main channel even when its route is inactive (its route distance is greater than the backup's):
Tuning
/ip firewall mangle
add action=mark-routing chain=output comment=Failover_script_rule \
dst-address=!192.168.xx.0/24 new-routing-mark=to_ISP1 passthrough=no \
protocol=icmp src-address=192.168.xx.yy
/ip route rule
add action=lookup-only-in-table routing-mark=to_ISP1 src-address=\
192.168.xx.yy/32 table=to_ISP1
The ping traffic used here is unusual. It is output traffic that originates on the router itself and goes to an external address. Normally the router uses the address of the outgoing interface as src-address. By specifying the router's local address as src-address, we effectively put it behind the same NAT as the LAN. This traffic is then marked with the main channel's routing mark, and the packets leave through the main channel thanks to the route with that mark.
The second rule is also necessary. Without it, if the main channel suddenly goes down again, pings, even though marked to_ISP1, will follow the unmarked route through the backup channel, which leads to a false return to the main channel. This is how RouterOS works: if a channel is not connected, its routes, even marked ones, are disabled. To make this clearer, imagine that state = 2 and the main channel is up but passes no traffic. The ping check then takes 6 seconds. If the main channel goes down during that time, the pings start going through the backup. The second rule prevents this.
Note that pings from the router to the LAN are not marked and work as usual.
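One way to check that these two rules do their job is to send such a ping by hand and look at the counters. This is a sketch for the terminal, assuming RouterOS v6 and the rule comment used above; x.x.x.x stands for one of the test addresses.

```routeros
# Ping from the router's LAN address, as the script does in State 2
/ping x.x.x.x src-address=192.168.xx.yy count=2
# The packet counter of the Failover mangle rule should grow after each ping
/ip firewall mangle print stats where comment="Failover_script_rule"
# The routing rule for the to_ISP1 table should be listed here
/ip route rule print
```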
State 3
if ($state = 3) do= {
    /interface disable $ifRes;
    set $routeDist ([/interface l2tp-client get $ifMain default-route-distance] + 1);
    /interface ppp-client set $ifRes default-route-distance=$routeDist;
    set $state 0;
    set $FailoverLastBackTime "$[/system clock get date] $[/system clock get time]";
    log info "$scriptName: state changed 3->0";
    beep frequency=500 length=500ms;
}
State 3: return to the main channel. Once pings go through the main channel, it is enough to disable the backup VPN, and the main one takes over. Then we set the backup's default-route-distance to one more than the main's and give a sound signal. Again, mind the types of the PPP connections and change them as necessary.
At this point the cycle closes and returns to state 0. Now for how the script recognizes the current state when it starts:
Initial state
set $routeDist [/interface l2tp-client get $ifMain default-route-distance];
set $routeDist2 [/interface ppp-client get $ifRes default-route-distance];
if ($routeDist < $routeDist2) do= {
    if ( [/interface get $ifMain running] = true) do= { set $state 0; } else= { set $state 1; }
} else= {
    if ( [/interface get $ifMain disabled] = true) do= { /interface enable $ifMain; }
    if ($routeDist > $routeDist2 and [/interface get $ifRes disabled] = false) do= {
        set $state 2;
    } else= { set $state 3; }
}
log info "$scriptName: initial state $state";
The logic here also looks complicated at first glance. Three parameters are analyzed: whether ISP1 is running, whether ISP2 is running (the code checks that it is not disabled), and how their default route distances compare. Initial states 1 and 3 are abnormal and indicate a misconfiguration, but the script will sort things out by itself, albeit sometimes with an unnecessary switch.
Excluded condition
I left out one more state because most people are unlikely to need it. My ISP1 connects its VPN by name rather than by IP, and resolving that name requires the same provider's DNS, because the name resolves to a local address. If you do not help the script with resolution by specifying the DNS server explicitly, then even after the ISP1 network becomes available the VPN will never connect: it cannot resolve the domain name and keeps using the backup's DNS. This is the extra state:
if ($state = 2) do= {
    do {
        if (([ping DNSip1 count=1] > 0) or ([ping DNSip2 count=1] > 0)) do= {
            set $tmp 0;
            do { resolve VPNaddress server=DNSip1; } on-error= { };
            do { resolve VPNaddress server=DNSip2; } on-error= { };
            do { resolve VPNaddress } on-error= { set $tmp 1; };
            if ($tmp = 0) do= {
                set $state 3;
                log info "$scriptName: state changed 2->3";
                delay 5;
            }
        }
        if ($state = 2) do= { delay 15 };
    } while ($state = 2);
}
Substitute your own values for DNSip1, DNSip2 and VPNaddress. All later states shift by +1 accordingly.
That is basically all. The script was developed and debugged on RouterOS 6.26 on an RB951G-2HnD. I make no promises for other versions, and I apologize for the missing ':' in front of the commands.
In my configuration a second script works alongside this one, run by the scheduler once a minute. It checks whether this script is running and, additionally, e-mails me the IP address when it changes. Here is a small example, the first part only:
Script monitor
global FailoverDisabled;
if ( [len [/system script job find where script="Failover"]] = 0 and $FailoverDisabled != 1) do= {
    do { execute script="Failover"; } on-error= { log info "Failover monitor: failed to execute Failover" };
}
The global variable lets you disable launching of the Failover script. Also, thanks to the scheduler, the script is started again automatically after an unexpected reboot of the router.
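The once-a-minute schedule could be set up roughly as follows. The script name FailoverMonitor is an assumption for illustration; the article does not say what the monitoring script is called.

```routeros
# Run the monitoring script every minute (FailoverMonitor is a hypothetical name)
/system scheduler add name=FailoverMonitor interval=1m on-event=FailoverMonitor
# Pause automatic relaunching of Failover
:global FailoverDisabled 1
# Check that the variable is visible in the script environment
/system script environment print
```

If the variable does not show up in the environment listing, the monitor will not see it either, so it is worth checking before relying on it.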
The complete Failover script
global FailoverTimes;
global FailoverLastTime;
global FailoverLastBackTime;
local ifMain "ISP1";
local ifRes "ISP2";
local scriptName "Failover";
local state 0;
local pingNum 0;
local pingRes;
local routeDist;
local routeDist2;
local tmp;
local ip { x.x.x.x; y.y.y.y; z.z.z.z };
local pingSrcAddr 192.168.xx.yy;
if ( [len [/system script job find where script=$scriptName]] > 1) do= { error "single instance" };
delay 15;
set $routeDist [/interface l2tp-client get $ifMain default-route-distance];
set $routeDist2 [/interface ppp-client get $ifRes default-route-distance];
if ($routeDist < $routeDist2) do= {
    if ( [/interface get $ifMain running] = true) do= { set $state 0; } else= { set $state 1; }
} else= {
    if ( [/interface get $ifMain disabled] = true) do= { /interface enable $ifMain; }
    if ($routeDist > $routeDist2 and [/interface get $ifRes disabled] = false) do= {
        set $state 2;
    } else= { set $state 3; }
}
log info "$scriptName: initial state $state";
do {
    if ($state = 0) do= {
        do {
            if ($pingNum >= 3) do= { set $pingNum 0; }
            if ([ping ($ip->$pingNum) count=1] = 0) do= {
                set $pingRes [ping ($ip->0) count=2];
                set $pingRes ($pingRes+[ping ($ip->1) count=2]);
                set $pingRes ($pingRes+[ping ($ip->2) count=2]);
                if ($pingRes = 0) do= {
                    set $FailoverLastTime "$[/system clock get date] $[/system clock get time]";
                    set $FailoverTimes ([tonum $FailoverTimes] + 1)
                    set $state 1;
                    log info "$scriptName: state changed 0->1";
                }
            }
            set $pingNum ($pingNum + 1);
            if ($state = 0) do= { delay 15 };
        } while ($state = 0);
    }
    # endof if state = 0
    if ($state = 1) do= {
        if ( [/interface l2tp-client get $ifMain default-route-distance] > 10) do= {
            /interface ppp-client set $ifRes default-route-distance=1;
        }
        /interface enable $ifRes;
        beep frequency=2000 length=250ms;
        delay 500ms;
        beep frequency=2000 length=250ms;
        delay 500ms;
        delay 6;
        /interface disable $ifMain;
        set $routeDist ([/interface ppp-client get $ifRes default-route-distance] + 1);
        /interface l2tp-client set $ifMain default-route-distance=$routeDist;
        /interface enable $ifMain;
        set $state 2;
        log info "$scriptName: state changed 1->2";
    }
    if ($state = 2) do= {
        do {
            if ( [len [interface find where name=$ifMain and running] ] = 1) do= {
                set $pingRes [ping ($ip->0) src-address=$pingSrcAddr count=2];
                set $pingRes ($pingRes+[ping ($ip->1) src-address=$pingSrcAddr count=2]);
                set $pingRes ($pingRes+[ping ($ip->2) src-address=$pingSrcAddr count=2]);
                if ($pingRes > 0) do= {
                    set $state 3;
                    log info "$scriptName: state changed 2->3";
                }
            }
            if ($state = 2) do= { delay 15 };
        } while ($state = 2);
    }
    # endof if state = 2
    if ($state = 3) do= {
        /interface disable $ifRes;
        set $routeDist ([/interface l2tp-client get $ifMain default-route-distance] + 1);
        /interface ppp-client set $ifRes default-route-distance=$routeDist;
        set $state 0;
        set $FailoverLastBackTime "$[/system clock get date] $[/system clock get time]";
        log info "$scriptName: state changed 3->0";
        beep frequency=500 length=500ms;
    }
    # bad programming protection
    delay 1;
} while= ( true );