Batfish Introduction
One of the problems of modern networks is their fragility. Numerous filtering rules, routing-exchange policies, and dynamic routing protocols make networks complex and vulnerable to human error. A network outage can be triggered unintentionally by a change to a route-map or an ACL. We clearly lack a tool for evaluating how a network will behave with a new configuration before the change reaches production. I want to know for sure whether network A will remain reachable if I filter out some of the BGP announcements received from provider B. Which path will packets take from network C to server D if I double the IGP metric on one of the transit links? Batfish helps answer these and many other questions.
Review of Batfish
Batfish is a network modeling tool. Its main purpose is to test configuration changes before they are applied to the production network. Batfish can also be used to analyze and check the current state of the network. Existing CI/CD processes in the networking world clearly lack a tool for testing new configurations, and Batfish fills this gap.
Batfish does not require direct access to the network equipment: it models network behavior from the data contained in the device configuration files.
Batfish can:
- determine the neighbor status of dynamic routing protocols in the network (BGP, IS-IS, OSPF)
- calculate the RIB of each network element
- check NTP, AAA, MTU settings
- determine whether an ACL blocks a given traffic flow (similar to packet-tracer on the Cisco ASA)
- check for end-to-end connectivity between hosts within the network
- show the path of traffic through the network (virtual tracing)
Supported Platforms:
- Arista
- Aruba
- AWS (VPCs, Network ACLs, VPN GW, NAT GW, Internet GW, Security Groups)
- Cisco (NX-OS, IOS, IOS-XE, IOS-XR and ASA)
- Dell Force10
- Foundry
- iptables
- Juniper (MX, EX, QFX, SRX, T-series, PTX)
- MRV
- Palo Alto Networks
- Quagga / FRR
- Quanta
- VyOS
Batfish is a Java application. Pybatfish, a Python SDK, was written to make working with it more convenient.
Let's move on to practice. I will demonstrate what Batfish can do with an example.
Example
We manage two autonomous systems: AS 41214 and AS 10631. AS 41214 uses IS-IS as its IGP, and AS 10631 uses OSPF. Inside each AS, an iBGP full mesh is used. LDN-CORE-01 announces prefix 135.65.0.0/19 to its BGP neighbors, and MSK-CORE-01 announces 140.0.0.0/24. Routing information is exchanged between the autonomous systems over the HKI-CORE-01 - SPB-CORE-01 link.
HKI-CORE-01, STH-CORE-01 - Junos routers
LDN-CORE-01, AMS-CORE-01, SPB-CORE-01, MSK-CORE-01 - Cisco IOS routers
Install the container with Batfish and the Python SDK:
docker pull batfish/allinone
docker run batfish/allinone
docker container exec -it <container> bash
Get to know the library through python interactive mode:
root@ea9a1559d88e:/# python3
--------------------
>>> from pybatfish.client.commands import bf_logger, bf_init_snapshot
>>> from pybatfish.question.question import load_questions
>>> from pybatfish.question import bfq
>>> import logging
>>> bf_logger.setLevel(logging.ERROR)
>>> load_questions()
>>> bf_init_snapshot('tmp/habr')
'ss_e8065858-a911-4f8a-b020-49c9b96d0381'
bf_init_snapshot('tmp/habr') loads the configuration files into Batfish and prepares them for analysis.
/tmp/habr is the directory that contains the router configuration files.
root@ea9a1559d88e:/tmp/habr# tree
.
`-- configs
    |-- AMS-CORE-01.cfg
    |-- HKI-CORE-01.cfg
    |-- LDN-CORE-01.cfg
    |-- MSK-CORE-01.cfg
    |-- SPB-CORE-01.cfg
    `-- STH-CORE-01.cfg
1 directory, 6 files
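Before handing a directory to bf_init_snapshot, it can help to confirm that it has the layout shown above. The following is a minimal sketch using only the Python standard library; the helper name list_snapshot_configs is hypothetical and not part of Pybatfish, and the temporary directory with empty files merely stands in for /tmp/habr.

```python
from pathlib import Path
import tempfile


def list_snapshot_configs(snapshot_dir):
    """Return sorted file names found under <snapshot_dir>/configs."""
    configs_dir = Path(snapshot_dir) / "configs"
    if not configs_dir.is_dir():
        raise FileNotFoundError(
            "missing 'configs' subdirectory in {}".format(snapshot_dir)
        )
    return sorted(p.name for p in configs_dir.iterdir() if p.is_file())


# Build a throwaway snapshot that mirrors the layout above (empty files only).
with tempfile.TemporaryDirectory() as tmp:
    configs = Path(tmp) / "configs"
    configs.mkdir()
    for name in ("AMS-CORE-01.cfg", "HKI-CORE-01.cfg", "LDN-CORE-01.cfg"):
        (configs / name).touch()
    found = list_snapshot_configs(tmp)
    print(found)
```

A check like this fails early with a clear message if the configs subdirectory is missing.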
Now let's determine the status of BGP sessions on the LDN-CORE-01 router:
>>> bgp_peers = bfq.bgpSessionStatus(nodes='LDN-CORE-01').answer().frame()
>>> bgp_peers
Node VRF Local_AS Local_IP Remote_AS Remote_Node Remote_IP Session_Type Est_Status
0 ldn-core-01 default 41214 172.20.20.1 41214 sth-core-01 172.20.20.2 IBGP EST
1 ldn-core-01 default 41214 172.20.20.1 41214 ams-core-01 172.20.20.3 IBGP EST
2 ldn-core-01 default 41214 172.20.20.1 41214 hki-core-01 172.20.20.4 IBGP EST
So, does this match reality?
LDN-CORE-01#show ip bgp summary
…
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
172.20.20.2 4 41214 629 669 9 0 0 00:56:51 0
172.20.20.3 4 41214 826 827 9 0 0 01:10:18 0
172.20.20.4 4 41214 547 583 9 0 0 00:49:24 1
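The same check can be done programmatically instead of by eye. The sketch below assumes the answer has been converted to a list of plain dictionaries keyed by the column names from the table above (for a pandas frame, to_dict('records') produces such a list). The function name is hypothetical, and the row with status DOWN is invented purely to show a failing case.

```python
def sessions_not_established(rows, status_key="Est_Status", ok_value="EST"):
    """Return (local_ip, remote_ip) pairs for sessions that are not established."""
    return [
        (row["Local_IP"], row["Remote_IP"])
        for row in rows
        if row.get(status_key) != ok_value
    ]


# Rows copied from the Batfish output above (only the columns we need).
healthy = [
    {"Local_IP": "172.20.20.1", "Remote_IP": "172.20.20.2", "Est_Status": "EST"},
    {"Local_IP": "172.20.20.1", "Remote_IP": "172.20.20.3", "Est_Status": "EST"},
    {"Local_IP": "172.20.20.1", "Remote_IP": "172.20.20.4", "Est_Status": "EST"},
]

# Hypothetical variant: the last session is marked as down for illustration.
broken = healthy[:2] + [
    {"Local_IP": "172.20.20.1", "Remote_IP": "172.20.20.4", "Est_Status": "DOWN"},
]

print(sessions_not_established(healthy))
print(sessions_not_established(broken))
```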
Now let's see what IS-IS routes are in the RIB on the HKI-CORE-01 router according to Batfish:
>>> isis_routes = bfq.routes(nodes='HKI-CORE-01', protocols='isis').answer().frame()
>>> isis_routes
Node VRF Network Next_Hop Next_Hop_IP Protocol Admin_Distance Metric Tag
0 hki-core-01 default 172.20.20.3/32 ams-core-01 10.0.0.6 isisL2 18 20 None
1 hki-core-01 default 172.20.20.1/32 ams-core-01 10.0.0.6 isisL2 18 30 None
2 hki-core-01 default 172.20.20.2/32 sth-core-01 10.0.0.4 isisL2 18 10 None
3 hki-core-01 default 172.20.20.1/32 sth-core-01 10.0.0.4 isisL2 18 30 None
4 hki-core-01 default 10.0.0.0/31 sth-core-01 10.0.0.4 isisL2 18 20 None
5 hki-core-01 default 10.0.0.2/31 ams-core-01 10.0.0.6 isisL2 18 20 None
At the command line:
showroute@HKI-CORE-01# run show route table inet.0 protocol isis
inet.0: 18 destinations, 18 routes (18 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/31 *[IS-IS/18] 00:51:25, metric 20
> to 10.0.0.4 via ge-0/0/0.0
10.0.0.2/31 *[IS-IS/18] 00:51:45, metric 20
> to 10.0.0.6 via ge-0/0/1.0
172.20.20.1/32 *[IS-IS/18] 00:51:25, metric 30
to 10.0.0.4 via ge-0/0/0.0
> to 10.0.0.6 via ge-0/0/1.0
172.20.20.2/32 *[IS-IS/18] 00:51:25, metric 10
> to 10.0.0.4 via ge-0/0/0.0
172.20.20.3/32 *[IS-IS/18] 00:51:45, metric 20
> to 10.0.0.6 via ge-0/0/1.0
Great! I hope it is now clearer what Batfish is.
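The comparison between the Batfish answer and the router output can also be automated. Here is a small sketch that groups the rows by prefix and compares the resulting next-hop sets with what the CLI shows. The data is typed in by hand from the two listings above, and the function name is an assumption of this example rather than anything provided by Pybatfish.

```python
from collections import defaultdict


def next_hops_by_prefix(routes):
    """Group route rows by prefix and collect the set of next-hop IPs."""
    grouped = defaultdict(set)
    for route in routes:
        grouped[route["Network"]].add(route["Next_Hop_IP"])
    return dict(grouped)


# Rows from the Batfish answer (Network and Next_Hop_IP columns only).
batfish_routes = [
    {"Network": "172.20.20.3/32", "Next_Hop_IP": "10.0.0.6"},
    {"Network": "172.20.20.1/32", "Next_Hop_IP": "10.0.0.6"},
    {"Network": "172.20.20.2/32", "Next_Hop_IP": "10.0.0.4"},
    {"Network": "172.20.20.1/32", "Next_Hop_IP": "10.0.0.4"},
    {"Network": "10.0.0.0/31", "Next_Hop_IP": "10.0.0.4"},
    {"Network": "10.0.0.2/31", "Next_Hop_IP": "10.0.0.6"},
]

# Next hops read from the Junos "show route" output.
cli_routes = {
    "10.0.0.0/31": {"10.0.0.4"},
    "10.0.0.2/31": {"10.0.0.6"},
    "172.20.20.1/32": {"10.0.0.4", "10.0.0.6"},
    "172.20.20.2/32": {"10.0.0.4"},
    "172.20.20.3/32": {"10.0.0.6"},
}

model = next_hops_by_prefix(batfish_routes)
print(model == cli_routes)
```

Grouping into sets makes the ECMP route to 172.20.20.1/32 compare correctly regardless of row order.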
At the beginning of the article, I wrote that Batfish can be used to check configuration changes before applying them to the production network. Now let's look at testing a network with Robot Framework. For this I wrote a small module based on Pybatfish that performs the following checks:
- Determine the status of BGP sessions on the network
- Determine the status of IS-IS neighbors
- Check end-to-end connectivity between nodes on the network and display the trace
- Determine the size of the RIB on a router for a specific dynamic routing protocol
LibraryBatfish.py
import logging

from pybatfish.client.commands import bf_logger, bf_init_snapshot
from pybatfish.question.question import load_questions, list_questions
from pybatfish.question import bfq
from pybatfish.datamodel.flow import HeaderConstraints, PathConstraints
from robot.api import logger


class LibraryBatfish(object):

    def __init__(self, snapshot):
        bf_logger.setLevel(logging.ERROR)
        load_questions()
        bf_init_snapshot(snapshot)

    def check_bgp_peers(self):
        # Collect every BGP session that is not in the established state.
        not_established_peers = list()
        bgp_peers = bfq.bgpSessionStatus().answer()
        for peer in bgp_peers.rows:
            if peer.get('Established_Status') != 'ESTABLISHED':
                not_established_peers.append(dict.fromkeys(peer.get('Local_IP').split(), peer.get('Remote_IP').get('value')))
        if len(not_established_peers) == 0:
            return 1
        else:
            logger.warn('BGP neighbors are not in an established state:')
            for neighborship in not_established_peers:
                for peer in neighborship:
                    logger.warn('{} - {}'.format(peer, neighborship.get(peer)))
            return 0

    def check_routes(self, node, protocol):
        # Number of routes of the given protocol in the RIB of the node.
        routes = bfq.routes(nodes=node, protocols=protocol).answer()
        return len(routes.rows)

    def check_isis_neighbors(self, description):
        # Every interface tagged with the description must have an IS-IS adjacency.
        not_isis_enabled_links = list()
        isis_neighbors = self._get_isis_neighbors()
        for link in self._get_isis_enabled_links(description):
            if link not in isis_neighbors:
                not_isis_enabled_links.append(link)
        if len(not_isis_enabled_links) == 0:
            return 1
        else:
            for link in not_isis_enabled_links:
                logger.warn('{} {} has no IS-IS neighbor'.format(link.get('hostname'), link.get('interface')))
            return 0

    def ping(self, source_ip, destination_ip):
        # A ping succeeds only if the traceroute is accepted in both directions.
        ip_owners = bfq.ipOwners().answer()
        traceroute = self._get_traceroute_status(source_ip, destination_ip, ip_owners)
        reverse_traceroute = self._get_traceroute_status(destination_ip, source_ip, ip_owners)
        if traceroute and reverse_traceroute:
            self._show_trace(source_ip, destination_ip, ip_owners)
            return 1
        else:
            logger.warn('Ping {} -> {} failed'.format(source_ip, destination_ip))
            return 0

    def _get_traceroute_status(self, source_ip, destination_ip, addresses):
        tracert = self._unidirectional_virtual_traceroute(source_ip, destination_ip, addresses)
        # No answer (for example, the source address was not found) counts as a failure.
        if tracert is None:
            return False
        for trace in tracert.rows[0].get('Traces'):
            if trace.get('disposition') != 'ACCEPTED':
                return False
        return True

    def _get_paths(self, source_ip, destination_ip, addresses):
        # Build {path_number: [node names]} for every accepted trace.
        tracert = self._unidirectional_virtual_traceroute(source_ip, destination_ip, addresses)
        traces = tracert.rows[0].get('Traces')
        paths = dict()
        path_number = 1
        for trace in traces:
            if trace.get('disposition') == 'ACCEPTED':
                path = list()
                for hop in trace.get('hops'):
                    path.append(hop.get('node').get('name'))
                paths[path_number] = path
                path_number += 1
        return paths

    def _unidirectional_virtual_traceroute(self, source_ip, destination_ip, addresses):
        # Find the node and interface that own the source address.
        node = interface = None
        for address in addresses.rows:
            if address.get('IP') == source_ip:
                node = address.get('Node').get('name')
                interface = address.get('Interface')
        if node is None:
            logger.warn('{} address has not been found'.format(source_ip))
            return None
        headers = HeaderConstraints(srcIps=source_ip, dstIps=destination_ip, ipProtocols=['ICMP'])
        return bfq.traceroute(startLocation="{}[{}]".format(node, interface), headers=headers).answer()

    def _get_isis_enabled_links(self, description='core-link'):
        # Interfaces whose description contains the given marker.
        isis_enabled_links = list()
        interfaces = bfq.interfaceProperties().answer()
        for interface in interfaces.rows:
            if interface.get('Description') is not None and description in interface.get('Description'):
                isis_enabled_links.append({'hostname': interface.get('Interface').get('hostname'),
                                           'interface': interface.get('Interface').get('interface')})
        return isis_enabled_links

    def _get_isis_neighbors(self):
        isis_neighbors = list()
        isis_adjacencies = bfq.edges(edgeType='isis').answer()
        for neighbor in isis_adjacencies.rows:
            isis_neighbors.append(neighbor.get('Interface'))
        return isis_neighbors

    def _show_trace(self, source_ip, destination_ip, addresses):
        logger.console('\nTraceroute to {} from {}'.format(destination_ip, source_ip))
        paths = self._get_paths(source_ip, destination_ip, addresses)
        path_num = 1
        for path in paths:
            n = 1
            logger.console('\n Path N{}'.format(path_num))
            for hop in paths.get(path):
                logger.console(' {} {}'.format(n, hop))
                n += 1
            path_num += 1
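To make the path extraction in _get_paths easier to follow, here is a standalone sketch that runs the same logic on hand-made data. The nested structure (disposition, hops, node, name) is assumed from the accessors the library uses; the sample traces are hypothetical and do not come from a real Batfish answer.

```python
def accepted_paths(traces):
    """Return one list of node names per trace whose disposition is ACCEPTED."""
    return [
        [hop["node"]["name"] for hop in trace["hops"]]
        for trace in traces
        if trace.get("disposition") == "ACCEPTED"
    ]


def hops(*names):
    """Build a hop list in the assumed shape: [{'node': {'name': ...}}, ...]."""
    return [{"node": {"name": name}} for name in names]


# Hypothetical traces: two accepted ECMP paths and one that is dropped.
sample_traces = [
    {"disposition": "ACCEPTED", "hops": hops("ldn-core-01", "sth-core-01", "hki-core-01")},
    {"disposition": "ACCEPTED", "hops": hops("ldn-core-01", "ams-core-01", "hki-core-01")},
    {"disposition": "DENIED_IN", "hops": hops("ldn-core-01", "sth-core-01")},
]

paths = accepted_paths(sample_traces)
for number, path in enumerate(paths, start=1):
    print("Path N{}: {}".format(number, " -> ".join(path)))
```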
batfish-test.robot
Scenario N1
I am still managing the same network. Suppose I need to tidy up the filters on the border between AS 41214 and AS 10631 and block, at the interconnect, packets whose source or destination IP addresses fall within the BOGONS ranges.
Run the test before making changes.
Tests passed.
Now change the test configuration of the HKI-CORE-01 router, /tmp/habr/configs/HKI-CORE-01.cfg:
set firewall family inet filter BOGONS term TERM010 from address 0.0.0.0/8
set firewall family inet filter BOGONS term TERM010 from address 10.0.0.0/8
set firewall family inet filter BOGONS term TERM010 from address 100.64.0.0/10
set firewall family inet filter BOGONS term TERM010 from address 127.0.0.0/8
set firewall family inet filter BOGONS term TERM010 from address 169.254.0.0/16
set firewall family inet filter BOGONS term TERM010 from address 172.16.0.0/12
set firewall family inet filter BOGONS term TERM010 from address 192.0.2.0/24
set firewall family inet filter BOGONS term TERM010 from address 192.88.99.0/24
set firewall family inet filter BOGONS term TERM010 from address 192.168.0.0/16
set firewall family inet filter BOGONS term TERM010 from address 198.18.0.0/15
set firewall family inet filter BOGONS term TERM010 from address 198.51.100.0/24
set firewall family inet filter BOGONS term TERM010 from address 203.0.113.0/24
set firewall family inet filter BOGONS term TERM010 from address 224.0.0.0/4
set firewall family inet filter BOGONS term TERM010 from address 240.0.0.0/4
set firewall family inet filter BOGONS term TERM010 then discard
set firewall family inet filter BOGONS term PERMIT-IP-ANY-ANY then accept
set interfaces ge-0/0/2.0 family inet filter input BOGONS
set interfaces ge-0/0/2.0 family inet filter output BOGONS
Run the test.
I was very close, but as the test output shows, after these changes the BGP session 192.168.30.0 - 192.168.30.1 is no longer in the Established state, and as a result IP connectivity between 135.65.0.1 and 140.0.0.1 is lost. What went wrong? Looking carefully at the HKI-CORE-01 configuration, we see that the eBGP peering is set up on private addresses:
showroute@HKI-CORE-01# show interfaces ge-0/0/2 | display set
set interfaces ge-0/0/2 description SPB-CORE-01
set interfaces ge-0/0/2 unit 0 family inet filter input BOGONS
set interfaces ge-0/0/2 unit 0 family inet filter output BOGONS
set interfaces ge-0/0/2 unit 0 family inet address 192.168.30.0/31
Conclusion: either change the addresses on the interconnect or add the 192.168.30.0/31 subnet as an exception.
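The cause is easy to confirm with the standard ipaddress module. The sketch below checks an address against the prefixes from TERM010; the function name matching_bogon is my own and is only meant to illustrate the membership test.

```python
import ipaddress

# Prefixes listed in term TERM010 of the BOGONS filter.
BOGONS = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
        "169.254.0.0/16", "172.16.0.0/12", "192.0.2.0/24", "192.88.99.0/24",
        "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
        "224.0.0.0/4", "240.0.0.0/4",
    )
]


def matching_bogon(address):
    """Return the first bogon prefix containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for network in BOGONS:
        if ip in network:
            return str(network)
    return None


print(matching_bogon("192.168.30.0"))  # local end of the eBGP session
print(matching_bogon("192.168.30.1"))  # remote end of the eBGP session
print(matching_bogon("135.65.0.1"))    # ordinary public address
```

Both ends of the eBGP session fall inside 192.168.0.0/16, so the filter discards the BGP packets themselves.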
I will add the interconnect subnet as an exception and update /tmp/habr/configs/HKI-CORE-01.cfg again:
set firewall family inet filter BOGONS term TERM005 from address 192.168.30.0/31
set firewall family inet filter BOGONS term TERM005 then accept
Run the test.
Now unwanted traffic will not pass through the eBGP interface between AS 41214 and AS 10631. The change can be made safely, without fear of consequences.
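The exception works only because filter terms are evaluated in order and the first matching term wins. The following sketch models that behavior under a few assumptions: TERM005 matches the peering subnet 192.168.30.0/31 and is placed before TERM010, the bogon list is abbreviated to three prefixes, and a packet matching no term is discarded. The function and variable names are hypothetical.

```python
import ipaddress


def first_match(terms, src, dst):
    """Return (term_name, action) of the first term that matches the packet.

    A term without prefixes matches everything; a term with prefixes matches
    when the source or the destination address falls inside any of them.
    """
    addresses = (ipaddress.ip_address(src), ipaddress.ip_address(dst))
    for name, prefixes, action in terms:
        networks = [ipaddress.ip_network(p) for p in prefixes]
        if not networks or any(a in n for a in addresses for n in networks):
            return name, action
    return None, "discard"  # nothing matched: treated here as an implicit discard


BOGON_SUBSET = ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"]  # abbreviated

exception_first = [
    ("TERM005", ["192.168.30.0/31"], "accept"),
    ("TERM010", BOGON_SUBSET, "discard"),
    ("PERMIT-IP-ANY-ANY", [], "accept"),
]
# Same terms, but with the exception evaluated last.
exception_last = [exception_first[1], exception_first[2], exception_first[0]]

peering = first_match(exception_first, "192.168.30.0", "192.168.30.1")
misordered = first_match(exception_last, "192.168.30.0", "192.168.30.1")
transit = first_match(exception_first, "135.65.0.1", "140.0.0.1")
print(peering, misordered, transit)
```

If the exception term ends up after TERM010, the peering traffic is still discarded, so the position of TERM005 in the filter is worth verifying.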
Scenario N2
Here I need to terminate network 150.0.0.0/24 on the MSK-CORE-01 router and ensure connectivity between 135.65.0.1 and 150.0.0.1. I add the following lines to the test configuration of the MSK-CORE-01 router, /tmp/habr/configs/MSK-CORE-01.cfg:
interface Loopback2
 ip address 150.0.0.1 255.255.255.255
!
ip route 150.0.0.0 255.255.255.0 Null0
!
router bgp 10631
 !
 address-family ipv4
  network 150.0.0.0 mask 255.255.255.0
!
I change the test script and run the test:
git diff HEAD~
diff --git a/batfish-robot.robot b/batfish-robot.robot
index 8d963c5..ce8cb6a 100644
--- a/batfish-robot.robot
+++ b/batfish-robot.robot
@@ -5,7 +5,7 @@ Library LibraryBatfish.py tmp/habr
${ISIS-ENABLED-LINK-DESCRIPTION} ISIS-LINK
${NODE} HKI-CORE-01
${PROTOCOL} ebgp
-${RIB-SIZE} 1
+${RIB-SIZE} 2
*** Test Cases ***
ISIS
@@ -27,3 +27,8 @@ Ping
[Documentation] Test end-to-end ICMP connectivity & show traceroute
${result}= Ping 135.65.0.1 140.0.0.1
Should Be Equal As Integers ${result} 1
+
+Ping2
+ [Documentation] Test end-to-end ICMP connectivity & show traceroute
+ ${result}= Ping 135.65.0.1 150.0.0.1
+ Should Be Equal As Integers ${result} 1
Now I expect to see two eBGP routes on the HKI-CORE-01 router, and an additional connectivity check has been added.
There is no connectivity between 135.65.0.1 and 150.0.0.1; moreover, the HKI-CORE-01 router has only one eBGP route instead of two.
Check the contents of the RIB on HKI-CORE-01 after the new configuration is applied to the MSK-CORE-01 router:
showroute@HKI-CORE-01# run show route table inet.0 protocol bgp
inet.0: 20 destinations, 20 routes (19 active, 0 holddown, 1 hidden)
+ = Active Route, - = Last Active, * = Both
135.65.0.0/19 *[BGP/170] 02:25:38, MED 0, localpref 100, from 172.20.20.1
AS path: I, validation-state: unverified
> to 10.0.0.4 via ge-0/0/0.0
to 10.0.0.6 via ge-0/0/1.0
140.0.0.0/24 *[BGP/170] 01:38:02, localpref 100
AS path: 10631 I, validation-state: unverified
> to 192.168.30.1 via ge-0/0/2.0
showroute@HKI-CORE-01# run show route table inet.0 protocol bgp hidden detail
inet.0: 20 destinations, 20 routes (19 active, 0 holddown, 1 hidden)
150.0.0.0/24 (1 entry, 0 announced)
BGP /-101
Next hop type: Router, Next hop index: 563
Address: 0x940f43c
Next-hop reference count: 4
Source: 192.168.30.1
Next hop: 192.168.30.1 via ge-0/0/2.0, selected
Session Id: 0x9
State: <Hidden Ext>
Local AS: 41214 Peer AS: 10631
Age: 1:42:03
Validation State: unverified
Task: BGP_10631.192.168.30.1+179
AS path: 10631 I
Localpref: 100
Router ID: 10.68.1.1
Hidden reason: rejected by import policy
Note the import policy for prefixes received from SPB-CORE-01:
set protocols bgp group AS10631 import FROM-AS10631
set protocols bgp group AS10631 neighbor 192.168.30.1 description SPB-CORE-01
set protocols bgp group AS10631 neighbor 192.168.30.1 peer-as 10631
set policy-options policy-statement FROM-AS10631 term TERM010 from route-filter 140.0.0.0/24 exact
set policy-options policy-statement FROM-AS10631 term TERM010 then accept
set policy-options policy-statement FROM-AS10631 term DENY then reject
A rule allowing 150.0.0.0/24 is missing. Add it to the test configuration and run the test:
showroute@HKI-CORE-01# show | compare
[edit policy-options policy-statement FROM-AS10631 term TERM010 from]
route-filter 140.0.0.0/24 exact { ... }
+ route-filter 150.0.0.0/24 exact;
[edit]
Great, there is connectivity between the networks and all tests pass! These changes can now be applied to the production network.
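The effect of the FROM-AS10631 policy can be reproduced in a few lines. This is a simplified sketch that models only the "route-filter ... exact" terms followed by the final reject; the function name and the set-based representation are assumptions made for illustration.

```python
import ipaddress


def import_policy(prefix, allowed_exact):
    """Accept a prefix only if it exactly matches an allowed route-filter."""
    network = ipaddress.ip_network(prefix)
    return "accept" if network in allowed_exact else "reject"


# Route-filters in TERM010 before and after the change.
before = {ipaddress.ip_network("140.0.0.0/24")}
after = before | {ipaddress.ip_network("150.0.0.0/24")}

print(import_policy("150.0.0.0/24", before))  # hidden route on HKI-CORE-01
print(import_policy("150.0.0.0/24", after))   # accepted once the filter is added
print(import_policy("150.0.0.0/25", after))   # more specific prefix, not exact
```

Because the match type is exact, a more specific prefix such as 150.0.0.0/25 would still be rejected by the final DENY term.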
Conclusion
In my opinion, Batfish is a powerful tool with great potential. Try it and see for yourself.
If this topic interests you, join the Slack chat; the Batfish developers will be happy to answer questions and quickly fix bugs.
batfish-org.slack.com
Thank you for your attention.
References
www.batfish.org
www.youtube.com/channel/UCA-OUW_3IOt9U_s60KvmJYA
github.com/batfish/batfish
media.readthedocs.org/pdf/pybatfish/latest/pybatfish.pdf
github.com/showroute/batfish-habr