Batfish Introduction


One of the problems of modern networks is their fragility. Numerous filtering rules, route-exchange policies, and dynamic routing protocols make networks hard to reason about and prone to human error. A single careless change to a route-map or ACL can bring a network down. What we clearly lack is a tool for evaluating how a network will behave with a new configuration before that configuration reaches production. Will network A still be reachable if I filter out some of the BGP announcements received from provider B? Which path will packets take from network C to server D if I double the IGP metric on one of the transit links? Batfish helps answer these and many other questions.

Overview of Batfish


Batfish is a network modeling tool. Its main purpose is to test configuration changes before they are applied to the production network. Batfish can also be used to analyze and check the current state of the network. Existing CI/CD processes in the networking world clearly lack a tool for testing new configurations, and Batfish fills this gap.

Batfish does not require direct access to network equipment: it models network behavior from the data contained in the device configuration files.

Batfish can:

  • determine the neighbor status of dynamic routing protocols in the network (BGP, IS-IS, OSPF)
  • calculate the RIB of each network element
  • check NTP, AAA, MTU settings
  • determine whether an ACL blocks given network traffic (similar to packet-tracer on the Cisco ASA)
  • check for end-to-end connectivity between hosts within the network
  • show the path of traffic through the network (virtual tracing)


Supported Platforms:

  • Arista
  • Aruba
  • AWS (VPCs, Network ACLs, VPN GW, NAT GW, Internet GW, Security Groups)
  • Cisco (NX-OS, IOS, IOS-XE, IOS-XR and ASA)
  • Dell Force10
  • Foundry
  • iptables
  • Juniper (MX, EX, QFX, SRX, T-series, PTX)
  • MRV
  • Palo Alto Networks
  • Quagga / FRR
  • Quanta
  • VyOS


Batfish is a Java application. Pybatfish, a Python SDK, was written to make working with it more convenient.

Let's move on to practice. I will show you the possibilities of Batfish with an example.

Example


We manage two autonomous systems: AS 41214 and AS 10631. AS 41214 uses IS-IS as its IGP, and AS 10631 uses OSPF. Each AS runs a full iBGP mesh internally. LDN-CORE-01 announces prefix 135.65.0.0/19 to its BGP neighbors, and MSK-CORE-01 announces 140.0.0.0/24. Routing information is exchanged between the autonomous systems over the HKI-CORE-01 - SPB-CORE-01 link.

HKI-CORE-01, STH-CORE-01 - Junos routers
LDN-CORE-01, AMS-CORE-01, SPB-CORE-01, MSK-CORE-01 - Cisco IOS routers

Install the container with Batfish and the Python SDK:





docker pull batfish/allinone
docker run batfish/allinone
docker container exec -it <container> bash

Get to know the library in Python's interactive mode:

root@ea9a1559d88e:/# python3
--------------------
>>> from pybatfish.client.commands import bf_logger, bf_init_snapshot
>>> from pybatfish.question.question import load_questions
>>> from pybatfish.question import bfq
>>> import logging
>>> bf_logger.setLevel(logging.ERROR)
>>> load_questions()
>>> bf_init_snapshot('tmp/habr')
'ss_e8065858-a911-4f8a-b020-49c9b96d0381'

bf_init_snapshot('tmp/habr') loads the configuration files into Batfish and prepares them for analysis.

/tmp/habr is a directory with the router configuration files.

root@ea9a1559d88e:/tmp/habr# tree
.
`-- configs
    |-- AMS-CORE-01.cfg
    |-- HKI-CORE-01.cfg
    |-- LDN-CORE-01.cfg
    |-- MSK-CORE-01.cfg
    |-- SPB-CORE-01.cfg
    `-- STH-CORE-01.cfg
1 directory, 6 files
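
The session above uses pybatfish's module-level helpers (bf_init_snapshot, load_questions, bfq). Newer pybatfish releases document a Session object instead, so a rough equivalent might look like the sketch below. It assumes pybatfish is installed and a Batfish service is reachable on localhost; the function name open_snapshot and the snapshot name "habr" are made up for illustration.

```python
def open_snapshot(snapshot_dir="/tmp/habr", host="localhost"):
    """Connect to Batfish, load the snapshot directory, and return the session.

    Sketch only: it needs pybatfish and a running Batfish service to do anything.
    """
    # Imported lazily so this file can be loaded even where pybatfish is absent.
    from pybatfish.client.session import Session

    bf = Session(host=host)
    # "habr" is a hypothetical snapshot name chosen for this example.
    bf.init_snapshot(snapshot_dir, name="habr", overwrite=True)
    return bf
```

With such a session, questions are reached through bf.q, so bf.q.bgpSessionStatus(...) would play the role that bfq.bgpSessionStatus(...) plays in the rest of this article.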

Now let's determine the status of BGP sessions on the LDN-CORE-01 router:


>>> bgp_peers = bfq.bgpSessionStatus(nodes='LDN-CORE-01').answer().frame()
>>> bgp_peers
   Node         VRF      Local_AS  Local_IP     Remote_AS  Remote_Node  Remote_IP    Session_Type  Est_Status
0  ldn-core-01  default  41214     172.20.20.1  41214      sth-core-01  172.20.20.2  IBGP          EST
1  ldn-core-01  default  41214     172.20.20.1  41214      ams-core-01  172.20.20.3  IBGP          EST
2  ldn-core-01  default  41214     172.20.20.1  41214      hki-core-01  172.20.20.4  IBGP          EST

Does that match reality? Compare it with the output from the router itself:


LDN-CORE-01#show ip bgp summary
…
Neighbor        V           AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.20.20.2     4        41214     629     669        9    0    0 00:56:51      0
172.20.20.3     4        41214     826     827        9    0    0 01:10:18      0
172.20.20.4     4        41214     547     583        9    0    0 00:49:24      1
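
The comparison above was done by eye, but once the answer is available as rows it can be scripted. The sketch below works on plain dictionaries copied from the Batfish table above, so it runs without a Batfish server. The helper name and the set of expected peers are assumptions for illustration, and the status value "EST" is simply what the abbreviated table shows (real answers may use a longer value such as ESTABLISHED).

```python
# Rows copied from the bgpSessionStatus table above (only the columns we need).
BGP_ROWS = [
    {"Remote_Node": "sth-core-01", "Remote_IP": "172.20.20.2", "Est_Status": "EST"},
    {"Remote_Node": "ams-core-01", "Remote_IP": "172.20.20.3", "Est_Status": "EST"},
    {"Remote_Node": "hki-core-01", "Remote_IP": "172.20.20.4", "Est_Status": "EST"},
]

# Peers we expect LDN-CORE-01 to have (taken from "show ip bgp summary").
EXPECTED = {"172.20.20.2", "172.20.20.3", "172.20.20.4"}


def unexpected_sessions(rows, expected_peers, up_value="EST"):
    """Return (missing, down): expected peers with no row, and peers that are not up."""
    seen = {row["Remote_IP"]: row["Est_Status"] for row in rows}
    missing = sorted(ip for ip in expected_peers if ip not in seen)
    down = sorted(ip for ip, status in seen.items() if status != up_value)
    return missing, down


missing, down = unexpected_sessions(BGP_ROWS, EXPECTED)
print("missing:", missing, "down:", down)
```

Both lists are empty for the data above, which is the scripted version of "the model agrees with the router".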

Now let's see what IS-IS routes are in the RIB on the HKI-CORE-01 router according to Batfish:


>>> isis_routes = bfq.routes(nodes='HKI-CORE-01', protocols='isis').answer().frame()
>>> isis_routes
   Node         VRF      Network         Next_Hop     Next_Hop_IP  Protocol  Admin_Distance  Metric  Tag
0  hki-core-01  default  172.20.20.3/32  ams-core-01  10.0.0.6     isisL2    18              20      None
1  hki-core-01  default  172.20.20.1/32  ams-core-01  10.0.0.6     isisL2    18              30      None
2  hki-core-01  default  172.20.20.2/32  sth-core-01  10.0.0.4     isisL2    18              10      None
3  hki-core-01  default  172.20.20.1/32  sth-core-01  10.0.0.4     isisL2    18              30      None
4  hki-core-01  default  10.0.0.0/31     sth-core-01  10.0.0.4     isisL2    18              20      None
5  hki-core-01  default  10.0.0.2/31     ams-core-01  10.0.0.6     isisL2    18              20      None

And on the router's command line:


showroute@HKI-CORE-01# run show route table inet.0 protocol isis
inet.0: 18 destinations, 18 routes (18 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/31        *[IS-IS/18] 00:51:25, metric 20
                    > to 10.0.0.4 via ge-0/0/0.0
10.0.0.2/31        *[IS-IS/18] 00:51:45, metric 20
                    > to 10.0.0.6 via ge-0/0/1.0
172.20.20.1/32     *[IS-IS/18] 00:51:25, metric 30
                      to 10.0.0.4 via ge-0/0/0.0
                    > to 10.0.0.6 via ge-0/0/1.0
172.20.20.2/32     *[IS-IS/18] 00:51:25, metric 10
                    > to 10.0.0.4 via ge-0/0/0.0
172.20.20.3/32     *[IS-IS/18] 00:51:45, metric 20
                    > to 10.0.0.6 via ge-0/0/1.0
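
Batfish lists one row per next hop, while Junos prints one entry per prefix, so the six rows in the Batfish answer correspond to the five prefixes in the CLI output. The sketch below groups the rows by prefix to make that comparison explicit. The tuples are copied from the Batfish answer above; the helper name is an assumption for illustration.

```python
from collections import defaultdict

# (Network, Next_Hop_IP, Metric) copied from the Batfish routes answer above.
ISIS_ROUTES = [
    ("172.20.20.3/32", "10.0.0.6", 20),
    ("172.20.20.1/32", "10.0.0.6", 30),
    ("172.20.20.2/32", "10.0.0.4", 10),
    ("172.20.20.1/32", "10.0.0.4", 30),
    ("10.0.0.0/31", "10.0.0.4", 20),
    ("10.0.0.2/31", "10.0.0.6", 20),
]


def next_hops_by_prefix(routes):
    """Collapse per-next-hop rows into {prefix: sorted list of next-hop IPs}."""
    grouped = defaultdict(set)
    for network, next_hop_ip, _metric in routes:
        grouped[network].add(next_hop_ip)
    return {network: sorted(hops) for network, hops in grouped.items()}


by_prefix = next_hops_by_prefix(ISIS_ROUTES)
# Prefixes with more than one next hop are equal-cost multipath routes.
ecmp = {prefix: hops for prefix, hops in by_prefix.items() if len(hops) > 1}
print(len(by_prefix), ecmp)
```

The only multipath prefix is 172.20.20.1/32, with next hops 10.0.0.4 and 10.0.0.6, which matches the two "to ..." lines Junos shows for that route.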

Good. I hope it is now clearer what Batfish is.

At the beginning of the article, I wrote that Batfish can be used to check configuration changes before applying them to the production network. Now let's look at testing a network with Robot Framework. For this I wrote a small module based on Pybatfish that performs the following checks:

  • Determine the status of BGP sessions on the network
  • Determine IS-IS Neighbor Status
  • Check for end-to-end connectivity between nodes on a network with trace demonstration
  • Determine the size of the RIB on the router for a specific dynamic routing protocol

LibraryBatfish.py

import logging

from pybatfish.client.commands import bf_logger, bf_init_snapshot
from pybatfish.question.question import load_questions, list_questions
from pybatfish.question import bfq
from pybatfish.datamodel.flow import HeaderConstraints, PathConstraints
from robot.api import logger


class LibraryBatfish(object):

    def __init__(self, snapshot):
        bf_logger.setLevel(logging.ERROR)
        load_questions()
        bf_init_snapshot(snapshot)

    def check_bgp_peers(self):
        # Return 1 if every BGP session is established, otherwise log the rest and return 0
        not_established_peers = list()
        bgp_peers = bfq.bgpSessionStatus().answer()
        for peer in bgp_peers.rows:
            if peer.get('Established_Status') != 'ESTABLISHED':
                not_established_peers.append(
                    dict.fromkeys(peer.get('Local_IP').split(), peer.get('Remote_IP').get('value')))
        if len(not_established_peers) == 0:
            return 1
        else:
            logger.warn('BGP neighbors are not in an established state:')
            for neighborship in not_established_peers:
                for peer in neighborship:
                    logger.warn('{} - {}'.format(peer, neighborship.get(peer)))
            return 0

    def check_routes(self, node, protocol):
        # Return the number of routes of the given protocol in the node's RIB
        routes = bfq.routes(nodes=node, protocols=protocol).answer()
        return len(routes.rows)

    def check_isis_neighbors(self, description):
        # Every interface whose description marks it as an IS-IS link must have an adjacency
        not_isis_enabled_links = list()
        isis_neighbors = self._get_isis_neighbors()
        for link in self._get_isis_enabled_links(description):
            if link not in isis_neighbors:
                not_isis_enabled_links.append(link)
        if len(not_isis_enabled_links) == 0:
            return 1
        else:
            for link in not_isis_enabled_links:
                logger.warn('{} {} has no IS-IS neighbor'.format(link.get('hostname'), link.get('interface')))
            return 0

    def ping(self, source_ip, destination_ip):
        # A ping succeeds only if the virtual traceroute is accepted in both directions
        ip_owners = bfq.ipOwners().answer()
        traceroute = self._get_traceroute_status(source_ip, destination_ip, ip_owners)
        reverse_traceroute = self._get_traceroute_status(destination_ip, source_ip, ip_owners)
        if traceroute and reverse_traceroute:
            self._show_trace(source_ip, destination_ip, ip_owners)
            return 1
        else:
            logger.warn('Ping {} -> {} failed'.format(source_ip, destination_ip))
            return 0

    def _get_traceroute_status(self, source_ip, destination_ip, addresses):
        tracert = self._unidirectional_virtual_traceroute(source_ip, destination_ip, addresses)
        # No answer (for example, the source address was not found) counts as a failure
        if tracert is None:
            return False
        for trace in tracert.rows[0].get('Traces'):
            if trace.get('disposition') != 'ACCEPTED':
                return False
        return True

    def _get_paths(self, source_ip, destination_ip, addresses):
        tracert = self._unidirectional_virtual_traceroute(source_ip, destination_ip, addresses)
        traces = tracert.rows[0].get('Traces')
        paths = dict()
        path_number = 1
        for trace in traces:
            if trace.get('disposition') == 'ACCEPTED':
                path = list()
                for hop in trace.get('hops'):
                    path.append(hop.get('node').get('name'))
                paths[path_number] = path
                path_number += 1
        return paths

    def _unidirectional_virtual_traceroute(self, source_ip, destination_ip, addresses):
        # Find the node and interface that own source_ip
        node = None
        interface = None
        for address in addresses.rows:
            if address.get('IP') == source_ip:
                node = address.get('Node').get('name')
                interface = address.get('Interface')
        if node is None:
            logger.warn('{} address has not been found'.format(source_ip))
            return None
        headers = HeaderConstraints(srcIps=source_ip, dstIps=destination_ip, ipProtocols=['ICMP'])
        try:
            tracert = bfq.traceroute(startLocation="{}[{}]".format(node, interface), headers=headers).answer()
            return tracert
        except Exception:
            logger.warn('{} address has not been found'.format(source_ip))

    def _get_isis_enabled_links(self, description='core-link'):
        isis_enabled_links = list()
        interfaces = bfq.interfaceProperties().answer()
        for interface in interfaces.rows:
            if interface.get('Description') is not None and description in interface.get('Description'):
                isis_enabled_links.append({'hostname': interface.get('Interface').get('hostname'),
                                           'interface': interface.get('Interface').get('interface')})
        return isis_enabled_links

    def _get_isis_neighbors(self):
        isis_neighbors = list()
        isis_adjacencies = bfq.edges(edgeType='isis').answer()
        for neighbor in isis_adjacencies.rows:
            isis_neighbors.append(neighbor.get('Interface'))
        return isis_neighbors

    def _show_trace(self, source_ip, destination_ip, addresses):
        logger.console('\nTraceroute to {} from {}'.format(destination_ip, source_ip))
        paths = self._get_paths(source_ip, destination_ip, addresses)
        path_num = 1
        for path in paths:
            n = 1
            logger.console('\n  Path N{}'.format(path_num))
            for hop in paths.get(path):
                logger.console('  {} {}'.format(n, hop))
                n += 1
            path_num += 1
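
The pass/fail rule behind the Ping keyword can be tried without a Batfish server. The sketch below mirrors the idea used in the library: a direction passes only if every trace has the ACCEPTED disposition, and a ping passes only if both directions do. The trace dictionaries are hand-written stand-ins shaped like what the library reads (a 'disposition' key and a list of hops); they are not real Batfish output, and the hop names are a hypothetical path through the example topology.

```python
def direction_ok(traces):
    """True if every trace in one direction ended with the ACCEPTED disposition."""
    return all(trace.get("disposition") == "ACCEPTED" for trace in traces)


def ping_ok(forward_traces, reverse_traces):
    """A ping needs the request and the reply to get through."""
    return direction_ok(forward_traces) and direction_ok(reverse_traces)


# Hypothetical traces for 135.65.0.1 -> 140.0.0.1 and back.
forward = [{"disposition": "ACCEPTED",
            "hops": ["ldn-core-01", "ams-core-01", "hki-core-01", "spb-core-01", "msk-core-01"]}]
reverse_ok = [{"disposition": "ACCEPTED",
               "hops": ["msk-core-01", "spb-core-01", "hki-core-01", "ams-core-01", "ldn-core-01"]}]
# Hypothetical reply dropped by an inbound filter on the way back.
reverse_blocked = [{"disposition": "DENIED_IN",
                    "hops": ["msk-core-01", "spb-core-01", "hki-core-01"]}]

print(ping_ok(forward, reverse_ok), ping_ok(forward, reverse_blocked))
```

Checking both directions matters because a filter that drops only the return traffic would otherwise go unnoticed.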


batfish-test.robot
[image: contents of the test suite]

Scenario N1




I still manage the same network. Suppose I need to tidy up the filters on the border between AS 41214 and AS 10631 and block packets at the junction whose source or destination IP address falls within the BOGONS ranges.

Run the test before making changes.


Tests passed.

Now I change the test configuration of the HKI-CORE-01 router, /tmp/habr/configs/HKI-CORE-01.cfg:

set firewall family inet filter BOGONS term TERM010 from address 0.0.0.0/8
set firewall family inet filter BOGONS term TERM010 from address 10.0.0.0/8
set firewall family inet filter BOGONS term TERM010 from address 100.64.0.0/10
set firewall family inet filter BOGONS term TERM010 from address 127.0.0.0/8
set firewall family inet filter BOGONS term TERM010 from address 169.254.0.0/16
set firewall family inet filter BOGONS term TERM010 from address 172.16.0.0/12
set firewall family inet filter BOGONS term TERM010 from address 192.0.2.0/24
set firewall family inet filter BOGONS term TERM010 from address 192.88.99.0/24
set firewall family inet filter BOGONS term TERM010 from address 192.168.0.0/16
set firewall family inet filter BOGONS term TERM010 from address 198.18.0.0/15
set firewall family inet filter BOGONS term TERM010 from address 198.51.100.0/24
set firewall family inet filter BOGONS term TERM010 from address 203.0.113.0/24
set firewall family inet filter BOGONS term TERM010 from address 224.0.0.0/4
set firewall family inet filter BOGONS term TERM010 from address 240.0.0.0/4
set firewall family inet filter BOGONS term TERM010 then discard
set firewall family inet filter BOGONS term PERMIT-IP-ANY-ANY then accept 
set interfaces ge-0/0/2.0 family inet filter input BOGONS 
set interfaces ge-0/0/2.0 family inet filter output BOGONS  

Run the test.



I was close, but the test output shows that after the changes the BGP session 192.168.30.0 - 192.168.30.1 is no longer in the Established state and, as a result, IP connectivity between 135.65.0.1 and 140.0.0.1 is lost. What went wrong? A careful look at the HKI-CORE-01 configuration shows that the eBGP peering is set up on private addresses:


showroute@HKI-CORE-01# show interfaces ge-0/0/2 | display set
set interfaces ge-0/0/2 description SPB-CORE-01
set interfaces ge-0/0/2 unit 0 family inet filter input BOGONS
set interfaces ge-0/0/2 unit 0 family inet filter output BOGONS
set interfaces ge-0/0/2 unit 0 family inet address 192.168.30.0/31

Conclusion: either the addresses at the junction must be changed, or the 192.168.30.0/31 subnet must be added as an exception.
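
The overlap is easy to confirm with Python's standard ipaddress module. The sketch below checks the eBGP peering addresses and the two test endpoints against the prefixes of term TERM010; the helper name is an assumption for illustration.

```python
import ipaddress

# Prefixes copied from term TERM010 of the BOGONS filter above.
BOGONS = [ipaddress.ip_network(prefix) for prefix in (
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.2.0/24", "192.88.99.0/24",
    "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24", "203.0.113.0/24",
    "224.0.0.0/4", "240.0.0.0/4",
)]


def matching_bogons(address):
    """Return the bogon prefixes (as strings) that contain the given address."""
    ip = ipaddress.ip_address(address)
    return [str(network) for network in BOGONS if ip in network]


for address in ("192.168.30.0", "192.168.30.1", "135.65.0.1", "140.0.0.1"):
    print(address, matching_bogons(address))
```

Both peering addresses fall inside 192.168.0.0/16, so the filter discards the BGP packets themselves, while the test endpoints 135.65.0.1 and 140.0.0.1 match nothing in the list.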

I will add the junction subnet as an exception and update /tmp/habr/configs/HKI-CORE-01.cfg again:

set firewall family inet filter BOGONS term TERM005 from address 192.168.30.0/31
set firewall family inet filter BOGONS term TERM005 then accept

Run the test.



Now unwanted traffic will not pass through the eBGP link between AS 41214 and AS 10631, and the changes can be made without fear of the consequences.
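
Why the exception works can be illustrated with a small first-match model of the filter. This is a simplified sketch, not Batfish's or Junos's implementation: it assumes terms are evaluated in order, that "from address" matches either the source or the destination, that TERM005 is evaluated before TERM010, and that a packet matching no term is discarded. The TERM010 prefix list is shortened, and the exception uses the junction subnet 192.168.30.0/31.

```python
import ipaddress


def evaluate(terms, src, dst):
    """Return (term name, action) of the first term matching src or dst."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for name, prefixes, action in terms:
        networks = [ipaddress.ip_network(prefix) for prefix in prefixes]
        # An empty prefix list models a term without a "from" clause: it matches everything.
        if not networks or any(src_ip in net or dst_ip in net for net in networks):
            return name, action
    return None, "discard"  # assumed implicit default when nothing matches


TERM005 = ("TERM005", ["192.168.30.0/31"], "accept")
TERM010 = ("TERM010", ["10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16"], "discard")  # shortened
PERMIT = ("PERMIT-IP-ANY-ANY", [], "accept")

before = [TERM010, PERMIT]
after = [TERM005, TERM010, PERMIT]

print(evaluate(before, "192.168.30.0", "192.168.30.1"))  # BGP packet, first version
print(evaluate(after, "192.168.30.0", "192.168.30.1"))   # BGP packet, with the exception
print(evaluate(after, "10.1.1.1", "140.0.0.1"))          # bogon source is still dropped
```

The order of the terms is what makes the exception effective: if TERM010 were evaluated first, the peering addresses would still match 192.168.0.0/16 and be discarded.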

Scenario N2




Here I need to terminate network 150.0.0.0/24 on the MSK-CORE-01 router and ensure connectivity between 135.65.0.1 and 150.0.0.1. I add the following lines to the test configuration of the MSK-CORE-01 router, /tmp/habr/configs/MSK-CORE-01.cfg:


interface Loopback2
 ip address 150.0.0.1 255.255.255.255
!
ip route 150.0.0.0 255.255.255.0 Null0
!
router bgp 10631
 !
 address-family ipv4
  network 150.0.0.0 mask 255.255.255.0
!

I change the test script and run the test:


git diff HEAD~
diff --git a/batfish-robot.robot b/batfish-robot.robot
index 8d963c5..ce8cb6a 100644
--- a/batfish-robot.robot
+++ b/batfish-robot.robot
@@ -5,7 +5,7 @@ Library  LibraryBatfish.py  tmp/habr
 ${ISIS-ENABLED-LINK-DESCRIPTION}  ISIS-LINK
 ${NODE}  HKI-CORE-01
 ${PROTOCOL}  ebgp
-${RIB-SIZE}  1
+${RIB-SIZE}  2
 *** Test Cases ***
 ISIS
@@ -27,3 +27,8 @@ Ping
     [Documentation]  Test end-to-end ICMP connectivity & show traceroute
     ${result}=  Ping  135.65.0.1  140.0.0.1
     Should Be Equal As Integers  ${result}  1
+
+Ping2
+    [Documentation]  Test end-to-end ICMP connectivity & show traceroute
+    ${result}=  Ping  135.65.0.1  150.0.0.1
+    Should Be Equal As Integers  ${result}  1

Now I expect to see two eBGP routes on the HKI-CORE-01 router, and an additional connectivity check has been added.



There is no connectivity between 135.65.0.1 and 150.0.0.1; moreover, the HKI-CORE-01 router has only one eBGP route instead of two.

Check the contents of the RIB on HKI-CORE-01 after the new configuration has been added to the MSK-CORE-01 router:


showroute@HKI-CORE-01# run show route table inet.0 protocol bgp
inet.0: 20 destinations, 20 routes (19 active, 0 holddown, 1 hidden)
+ = Active Route, - = Last Active, * = Both
135.65.0.0/19      *[BGP/170] 02:25:38, MED 0, localpref 100, from 172.20.20.1
                      AS path: I, validation-state: unverified
                    > to 10.0.0.4 via ge-0/0/0.0
                      to 10.0.0.6 via ge-0/0/1.0
140.0.0.0/24       *[BGP/170] 01:38:02, localpref 100
                      AS path: 10631 I, validation-state: unverified
                    > to 192.168.30.1 via ge-0/0/2.0
showroute@HKI-CORE-01# run show route table inet.0 protocol bgp hidden detail
inet.0: 20 destinations, 20 routes (19 active, 0 holddown, 1 hidden)
150.0.0.0/24 (1 entry, 0 announced)
         BGP                 /-101
                Next hop type: Router, Next hop index: 563
                Address: 0x940f43c
                Next-hop reference count: 4
                Source: 192.168.30.1
                Next hop: 192.168.30.1 via ge-0/0/2.0, selected
                Session Id: 0x9
                State: <Hidden Ext>
                Local AS: 41214 Peer AS: 10631
                Age: 1:42:03
                Validation State: unverified
                Task: BGP_10631.192.168.30.1+179
                AS path: 10631 I
                Localpref: 100
                Router ID: 10.68.1.1
                Hidden reason: rejected by import policy

Note the import policy for prefixes received from SPB-CORE-01:

set protocols bgp group AS10631 import FROM-AS10631
set protocols bgp group AS10631 neighbor 192.168.30.1 description SPB-CORE-01
set protocols bgp group AS10631 neighbor 192.168.30.1 peer-as 10631
set policy-options policy-statement FROM-AS10631 term TERM010 from route-filter 140.0.0.0/24 exact
set policy-options policy-statement FROM-AS10631 term TERM010 then accept
set policy-options policy-statement FROM-AS10631 term DENY then reject

A rule allowing 150.0.0.0/24 is missing. Add it to the test configuration and run the test:


showroute@HKI-CORE-01# show | compare
[edit policy-options policy-statement FROM-AS10631 term TERM010 from]
       route-filter 140.0.0.0/24 exact { ... }
+      route-filter 150.0.0.0/24 exact;
[edit]
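
The effect of the extra route-filter on the eBGP route count can be modeled in a few lines. The sketch below is a simplified stand-in for the FROM-AS10631 policy, assuming only what the configuration above shows: prefixes that exactly equal a route-filter entry are accepted by TERM010, and everything else is rejected by the DENY term. The function name is an assumption for illustration.

```python
import ipaddress


def import_policy(prefix, exact_filters):
    """Return 'accept' if prefix exactly equals a route-filter entry, else 'reject'."""
    network = ipaddress.ip_network(prefix)
    if any(network == ipaddress.ip_network(entry) for entry in exact_filters):
        return "accept"  # term TERM010
    return "reject"      # term DENY


# Prefixes announced by AS 10631 after the change on MSK-CORE-01.
ANNOUNCED = ["140.0.0.0/24", "150.0.0.0/24"]

old_filters = ["140.0.0.0/24"]
new_filters = ["140.0.0.0/24", "150.0.0.0/24"]

before = [p for p in ANNOUNCED if import_policy(p, old_filters) == "accept"]
after = [p for p in ANNOUNCED if import_policy(p, new_filters) == "accept"]
print(len(before), len(after))
```

The counts go from one accepted route to two, which is the change the ${RIB-SIZE} variable in the test expects. Because the match is exact, a more specific prefix such as 150.0.0.0/25 would still be rejected.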



Great, there is connectivity between the networks and all tests pass! These changes can now be applied to the production network.

Conclusion


In my opinion, Batfish is a powerful tool with great potential. Try it and see for yourself.

If this topic interests you, join the Slack chat; the Batfish developers will be happy to answer questions and are quick to fix bugs.

batfish-org.slack.com

Thank you for your attention.

References


www.batfish.org

www.youtube.com/channel/UCA-OUW_3IOt9U_s60KvmJYA

github.com/batfish/batfish

media.readthedocs.org/pdf/pybatfish/latest/pybatfish.pdf

github.com/showroute/batfish-habr
