Why network engineers need programming
It would be interesting to exchange views and experience on using programming languages to solve a network engineer's problems. If you use automation methods and approaches of your own, write about them in the comments; I will describe some of the practices that have worked for me.
Not long ago I took part in a project to change the IGP interaction scheme, and the migration had to be carried out on a live network. Given the scale of the project, the work was divided into several independent stages, and we developed a migration plan that was excellent from a network design point of view: it let us move seamlessly to the next stage, or roll the router configuration back to the previous one if the tests showed behavior other than expected. When planning the configuration, we divided the stages into two groups, those with a complex set of changes and those with a simple one. To prepare the configuration for the stages in the complex group, we wrote a generator and an environment of relevant initial data; that data is updated after each stage completes successfully and serves as the starting point for preparing the configuration of the next stage.
At the penultimate stage, when all that remained was to clear part of the old configuration, we ran into an unplanned service interruption caused by a few trivial syntax errors in the command set that had been written by hand for this stage. We fixed the errors quickly enough to stay within the allotted maintenance window, and I drew a useful conclusion for myself: if you can automate filling in configuration blocks from templates, do so as often as possible. In large tasks this reduces the likelihood of error, and in routine operations it saves time. The result is a unified configuration that is easy to operate and costs you a minimum of your own time.
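As a minimal sketch of the template idea, here is how a configuration block could be filled from a table of initial data using Python's standard string.Template. The template text, the field names (name, description, mtu) and the helper render_interfaces are assumptions made for illustration, not the generator used in the project, and the commands are generic rather than any vendor's verified syntax.

```python
from string import Template

# Illustrative template; the command keywords are placeholders,
# not a specific vendor's syntax.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " mtu $mtu\n"
)


def render_interfaces(rows):
    """Render one configuration block per row of initial data.

    substitute() raises KeyError when a row lacks a field, so an
    incomplete row fails here instead of on the router.
    """
    return "".join(INTERFACE_TEMPLATE.substitute(row) for row in rows)


if __name__ == "__main__":
    rows = [
        {"name": "GE0/0/1", "description": "client1", "mtu": 9000},
        {"name": "GE0/0/2", "description": "client2", "mtu": 1500},
    ]
    print(render_interfaces(rows), end="")
```

The useful property here is that a missing value stops the generation step, which is a cheaper place to find a mistake than the maintenance window.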
In an ideal world I would not be able to come up with reasons why network engineers benefit from programming experience, but the real world and our ideas about it often do not coincide. Many network engineers will recognize the situation where something on the network should have been configured in a certain way but, for some reason or by mistake, was configured differently: an STP filter or broadcast suppression is missing on a client interface, some sections are missing from the CoPP filter, the community assigned to a BGP prefix does not match the prefix type, the MTU was not set on a backbone interface, and so on. The list of pains could go on for a very long time, so the configuration of services and components of a live network needs to be validated periodically against the established templates and required parameter sets. Before validation you will have to type (classify) the configuration elements, which is itself a large and useful piece of work. Break the configuration elements down into sets, and the sets into subsets, assigning each of them the required template and set of properties. For example, a switch port may belong to the set of client ports or, in the case of ERPS, be a transit port within one or more rings. For typing I usually use the description field. For ports, for example, it can look like this: id / type / speed / phb / mon_s / mon_b / remote, where:
- id - an identifier, for example client1.
- type - one of the following values: CUST2 for clients with a Layer 2 service, CUST3 for clients with a Layer 3 service, BB for Layer 3 backbone MPLS ports, AG for MPLS ports toward aggregation devices, AC for MPLS ports toward access devices, BR for Layer 2 backbone ports, TR for Layer 3 transit ports, and so on.
- speed - actual or contracted bandwidth.
- phb - type of QoS classification, remarking and per-hop behavior (PHB) handling.
- mon_s - type of reaction by the NOC shift or the monitoring system to an alarm indication.
- mon_b - type of local response of the device to a failure.
- remote - identifier of the remote device and port.
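The description format above lends itself to a small parser. The sketch below is one possible reading of it: it assumes the fields are separated by slashes, that only the last field (remote) may itself contain slashes, and that the set of type values is limited to the ones listed above (the original list ends with "and so on", so a real set would be longer). The names PortDescription and parse_description, and the sample values, are hypothetical.

```python
from dataclasses import dataclass

FIELDS = ("id", "type", "speed", "phb", "mon_s", "mon_b", "remote")
# Only the values named in the text; extend to match your own typing rules.
PORT_TYPES = {"CUST2", "CUST3", "BB", "AG", "AC", "BR", "TR"}


@dataclass
class PortDescription:
    id: str
    type: str
    speed: str
    phb: str
    mon_s: str
    mon_b: str
    remote: str


def parse_description(text):
    """Split a port description into typed fields or raise ValueError."""
    # Limit the number of splits so that slashes inside the last field
    # (for example a port name such as GE0/0/1) are preserved.
    parts = [p.strip() for p in text.split("/", len(FIELDS) - 1)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {text!r}")
    desc = PortDescription(*parts)
    if desc.type not in PORT_TYPES:
        raise ValueError(f"unknown port type {desc.type!r} in {text!r}")
    return desc


if __name__ == "__main__":
    # Sample values are invented for illustration.
    print(parse_description("client1 / CUST2 / 1G / be / s1 / b1 / sw2:GE0/0/1"))
```

Raising an error on anything that does not fit keeps unrecognized ports visible instead of silently assigning them a default type.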
If you type BGP groups, a good start is a label that assigns the group to one of the sets: UP for upstream operators, DN for clients, PR for exchange points, PPR for private peerings, IAS for Inter-AS interconnects. Each of these groups can in turn be typed by the required set of advertised or received prefixes.
For operations, it is also useful to type:
- RSVP paths - redundant or not, signaling or data, primary or backup, with or without bandwidth reservation, intra-regional or inter-regional.
- pseudowires - Ethernet or TDM, with or without SLA, wideband or narrowband, with or without redundancy.
- VPLS or VPN instances - service or client, with or without SLA, wideband or narrowband.
As I already wrote, typing should be seen not only as a prerequisite for validation but also as a separate process that yields a formal description of the objects in your network. That description is useful for planning work, assessing risks and determining the importance of an event. It is nice when the NOC shift, having detected an alarm on a redundant RSVP path at half past three in the morning, assigns the appropriate reaction level to the incident instead of waking up the second-line network engineers.
Depending on the tasks, the list of types and the typing rules can vary widely. Take this process very seriously, so that at the validation stage your verification rules are simple and your templates are free of redundancy and do not overlap. I recommend leaving no blank spots and no exceptions in the typing rules; here it is better to follow the “all or nothing” rule. If there are exceptions, turn them into rules, so that a set of objects with individual characteristics is broken down into several sets of elements with unified characteristics. For example, if the CE-facing interfaces on your network can be either Active-Active or Active-Backup, reflect this as separate types.
Writing validation procedures is closely tied to the syntax in which your equipment vendors represent configuration blocks. If the operating system of your equipment can present the configuration as XML or JSON, there is little more to discuss, since checking fields in these formats is a pleasure. But even if your vendor's configuration blocks look somewhat less formal, do not rush to abandon the idea. The CLI reference and command guide are a great help here: they specify not only the required and optional parameters of configuration blocks but also the exact order and relationship of the tokens used. For example, the following command for Huawei routers
mpls rsvp-te timer retransmission {increment-value increment | retransmit-value interval} *
matches the pattern {x | y | ...} *, which means that at least one and at most all of the listed alternatives may be selected. I advise you not to rely on memory when writing the validator code; just keep the page describing the command open, for the software version that your equipment actually runs.
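A check for this one command could be sketched as follows. The sketch assumes that both arguments are plain non-negative integers and that each keyword may appear at most once; the permitted value ranges are deliberately not checked, because they depend on the software version and should be taken from the command reference. The function name parse_retransmission is hypothetical.

```python
import re

PREFIX = ["mpls", "rsvp-te", "timer", "retransmission"]
KEYWORDS = ("increment-value", "retransmit-value")
NUMBER = re.compile(r"[0-9]+")


def parse_retransmission(line):
    """Return {keyword: value} for a well-formed line, otherwise None.

    Implements the {x | y} * rule: at least one keyword/value pair,
    each keyword used at most once.
    """
    tokens = line.split()
    if tokens[:len(PREFIX)] != PREFIX:
        return None
    rest = tokens[len(PREFIX):]
    # Arguments must come as keyword/value pairs and at least one is required.
    if not rest or len(rest) % 2:
        return None
    result = {}
    for key, value in zip(rest[0::2], rest[1::2]):
        if key not in KEYWORDS or key in result or not NUMBER.fullmatch(value):
            return None
        result[key] = int(value)
    return result


if __name__ == "__main__":
    print(parse_retransmission(
        "mpls rsvp-te timer retransmission increment-value 2 retransmit-value 500"))
    print(parse_retransmission("mpls rsvp-te timer retransmission"))
```

Returning the parsed values rather than a bare True makes it easy to compare them afterwards with what the template for that device type requires.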
For myself, I have identified two approaches to parsing configuration files. In simple cases it is logical to use regular expressions combined with a dictionary of the vendor's tokens. This approach is not very flexible and has almost zero portability, but such checks are quite simple to implement. For deeper analysis, or for transferring a configuration to another vendor's equipment, I suggest treating the configuration file as a tree whose elements are connected by parent-child relationships; in such a data structure it is fairly easy to organize searches and selections by various conditions.
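The tree approach can be sketched in a few lines, assuming a configuration format in which nesting is expressed by leading spaces and lines consisting only of "#" or "!" are separators. The names Node and parse_config and the sample configuration are hypothetical; formats that use braces or explicit "exit" lines would need a different parser.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)

    def find(self, predicate):
        """Yield every descendant for which predicate(node) is true."""
        for child in self.children:
            if predicate(child):
                yield child
            yield from child.find(predicate)


def parse_config(text):
    """Build a parent-child tree from an indentation-based configuration."""
    root = Node("<root>")
    stack = [(-1, root)]  # (indent, node); the root is never popped
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line in ("#", "!"):
            continue
        indent = len(raw) - len(raw.lstrip(" "))
        # Climb back up until the top of the stack is a shallower line.
        while stack[-1][0] >= indent:
            stack.pop()
        node = Node(line)
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root


if __name__ == "__main__":
    sample = (
        "interface GE0/0/1\n"
        " description client1\n"
        " mtu 9000\n"
        "#\n"
        "interface GE0/0/2\n"
        " description client2\n"
    )
    tree = parse_config(sample)
    # Select interfaces that have no mtu line among their children.
    for node in tree.children:
        if node.text.startswith("interface ") and not any(
            child.text.startswith("mtu ") for child in node.children
        ):
            print(node.text)
```

Once the configuration is a tree, a check such as "every interface of this type must have an MTU line" becomes a selection over children rather than a multi-line regular expression.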
It is useful if the typing procedure not only reports errors and missing or unnecessary commands, but also points out places in the configuration whose type could not be recognized. This helps to get rid of blank spots after several work cycles and to bring the equipment configuration to a unified form. Do not rush to fix the errors found directly from the validator. My experience in implementing configuration verification systems shows that it takes some time to tune the feedback between the validator's results and the classification of network objects. Inaccuracies are possible, so up to a certain point I advise you to monitor the results carefully and make the appropriate adjustments. I also advise you to pay special attention to scenarios that automatically edit the configuration based on the errors found: make sure that the sequence of generated commands does not depend on the order in which the verification procedures run, which is especially important in multi-threaded environments. A trivial example of such an inconsistency is adding a missing VLAN number to a port before the VLAN is declared in the global table. Ideally, the interpreter's response should be checked after each command is executed; this practice minimizes the likelihood of running with a half-applied configuration, such as a BGP neighbor declared without its prefix export policy.
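One way to make the command sequence independent of the order of the checks is to let each fix declare what it depends on and then sort the fixes topologically. The sketch below uses graphlib.TopologicalSorter from the Python standard library (3.9 or later). The fix names, the placeholder commands, the send callable and the "Error" marker in the reply are all assumptions for illustration; a real device session and its error format will differ.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def order_fixes(fixes):
    """Return commands ordered so that prerequisites always come first.

    fixes maps a fix name to (command, names of fixes it depends on).
    Names are sorted first, so the result does not depend on the order
    in which the checks reported the fixes.
    """
    graph = {
        name: sorted(dep for dep in fixes[name][1] if dep in fixes)
        for name in sorted(fixes)
    }
    return [fixes[name][0] for name in TopologicalSorter(graph).static_order()]


def apply_fixes(commands, send, error_marker="Error"):
    """Send commands one at a time and stop at the first error reply.

    Returns (applied_commands, failed_command_or_None).
    """
    applied = []
    for command in commands:
        reply = send(command)
        if error_marker in reply:
            return applied, command
        applied.append(command)
    return applied, None


if __name__ == "__main__":
    # Placeholder commands, not verified vendor syntax.
    fixes = {
        "port-vlan-100": ("add vlan 100 to port GE0/0/1", {"vlan-100"}),
        "vlan-100": ("declare vlan 100", set()),
    }
    for command in order_fixes(fixes):
        print(command)
```

Stopping at the first rejected command leaves a known, short list of what was applied, which is easier to roll back than a configuration in which some later commands succeeded and others did not.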
But a network engineer deals with more than configuration alone; sometimes you need to look at the network, as they say, from above. Graph theory and graph algorithms are a great help here. Basic knowledge in this area lets you find closed and open rings in your network, assess how well clusters or groups of devices are connected, predict partial failure scenarios, determine the boundaries of failure and risk domains, and solve other analytical and creative tasks.
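As a small illustration of this kind of analysis, the sketch below finds devices and links whose failure would split the network. It is a deliberately simple brute-force version (remove one element, then test reachability with a breadth-first search), not an efficient algorithm, and it assumes a connected topology without parallel links between the same pair of devices. A link that is not reported by bridges() lies on at least one ring. The function names and the sample topology are hypothetical.

```python
from collections import deque


def build_adj(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def reachable(adj, start, skip_node=None, skip_edge=None):
    """Breadth-first search that ignores one node or one link."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v == skip_node or v in seen:
                continue
            if skip_edge in ((u, v), (v, u)):
                continue
            seen.add(v)
            queue.append(v)
    return seen


def single_points_of_failure(links):
    """Devices whose removal disconnects the remaining devices."""
    adj = build_adj(links)
    result = set()
    for node in adj:
        others = [n for n in adj if n != node]
        if others and len(reachable(adj, others[0], skip_node=node)) < len(others):
            result.add(node)
    return result


def bridges(links):
    """Links whose removal disconnects their two endpoints."""
    adj = build_adj(links)
    return {(a, b) for a, b in links if b not in reachable(adj, a, skip_edge=(a, b))}


if __name__ == "__main__":
    # A ring A-B-C with a single-homed device D behind C.
    links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
    print(single_points_of_failure(links))
    print(bridges(links))
```

For the sample topology this reports C as the only single point of failure and C-D as the only link outside a ring, which is the boundary of D's failure domain.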