How to Take Network Infrastructure Under Your Control. Chapter Four: Automation. Templates
This article is the sixth in the series "How to Take Network Infrastructure Under Your Control". The contents of all articles in the series, with links, can be found here.
Leaving a few topics behind, I decided to start a new chapter.
I will come back to security a bit later. Here I want to discuss one simple but effective approach which, I am sure, can be useful to many in one form or another. It is really a short story about how automation can change an engineer's life. It is about the use of templates. At the end there is a list of my projects where you can see how everything described here works.
DevOps for the network
Generating configurations with scripts, using Git to control changes in the IT infrastructure, remote deployment of configs: these are the ideas that come to mind first when you think about the technical implementation of the DevOps approach. The pluses are obvious. Unfortunately, there are also minuses.
When, more than five years ago, our developers came to us networkers with these proposals, we were not enthusiastic.
I must say that we had inherited a rather motley network consisting of equipment from about ten different vendors. Some of it was convenient to configure through our beloved CLI; for other parts we preferred the GUI. Moreover, long years of working on "live" equipment had taught us to value real-time control. For example, when making changes I feel much more comfortable working directly in the CLI: I can quickly see that something has gone wrong and roll the changes back. All this was somewhat at odds with their ideas.
Other questions arise as well. For example, the interface may vary slightly from one software version to another, which will eventually cause your script to generate the wrong config. And I would not want to use production as a testing ground.
Or: how do you verify that the configuration commands were applied correctly, and what do you do in case of an error?
I do not want to say that all these issues are unsolvable. But having said "A", it is probably wise to say "B" as well: if you want to use the same change-control processes as in software development, you need dev and staging environments in addition to production. Then the approach looks complete. But how much will that cost?
But there is one situation where the cons are almost neutralized and only the pros remain: design work.
Project
For the past two years I have been participating in a project to build a data center for a major service provider. In this project I am responsible for the F5 and Palo Alto equipment; from Cisco's point of view, it is "3rd party equipment".
For me personally, there are two distinct stages in this project.
First stage
During the first year I was endlessly busy: I worked nights and weekends and could not raise my head. The pressure from management and the customer was strong and continuous. Caught in a constant routine, I could not even try to optimize the process. And it was not so much the configuration of the equipment as the preparation of the design documentation.
Then the first tests began, and I was amazed at how many minor errors and inaccuracies had been made. Of course, everything worked, but a letter was missing in a name here, a line was missing in a command there... The tests went on and on, and I found myself in a constant, daily struggle with errors, tests, and documentation.
This went on for a year. The project, as I understand it, was not easy for anyone, but gradually the client became more and more satisfied, and that made it possible to bring in additional engineers who could take over part of the routine.
Now it was possible to look around a little.
And that was the beginning of the second stage.
Second stage
I decided to automate the process.
What I had understood from communicating with the developers by that time (and, credit where it is due, we had a strong team) is that the text format, although at first glance it looks like something from the world of the DOS operating system, has a number of valuable properties.
For example, text is exactly what you need if you want to take full advantage of Git and everything built around it. And I wanted to.
It would seem you could simply store a configuration or a list of commands, but making changes then is rather inconvenient. Besides, in design work there is another important task: you must have documentation describing your overall design (the Low Level Design, LLD) and the specific implementation (the Network Implementation Plan, NIP). And here the use of templates looks like a very suitable option.
So, with YAML and Jinja2, a YAML file with configuration parameters such as IP addresses, BGP AS numbers, and so on perfectly plays the role of the NIP, while the Jinja2 templates encode the syntax corresponding to the design; in effect, they are a reflection of the LLD.
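As a minimal sketch of this split (the parameter names and config syntax below are invented for illustration, not taken from the actual project), the YAML carries the implementation-specific values while the Jinja2 template carries the design:

```python
import yaml
from jinja2 import Template

# Hypothetical NIP fragment: implementation-specific parameters
nip_yaml = """
bgp_as: 65001
interfaces:
  - name: ethernet1/1
    ip: 10.0.0.1/30
  - name: ethernet1/2
    ip: 10.0.1.1/30
"""

# Hypothetical LLD fragment: the design expressed as configuration syntax
lld_template = Template(
    "router bgp {{ bgp_as }}\n"
    "{% for intf in interfaces %}"
    "interface {{ intf.name }}\n"
    "  ip address {{ intf.ip }}\n"
    "{% endfor %}"
)

params = yaml.safe_load(nip_yaml)   # the NIP becomes a plain dict
config = lld_template.render(**params)
print(config)
```

Changing an address or adding an interface now only touches the YAML; the template, like the design itself, stays put.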
It took two days to learn YAML and Jinja2: a few good examples are enough to understand how they work. Then it took about two weeks to create all the templates fitting our design: a week for Palo Alto and another week for F5. All of this was published on the corporate GitHub.
Now the change process looked as follows:
- change the YAML file
- generate a configuration file from the Jinja2 template
- save it to the remote repository
- upload the generated configuration to the equipment
- notice an error
- change the YAML file or the Jinja2 template
- generate the configuration file again
- ...
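The generation step at the heart of that loop can be sketched as a small Python script (the file names and command-line shape here are my own illustration, not the project's actual tooling):

```python
#!/usr/bin/env python3
"""Render a device configuration from YAML parameters and a Jinja2 template."""
import sys

import yaml
from jinja2 import Environment, FileSystemLoader


def render(params_file: str, template_file: str) -> str:
    """Load parameters from YAML and render the template with them."""
    with open(params_file) as f:
        params = yaml.safe_load(f)
    env = Environment(
        loader=FileSystemLoader("."),
        trim_blocks=True,      # drop the newline after {% ... %} tags
        lstrip_blocks=True,    # drop indentation before {% ... %} tags
    )
    return env.get_template(template_file).render(**params)


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g. python render.py dc1_params.yaml paloalto.j2 > dc1_paloalto.cfg
    print(render(sys.argv[1], sys.argv[2]))
```

Both inputs and the generated config are plain text, so every step of the loop above diffs cleanly in Git.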
Clearly, at first a lot of time was spent on edits, but after a week or two errors had become a rarity.
A good test, and a chance to debug everything, came when the client decided to change the naming convention. Anyone who has worked with F5 will appreciate the piquancy of that situation. For me, though, it was fairly simple: I changed the names in the YAML file, deleted the entire configuration from the equipment, generated a new one, and uploaded it. Everything, bug fixes included, took four days: two days for each technology. After that I was ready for the next step, namely the creation of the Dev and Staging data centers.
Dev and Staging
Staging essentially replicates production in full, while Dev is a heavily stripped-down copy built mostly on virtual hardware. An ideal situation for the new approach. If I separate out the time I personally spent from the overall process, the work, I think, took no more than two weeks. Most of the time went into waiting on the other side and jointly hunting down problems. The rollout of the 3rd-party part was almost invisible to everyone else. There was even time to learn a few things and write a couple of articles on Habr :)
Summarize
So what is the bottom line?
- all I need in order to change a configuration is to modify a simple, clearly structured YAML file with configuration parameters. I never change the Python script, and only very rarely (only when there is an error) do I change the Jinja2 templates
- from a documentation point of view, the situation is almost ideal. You change the documentation (the YAML files act as the NIP) and upload that configuration to the equipment. Thus your documentation is always up to date.
All this led to the following:
- the error rate dropped to almost zero
- 90 percent of the routine was eliminated
- implementation speed increased significantly
PAY, F5Y, ACY
I said that a few good examples are enough to understand how this works.
Here is a brief (and, of course, modified) version of what was created in the course of my work.
PAY = deployment of Palo Alto from Yaml
F5Y = deployment of F5 from Yaml (coming soon)
ACY = deployment of ACI from Yaml
I will add a few words about ACY (not to be confused with ACI).
Those who have worked with ACI know that this miracle (in a good sense, too) was not created by network engineers :). Forget everything you knew about networking; it will not help you here!
That is a slight exaggeration, but it roughly conveys the feeling I have been experiencing constantly for the three years I have worked with ACI.
And here ACY is not only a way to build a change-control process (which is especially important with ACI, since it is assumed to be the central and most critical part of your data center), but it also gives you a friendly interface for creating configurations.
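Since ACI is driven through the APIC REST API rather than a CLI, the same YAML-plus-template idea produces JSON payloads instead of config text. A hedged sketch (the YAML layout and template are my own illustration, not ACY's actual files; `fvTenant` is the standard APIC object class for a tenant):

```python
import json

import yaml
from jinja2 import Template

# Hypothetical NIP fragment: the tenants we want on the fabric
tenants_yaml = """
tenants:
  - name: PROD
  - name: DEV
"""

# Template producing an APIC-style JSON body for one tenant object
tenant_template = Template(
    '{"fvTenant": {"attributes": {"name": "{{ name }}"}}}'
)

params = yaml.safe_load(tenants_yaml)
# One payload per tenant, ready to POST to the APIC REST API
payloads = [
    json.loads(tenant_template.render(**tenant))
    for tenant in params["tenants"]
]
for p in payloads:
    print(json.dumps(p))
```

The YAML stays readable for humans, while the rendered JSON is what the controller actually consumes, which is exactly the friendly-interface point above.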
The engineers in this project use Excel instead of YAML for exactly the same purpose when configuring ACI. There are, of course, some advantages to Excel:
- your NIP is in a single file
- nice-looking tables that the client is pleased to look at
- you can use Excel's own tools
But there is one drawback, and in my opinion it outweighs the pros: controlling changes and coordinating the team's work becomes much harder.
ACY is, in essence, the application to ACI of the same approaches I used for the 3rd-party equipment.