How to take control of network infrastructure. Chapter three. Network security. Part one

    This article is the third in the series “How to take control of the network infrastructure.” The contents of all articles in the series, with links, can be found here.


    It makes no sense to talk about completely eliminating security risks: we fundamentally cannot reduce them to zero. You also need to understand that the harder we try to make the network more and more secure, the more expensive our solutions become. You need to find a reasonable compromise between price, complexity, and security for your network.

    Of course, security design is organically integrated into the overall architecture, and the security solutions used affect the scalability, reliability, manageability, etc. of the network infrastructure, which should also be taken into account.

    But let me remind you that we are not talking now about creating a network. We have already chosen the initial conditions, selected the equipment, and created the infrastructure, and at this stage we should, as far as possible, “live with” that and find solutions in the context of the previously chosen approach.

    Our task now is to identify the risks associated with security at the network level and reduce them to a reasonable value.

    Network security audit

    If ISO 27k processes are implemented in your organization, then security audits and network changes should be organically incorporated into the overall processes of this approach. But these standards are not about specific solutions, configuration, or design. There is no definitive advice, no standards dictating in detail what your network should look like; that is both the complexity and the beauty of this task.

    I would single out a few possible network security audits:

    • equipment configuration audit (hardening)
    • security design audit
    • access audit
    • process audit

    Equipment configuration audit (hardening)

    It seems that in most cases this is the best starting point for auditing and improving the security of your network. IMHO, it is a good demonstration of the Pareto principle (20% of the effort gives 80% of the result, and the remaining 80% of the effort gives only 20% of the result).

    The point is that vendors usually publish “best practice” security recommendations for configuring their equipment. This is called “hardening.”

    You can also often find a questionnaire based on these recommendations (or create one yourself) that will help you determine how well your equipment configuration matches these “best practices,” and make changes to your network accordingly. This will allow you, quite easily and at virtually no cost, to significantly reduce security risks.
    Some examples for several Cisco operating systems:

    • Cisco IOS Configuration Hardening
    • Cisco IOS-XR Configuration Hardening
    • Cisco NX-OS Configuration Hardening
    • Cisco Baseline Security Check List

    Based on these documents, a list of configuration requirements for each type of hardware can be created. For example, for a Cisco N7K VDC, these requirements might look like this .

    In this way, configuration files can be created for the different types of active equipment in your network infrastructure. You can then, manually or with automation, “fill in” these configuration files. How to automate this process will be discussed in detail in another series of articles on orchestration and automation.
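As a minimal illustration of such a questionnaire-driven check, here is a sketch of a config audit in Python. The requirement strings are hypothetical examples, not an actual vendor hardening list; real checklists come from the guides above.

```python
# Hypothetical hardening requirements (illustrative only; take the real
# lists from the vendor's hardening guide for your platform).
REQUIRED = [
    "service password-encryption",
    "no ip http server",
    "logging buffered",
]
FORBIDDEN = [
    "ip source-route",
]

def audit_config(config_text):
    """Return (missing required lines, forbidden lines that are present)."""
    lines = {line.strip() for line in config_text.splitlines()}
    missing = [req for req in REQUIRED if req not in lines]
    violations = [bad for bad in FORBIDDEN if bad in lines]
    return missing, violations

# A toy device configuration to audit.
sample = """\
service password-encryption
ip source-route
logging buffered
"""
missing, violations = audit_config(sample)
print(missing)      # ['no ip http server']
print(violations)   # ['ip source-route']
```

Real configurations are hierarchical, so a production tool would parse sections rather than compare flat lines, but the questionnaire-to-report idea is the same.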

    Security design audit

    Typically, the following segments are present in one form or another in an enterprise network:

    • DC (Public services DMZ and Intranet data center)
    • Internet access
    • Remote access VPN
    • WAN edge
    • Branch
    • Campus (Office)
    • Core

    The names are taken from the Cisco SAFE model, but of course you do not have to stick to these names or to this model. Still, I want to talk about the essence rather than get bogged down in formalities.

    For each of these segments, the security requirements, the risks and, accordingly, the solutions will be different.

    Let us consider each of them separately, looking at the problems you may encounter in terms of security design. Of course, I repeat that this article in no way claims to be exhaustive, which would be difficult (if at all possible) in such a deep and multifaceted topic; it reflects my personal experience.

    There is no perfect solution (at least for now). It is always a compromise. But it is important that the decision to apply one approach or another is made consciously, with an understanding of both its advantages and its disadvantages.

    Data center

    The most security-critical segment.
    And, as usual, there is no universal solution here either. It all depends on the network's requirements.

    Do you need a firewall?

    It would seem that the answer is obvious, but things are not quite as clear-cut as they may seem. And it is not only price that can affect your choice.
    Example 1. Latency.

    If low latency between certain network segments is an essential requirement, which is true, for example, in the case of a stock exchange, then we will not be able to use firewalls between those segments. It is hard to find studies on firewall latency, but only a few switch models can provide latency around or below 1 microsecond, so I think that if microseconds matter to you, firewalls are not for you.
    Example 2. Performance.

    The throughput of top L3 switches is usually an order of magnitude higher than the throughput of the most powerful firewalls. Therefore, in the case of high-intensity traffic, you will also most likely have to let this traffic bypass the firewalls.

    Example 3. Reliability.

    Firewalls, especially modern NGFWs (Next-Generation Firewalls), are complex devices, much more complex than L3/L2 switches. They provide a large number of services and configuration options, so it is not surprising that their reliability is noticeably lower. If service continuity is critical for a network, you may have to choose what leads to better availability: the security of a firewall, or the simplicity of a network built on switches (or various fabrics) using ordinary ACLs.
    In the cases described above, you will likely (as usual) have to find a compromise. Look towards the following solutions:

    • if you decide not to use firewalls inside the data center, then you need to consider how to limit access at the perimeter as much as possible. For example, you can open only the necessary ports from the Internet (for client traffic) and allow administrative access to the data center only from jump hosts, on which you perform all the necessary checks (authentication / authorization, antivirus, logging, ...)
    • You can use logical partitioning of the data center network into segments, similar to the scheme described in PSEFABRIC example p002. At the same time, routing must be configured so that latency-sensitive or high-intensity traffic stays “inside” one segment (in the case of p002, one VRF) and does not go through the firewall, while traffic between different segments still does. You can also use route leaking between VRFs to avoid redirecting traffic through the firewall.
    • You can also use firewall in transparent mode and only for those VLANs where these factors (latency / performance) are not significant. But you need to carefully study the limitations associated with the use of this mode for each vendor.
    • You might consider using a service chain architecture. This will allow you to send only the necessary traffic through the firewall. In theory it looks beautiful, but I have never seen this solution in production. We tested a service chain with Cisco ACI / Juniper SRX / F5 LTM about 3 years ago, but at that time the solution seemed “raw” to us.
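The perimeter-restriction compromise from the first bullet can be captured as data plus a trivial ACL generator. A sketch follows; the addresses, ports, and jump-host list are purely illustrative assumptions, and the output mimics a generic Cisco-style ACL syntax:

```python
# Published services reachable from the Internet: (server, protocol, port).
PUBLISHED = [("203.0.113.10", "tcp", 443), ("203.0.113.20", "tcp", 25)]

# Administrative access is allowed only from these jump hosts.
JUMP_HOSTS = ["10.0.100.5", "10.0.100.6"]
ADMIN_PORTS = [("tcp", 22), ("tcp", 3389)]

def build_acl():
    """Generate a permit list for published ports and jump-host admin
    access, with an explicit deny at the end."""
    acl = [f"permit {proto} any host {host} eq {port}"
           for host, proto, port in PUBLISHED]
    acl += [f"permit {proto} host {jump} any eq {port}"
            for jump in JUMP_HOSTS for proto, port in ADMIN_PORTS]
    acl.append("deny ip any any")
    return acl

for line in build_acl():
    print(line)
```

The value of this shape is that the allowed surface is reviewable as a short data structure, while the (much longer) device configuration is derived from it.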

    Protection level

    Now you need to answer the question of which tools you want to use to filter traffic. Here are some of the features commonly present in an NGFW (see, for example, here):

    • stateful firewalling (default)
    • application firewalling
    • threat prevention (antivirus, anti-spyware, and vulnerability)
    • URL filtering
    • data filtering (content filtering)
    • file blocking (file types blocking)
    • DoS protection

    Here, too, not everything is clear-cut. It would seem that the higher the level of protection, the better. But you also need to consider that:

    • the more of the above firewall functions you use, the more expensive it will be (licenses, additional modules)
    • the use of some algorithms can significantly reduce firewall throughput and also increase latency, see for example here
    • as with any complex solution, the use of complex protection methods can reduce the reliability of your solution; for example, when using application firewalling I have encountered blocking of some quite standard applications (DNS, SMB)

    As usual, you need to find the best solution for your network.

    It is impossible to answer unambiguously which protection functions may be required. First, because it naturally depends on the data that you transmit or store and are trying to protect. Second, in reality the choice of protection tools is often a matter of faith and trust in the vendor: you do not know the algorithms, you do not know how effective they are, and you cannot fully test them.

    Therefore, in critical segments, using products from different companies can be a good solution. For example, you can enable antivirus on the firewall and also use antivirus protection (from another vendor) locally on the hosts.


    Logical segmentation

    The next question is the logical segmentation of the data center network. Partitioning into VLANs and subnets is, for example, also logical segmentation, but we will not consider it because it is obvious. What is more interesting is segmentation that takes into account entities such as FW security zones, VRFs (and their analogues from different vendors), logical devices (PA VSYS, Cisco N7K VDC, Cisco ACI Tenant, ...), and so on.
    An example of such logical segmentation, and of a data center design that is currently in demand, is given in p002 of the PSEFABRIC project.
    Having defined the logical parts of your network, you can further describe how traffic flows between different segments, on which devices filtering will be performed and by what means.

    If there is no clear logical partitioning in your network and the rules for applying security policies to different data flows are not formalized, this means that whenever you open a particular access you are forced to solve this problem from scratch, and you will most likely solve it differently every time.

    Often, segmentation is based only on FW security zones. Then you need to answer the following questions:

    • what security zones do you need
    • what level of protection do you want to apply to each of these zones
    • will intra-zone traffic be allowed by default
    • if not, which traffic filtering policies will be applied within each of the zones
    • what traffic filtering policies will be applied for each pair of zones (source / destination)
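One way to formalize the answers to these questions is to keep them as data that both humans and automation can consume. A minimal sketch, with made-up zone names and policy labels (they are illustrative assumptions, not from any particular design):

```python
# Zones and the default verdict for traffic *within* each zone
# (questions 3 and 4 above).
INTRAZONE_DEFAULT = {
    "internet": "deny",
    "dmz": "deny",
    "intranet": "permit",
    "management": "permit",
}

# Policy for each (source, destination) zone pair (question 5);
# any pair not listed is denied.
ZONE_POLICY = {
    ("internet", "dmz"): "published-services-only",
    ("dmz", "intranet"): "app-to-db-only",
    ("management", "dmz"): "admin-protocols",
    ("management", "intranet"): "admin-protocols",
}

def lookup(src, dst):
    """Return the filtering policy that applies to a src -> dst flow."""
    if src == dst:
        return INTRAZONE_DEFAULT.get(src, "deny")
    return ZONE_POLICY.get((src, dst), "deny")

print(lookup("internet", "dmz"))       # published-services-only
print(lookup("internet", "intranet"))  # deny
```

With such a matrix written down, every new access request is resolved the same way by everyone, instead of ad hoc each time.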


    TCAM

    There is often a problem of insufficient TCAM (Ternary Content Addressable Memory), both for routing and for access control. IMHO, this is one of the most important issues when choosing equipment, so you need to treat it with due care.

    Example 1. Forwarding Table TCAM.

    Let's take the Palo Alto 7k firewall as an example.
    We see that the IPv4 forwarding table size is 32K.
    Moreover, this is the number of routes shared by all VSYSs.

    Suppose that, in accordance with your design, you decide to use 4 VSYSs.
    Each of these VSYSs is connected via BGP to two PEs of an MPLS cloud that you use as a backbone. Thus, the 4 VSYSs exchange all specific routes with each other and have forwarding tables with approximately the same set of routes (but different next hops). Since each VSYS has 2 BGP sessions (with the same settings), each route received via MPLS has 2 next hops and, accordingly, 2 FIB entries in the forwarding table. If we assume that this is the only firewall in the data center and it must know all the routes, then the total number of routes in our data center cannot exceed 32K / (4 * 2) = 4K.

    Now, if we assume that we have 2 data centers (with the same design) and want to use VLANs “stretched” between the data centers (for example, for vMotion), then to solve the routing problem we must use host routes. But this means that for the 2 data centers we will have no more than 4096 possible hosts, and of course this may not be enough.
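The arithmetic above is trivial, but worth spelling out:

```python
# Forwarding-table budget from the example above.
TCAM_ROUTES = 32 * 1024   # IPv4 forwarding table shared by all VSYSs
VSYS_COUNT = 4            # VSYSs carrying (almost) identical route sets
NH_PER_ROUTE = 2          # two BGP sessions -> two next hops per route

# Each route is duplicated across every VSYS and per next hop,
# so the usable number of distinct routes shrinks accordingly.
max_routes = TCAM_ROUTES // (VSYS_COUNT * NH_PER_ROUTE)
print(max_routes)  # 4096
```

With host routes for stretched VLANs, each host consumes one of those 4096 entries, which is the ceiling mentioned above for the two data centers.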
    Example 2. TCAM ACL.

    If you plan to filter traffic on L3 switches (or other solutions that use L3 switches, for example, Cisco ACI), then when choosing equipment you should pay attention to the ACL TCAM.

    Suppose you want to control access on the SVI interfaces of a Cisco Catalyst 4500. Then, as you can see from this article, you have only 4096 TCAM lines to control outgoing (and likewise incoming) traffic on the interfaces. With TCAM3 this gives you about 4,000 ACEs (ACL lines).
    If you run into the problem of insufficient TCAM, then first of all, of course, you need to consider optimization. In the case of a problem with the forwarding table size, consider aggregating routes. In the case of a problem with TCAM for access control, consider auditing the accesses, removing obsolete and overlapping entries, and possibly revising the procedure for opening accesses (this will be discussed in detail in the chapter on access auditing).

    High availability

    The question is whether to use HA for firewalls or to install two independent boxes “in parallel” and, if one of them fails, route the traffic through the second.

    It would seem that the answer is obvious: use HA. The reason the question still arises is that, unfortunately, the theoretical, advertised availability figures of 99.-and-several-nines percent turn out in practice to be far less rosy. HA is logically quite a complicated thing, and on different equipment and with different vendors (there were no exceptions) we have caught problems and bugs that stopped the service.

    If you use HA, you will be able to turn off individual nodes and switch between them without stopping the service, which is important, for example, during upgrades. But you have a far-from-zero chance that both nodes break at the same time, or that an upgrade does not go as smoothly as the vendor promises (this problem can be avoided if you have the opportunity to test the upgrade on lab equipment).

    If you do not use HA, then from the point of view of a double failure your risks are much lower (since you have 2 independent firewalls), but since sessions are not synchronized, every time you switch between these firewalls you will lose traffic. You can, of course, use stateless firewalling, but then much of the point of using a firewall is lost.

    Therefore, if as a result of the audit you found standalone firewalls and you are thinking about increasing the reliability of your network, then HA is, of course, one of the recommended solutions, but you should also take into account the disadvantages of this approach; perhaps another solution would be more appropriate for your network.

    Ease of management (manageability)

    In principle, HA is also about manageability. Instead of configuring 2 boxes separately and solving the configuration synchronization problem, you manage them in many ways as if they were a single device.

    But perhaps you have many data centers and many firewalls; then this question arises at a new level. And the question is not only about configuration, but also about:

    • backup configurations
    • updates
    • upgrades
    • monitoring
    • logging

    All of this can be solved by centralized management systems.
    For example, if you use Palo Alto firewalls, Panorama is such a solution.

    To be continued.
