Balancing traffic in the operator’s IP networks

A word of warning up front: if you want to read about modern solution architectures, it is better to start from the end of the article.
If you are interested in the difficulties encountered while designing part of a carrier network, read on.

The article describes how to organize traffic balancing at the network edge under the following conditions:

  • transport protocol: IPv4;
  • dynamic routing protocol: OSPFv2 [1, 2];
  • outgoing and incoming traffic of the same user IP address passes through the same service gateway and the same NAT router [4];
  • traffic is balanced between 2 service gateways (BNG [3]);
  • traffic is balanced between 2 NAT routers that do not use dynamic routing.

The user network segment is illustrated using controller-based IEEE 802.11 wireless networks [5].

Problems solved:

  • balancing traffic at the point where user devices connect to the network;
  • even distribution of user traffic between the BNGs;
  • symmetric routing of incoming and outgoing traffic through NAT.


The distribution level is a boundary component of the network that performs the following main functions:

  • connection of wireless access controllers;
  • routing and control of traffic from the wireless access controllers;
  • interfacing with other networks.

Wireless access controllers (WLCs): a group of controllers that perform the following main functions:

  • aggregating traffic from access points and wireless users;
  • providing roaming of wireless users between controllers;
  • access point management.

Radio coverage level: access points installed at the sites.

Service delivery center (CPU, a transliteration of the Russian abbreviation): connects the controllers to the data network, manages and controls the services provided to users, and provides Internet access and IP address translation.

From the routing point of view, the IP network is divided into several routing segments: a user segment, an access point segment, and a management segment. This article covers only the user routing segment.

The proposed solution uses the OSPFv2 dynamic routing protocol [1] with the Multi-Instance extensions [2]. The main OSPF configuration parameters are shown in Figures 1-5.

Using VRF at the distribution level


Using multiple VRFs makes it possible to assign different primary/backup BNG combinations to user traffic.
For this purpose, the user interfaces on each of the two distribution level L3 switches are placed in different virtual routing and forwarding instances (VRF-Lite [9]), and one OSPF process is created in each VRF.

Balancing user traffic at a network connection point


Balancing is achieved by distributing user devices across virtual networks (VLANs). To do this, access points on the wireless access controllers are divided into groups of up to 10-15 access points each [7]. Each group is assigned a VLAN ID and a user IP subnet with a capacity of at least 2-4 class C networks (a /23 to a /22). This sizing allows for up to 25 active connections per access point plus spare capacity for inactive user connections and for the DHCP lease time [6].
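The subnet sizing above is simple arithmetic. A minimal sketch, assuming the figures from the text (15 access points per group, 25 active clients each) and a hypothetical headroom multiplier standing in for inactive clients and unexpired DHCP leases, shows where the 2 to 4 class C networks come from:

```python
import math


def required_prefix(ap_count: int, active_per_ap: int, headroom: float) -> int:
    """Longest IPv4 prefix whose usable hosts cover one AP group's clients.

    headroom is a hypothetical multiplier covering inactive clients and
    DHCP leases that have not yet expired.
    """
    needed = math.ceil(ap_count * active_per_ap * headroom)
    # A /p subnet has 2**(32 - p) - 2 usable hosts (network and broadcast
    # excluded); gateway and HSRP addresses are ignored for simplicity.
    for prefix in range(30, 0, -1):
        if 2 ** (32 - prefix) - 2 >= needed:
            return prefix
    raise ValueError("AP group too large for a single IPv4 subnet")


def class_c_count(prefix: int) -> int:
    """Number of /24 (former class C) networks in a /prefix, for prefix <= 24."""
    return 2 ** (24 - prefix)


for headroom in (1.0, 2.0):
    p = required_prefix(ap_count=15, active_per_ap=25, headroom=headroom)
    print(f"headroom {headroom}: /{p} = {class_c_count(p)} x /24")
# headroom 1.0: /23 = 2 x /24
# headroom 2.0: /22 = 4 x /24
```

With no headroom, 375 active clients already overflow a single /24, which is why the lower bound is two class C networks.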
On the distribution level L3 switches to which the controllers are connected, the user IP networks are divided into two large groups. This makes it possible to summarize the routing information and to balance traffic between the BNGs at layer 3 of the OSI model.
Each group is placed in one of the VRFs of the distribution level switches.
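To make the summarization concrete: if each VRF's AP-group subnets are carved out of one contiguous half of the switch's address block, each VRF can be advertised as a single summary route. A minimal sketch using Python's ipaddress module; the 10.64.0.0/16 block and the /22 AP-group subnets are hypothetical, since the article does not publish real prefixes:

```python
from ipaddress import IPv4Network, collapse_addresses

# Hypothetical user address block for one distribution switch.
USER_BLOCK = IPv4Network("10.64.0.0/16")


def split_into_groups(block: IPv4Network, group_prefix: int) -> dict:
    """Split the block into two halves, one per VRF, and carve each half into
    per-AP-group subnets of length group_prefix."""
    half1, half2 = block.subnets(prefixlen_diff=1)
    return {
        "WUsers1": list(half1.subnets(new_prefix=group_prefix)),
        "WUsers2": list(half2.subnets(new_prefix=group_prefix)),
    }


def summarize(subnets) -> list:
    """Collapse the subnets into the smallest list of covering prefixes."""
    return list(collapse_addresses(subnets))


groups = split_into_groups(USER_BLOCK, group_prefix=22)
for vrf, nets in groups.items():
    # Each VRF holds 32 /22 AP-group subnets that collapse to a single /17.
    print(vrf, len(nets), [str(n) for n in summarize(nets)])
```

Had the subnets of the two VRFs been interleaved instead, no single covering prefix would exist per VRF and the summary would fragment.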
At layer 3, redundancy is provided by OSPF, as shown in Figure 1.
Figure 1



The NSSA area type was chosen for the following reasons:
- It reduces the number of routes in the NSSA, because routing information about the wireless user networks is summarized on the ASBR.
- It allows the AD (Administrative Distance) to be set for "external" OSPF routes on the ABR.
- It makes it simple to extract and summarize the routing information of the redistributed routes on the ABR.
- It allows the ABR to act as the source of routing information when it translates LSAs into area 0 (suppress-fa [14]). As a result, neither the IP addressing structure nor the sources of external routes in the NSSA are exposed to area 0.
- It makes it possible to limit the routing information sent into the NSSA to two default routes, one from each ABR (no-summary [14]). Traffic is balanced between the ABRs by setting the costs of the links between the ASBR and the ABRs within the NSSA.
- It provides two types of external routes (N1 and N2) that the ABR can use to filter and control the routing of the user routing segment.
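The two external route types differ in how OSPF ranks them, which is what makes them useful for control on the ABR. The sketch below is a simplified model of the path-preference rules in RFC 2328 (section 16.4), not any vendor's implementation, and it ignores NSSA-specific tie-breaks; the ASBR names and costs are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalRoute:
    via: str            # advertising ASBR (hypothetical name)
    ext_type: int       # 1 for N1/E1, 2 for N2/E2
    ext_cost: int       # metric assigned when the route was redistributed
    internal_cost: int  # OSPF cost from this router to the ASBR


def preference(route: ExternalRoute) -> tuple:
    """Lower tuples win. Type 1 always beats type 2. Type 1 compares
    internal + external cost; type 2 compares only the external cost and
    uses the internal cost as a tie-breaker."""
    if route.ext_type == 1:
        return (1, route.internal_cost + route.ext_cost, 0)
    return (2, route.ext_cost, route.internal_cost)


def best(routes: list) -> ExternalRoute:
    return min(routes, key=preference)


candidates = [
    ExternalRoute("ASBR-A", ext_type=2, ext_cost=20, internal_cost=30),
    ExternalRoute("ASBR-B", ext_type=2, ext_cost=20, internal_cost=10),
]
print(best(candidates).via)  # ASBR-B: equal N2 metrics, the closer ASBR wins
```

With type 2 routes, internal link costs only break ties, so the metric set at redistribution dominates; with type 1 routes, internal costs add to it.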

This article does not cover connecting user routing segments to the BNGs over an MPLS network, but some of the design choices were driven by the requirement to support that mode (sham-link backdoor routing [15]).

Figure 2 shows how VRFs are used at the distribution level:
- WUsers1: for users whose primary service gateway is SG-01 in the CPU and whose backup is SG-02;
- WUsers2: for users whose primary service gateway is SG-02 in the CPU and whose backup is SG-01.

Figure 2


In VRFs WUsers1 and WUsers2, the primary/backup service gateway pair is selected by dynamic routing: the virtual links toward the two gateways are assigned different costs.
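A minimal model of this cost-based selection, assuming hypothetical OSPF costs of 10 toward the preferred gateway and 100 toward the backup in each VRF. It ignores OSPF convergence and captures only the rule that the lowest-cost reachable gateway wins:

```python
# Hypothetical OSPF costs from a distribution switch to each service gateway;
# the article does not give the actual values.
COSTS = {
    "WUsers1": {"SG-01": 10, "SG-02": 100},
    "WUsers2": {"SG-01": 100, "SG-02": 10},
}


def active_gateway(vrf: str, failed=frozenset()) -> str:
    """Return the gateway OSPF would prefer: the lowest-cost one still up."""
    reachable = {gw: cost for gw, cost in COSTS[vrf].items() if gw not in failed}
    if not reachable:
        raise RuntimeError(f"no service gateway reachable from {vrf}")
    return min(reachable, key=reachable.get)


for vrf in COSTS:
    print(vrf, "primary:", active_gateway(vrf),
          "after SG-01 fails:", active_gateway(vrf, failed={"SG-01"}))
```

Mirroring the costs between the two VRFs is what turns a single primary/backup rule into load sharing across both gateways.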

Load balancing at the distribution level


On each distribution level L3 switch, the IP networks assigned to user VLANs are split between the two VRFs. Depending on which AP group the wireless access controllers placed them in, wireless users end up in different VRFs and therefore use different primary/backup service gateway pairs, which distributes the load between the service gateways.

If one of the distribution level switches fails, all users are switched over to the remaining switch: they reconnect to the wireless network and obtain IP addresses from that switch's IP networks. These networks are also split between the two VRFs, so the load distribution between the BNGs is preserved even while only one distribution level L3 switch is operating.
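The failover argument can be checked with a small model. The switch names (DS-1, DS-2), the alternating assignment of AP groups to switches and VRFs, and the equal load per group are all hypothetical; the key assumption, taken from the text, is that the surviving switch's networks for the moved users are split across the same two VRFs:

```python
from collections import Counter

# Assumed mapping from the text: the VRF determines the primary BNG.
PRIMARY_BNG = {"WUsers1": "SG-01", "WUsers2": "SG-02"}


def build_groups(n_groups: int, switches=("DS-1", "DS-2")) -> dict:
    """Alternate AP groups across switches and, on each switch, across VRFs."""
    groups = {}
    for i in range(n_groups):
        switch = switches[i % len(switches)]
        vrf = "WUsers1" if (i // len(switches)) % 2 == 0 else "WUsers2"
        groups[f"apg{i}"] = (switch, vrf)
    return groups


def fail_switch(groups: dict, failed: str, survivor: str) -> dict:
    """Move AP groups off the failed switch. Assumption: the survivor has
    standby subnets for those groups in the same VRF."""
    return {g: (survivor if sw == failed else sw, vrf)
            for g, (sw, vrf) in groups.items()}


def bng_load(groups: dict, users_per_group: int = 100) -> Counter:
    """Users per primary BNG, assuming equally loaded AP groups."""
    load = Counter()
    for _switch, vrf in groups.values():
        load[PRIMARY_BNG[vrf]] += users_per_group
    return load


groups = build_groups(8)
print(bng_load(groups))
print(bng_load(fail_switch(groups, failed="DS-1", survivor="DS-2")))
# both lines show an even 400/400 split between SG-01 and SG-02
```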

Redundancy for the service gateway connections can be provided by duplicate virtual links located in different IP subnets and terminated on different physical ports of the service gateway.
The physical layout and network topology, and the corresponding logical links, are outside the scope of this article. The design at those levels also includes redundant physical and logical links.

The routing scheme at the distribution level is shown in Figures 1 and 2.

Traffic routing between the distribution level and the CPU


The distribution level switches can be connected to the CPU service gateways in one of the following ways:

  • at layer 2 of the OSI model, without an intermediate L3 hop;
  • through an intermediate L3 hop.

The first method consumes more resources (VLAN IDs, STP).
With the second method, a switch stack with 2 VRFs configured on it can act as the intermediate router.

This significantly reduces the number of VLANs needed to connect the distribution level to the CPU service gateways.

The connections between the distribution level routers and the CPU service gateways are shown in Figure 3.
Figure 3

Assigning equal OSPF metrics to parallel virtual links distributes wireless user traffic across these links and, as a result, balances traffic across the physical communication lines.
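Equal-cost multipath forwarding is normally done per flow rather than per packet, so that packets of one connection are not reordered. The sketch below illustrates the idea with a SHA-256 hash of the 5-tuple; this hash is an assumption chosen for illustration, since real routers use their own vendor-specific hash inputs and functions, and the link names and addresses are hypothetical:

```python
import hashlib
from collections import Counter


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              proto: int, links: list) -> str:
    """Map a flow to one of several equal-cost links by hashing its 5-tuple.
    All packets of a flow hash identically, so the flow stays on one link."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return links[int.from_bytes(digest[:4], "big") % len(links)]


LINKS = ["vlink-1", "vlink-2"]  # hypothetical parallel virtual links
flows = [(f"10.64.{i // 256}.{i % 256}", "198.51.100.10", 40000 + i % 1000, 443, 6)
         for i in range(10_000)]
print(Counter(pick_link(*flow, links=LINKS) for flow in flows))
```

Balance is statistical: with many flows the split approaches 50/50, but a few very large flows can still load one link more than the other.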

NAT in the CPU


NAT routers perform network address translation (NAT) of private IP addresses to public IP addresses. Address translation requires a dedicated range (pool) of unique public IP addresses. The pair of NAT routers forms NAT groups; in each group one router is the active (primary) router and the other is the backup. If the active router fails, the backup becomes active and continues to serve user sessions.
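A toy model of one NAT group can make the active/backup behaviour concrete. It is a sketch under loudly stated assumptions: the article does not say whether port address translation (PAT) is used or how translation state is shared, so the PAT scheme, the pool (from the 203.0.113.0/24 documentation range) and the router names are hypothetical, and the failover assumes the translation table is already synchronized to the backup:

```python
from dataclasses import dataclass, field


@dataclass
class NatGroup:
    """Hypothetical model of one NAT group: a public pool plus an
    active/backup router pair sharing one translation table."""
    pool: list
    active: str
    backup: str
    table: dict = field(default_factory=dict)  # (priv_ip, priv_port) -> (pub_ip, pub_port)
    next_port: int = 1024

    def translate(self, priv_ip: str, priv_port: int) -> tuple:
        """Return the public (ip, port) for a private socket, creating a
        port-translation entry on first use."""
        key = (priv_ip, priv_port)
        if key not in self.table:
            pub_ip = self.pool[self.next_port % len(self.pool)]
            self.table[key] = (pub_ip, self.next_port)
            self.next_port += 1
        return self.table[key]

    def fail_active(self) -> None:
        """Swap roles. Assumption: translation state is already synchronized
        to the backup, so existing mappings survive the failover."""
        self.active, self.backup = self.backup, self.active


group1 = NatGroup(pool=["203.0.113.1", "203.0.113.2"], active="NAT-1", backup="NAT-2")
before = group1.translate("10.64.0.10", 50000)
group1.fail_active()
print(group1.active, before == group1.translate("10.64.0.10", 50000))  # NAT-2 True
```

Without state synchronization, the new active router would build fresh mappings and existing sessions would break, which is why the text's claim of continued service implies shared state.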

Routing between service gateways and NAT routers


The following limitations of the NAT routers are taken into account:

  • NAT routers use only static routes;
  • two virtual networks (VLANs) are allocated to each NAT group: an inside VLAN and an outside VLAN.

The inside VLAN is used to communicate with the CPU service gateways; the outside VLAN is used to communicate with the edge BGP routers.

For fault tolerance, each BNG is connected over two physical interfaces. Because of equipment-specific limitations, and because each pool of external IP addresses must be tightly bound to a specific BNG, the following restrictions are proposed:
- do not use EtherChannel; provide load balancing and redundancy with L3 routing instead;
- for each NAT router, use a single physical link to communicate with the BNG.

This creates the need for an intermediate L3 node (hereinafter the ASBR CPU) between the BNGs and the NAT routers. The intermediate node performs the following functions:
- OSPF ASBR for area 0;
- origination of default routes into area 0;
- routing of packets arriving from the NAT routers toward the OSPF ABR;
- static routing of packets arriving from OSPF area 0 toward the NAT routers (default gateways).

The intermediate router role can be performed by the L3 switch stack to which the BNGs and NAT routers are connected. Two VRFs are created on it for this purpose (VRF-Lite [9]): Users1_out and Users2_out.

Using a stack of L3 switches is important because it makes it possible to:
- use both physical BNG connections to build virtual links to each of the NAT routers;
- balance load between the physical interfaces of the BNG;
- keep the BNG connected to the L3 switch stack if one of the switches in the stack fails or one of the BNG physical interfaces malfunctions.

Another feature of the solution is the use of two VRFs on the L3 switch stack.
They are needed to bind each BNG tightly to a specific ASBR (see Figure 4) and, in turn, to bind each pool of external IP addresses to a specific BNG.
A separate OSPF process runs on the switch stack for each of these VRFs (Users1_out and Users2_out). The virtual links between the BNGs and VRFs Users1_out and Users2_out are placed in OSPF area 0 (the backbone).
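The purpose of the two ASBR VRFs is that every hop in the chain is one-to-one, so replies to a NAT pool can only return through the BNG that sent the traffic. A minimal sketch of that invariant; the default routes from Users1_out to NAT-group1 and from Users2_out to NAT-group2 follow the text, while the pairing of SG-01 with Users1_out and SG-02 with Users2_out is an assumption, since the text only says each BNG is bound to one ASBR VRF:

```python
# Assumed bindings: each user VRF prefers one BNG, each BNG peers only with
# one ASBR VRF, and each ASBR VRF defaults to one NAT group.
PRIMARY_BNG = {"WUsers1": "SG-01", "WUsers2": "SG-02"}
BNG_TO_ASBR_VRF = {"SG-01": "Users1_out", "SG-02": "Users2_out"}
ASBR_VRF_TO_NAT = {"Users1_out": "NAT-group1", "Users2_out": "NAT-group2"}


def invert(mapping: dict) -> dict:
    """Invert a one-to-one mapping, refusing anything that is not a bijection."""
    inverse = {v: k for k, v in mapping.items()}
    if len(inverse) != len(mapping):
        raise ValueError("binding is not one-to-one; return path is ambiguous")
    return inverse


def outbound(user_vrf: str) -> list:
    bng = PRIMARY_BNG[user_vrf]
    asbr_vrf = BNG_TO_ASBR_VRF[bng]
    return [user_vrf, bng, asbr_vrf, ASBR_VRF_TO_NAT[asbr_vrf]]


def inbound(nat_group: str) -> list:
    """Replies to a NAT group's pool come back through the ASBR VRF bound to
    that group, which reaches only its own BNG."""
    asbr_vrf = invert(ASBR_VRF_TO_NAT)[nat_group]
    return [nat_group, asbr_vrf, invert(BNG_TO_ASBR_VRF)[asbr_vrf]]


for vrf in PRIMARY_BNG:
    out = outbound(vrf)
    back = inbound(out[-1])
    print(" -> ".join(out), "| return via", back[-1])
```

If two BNGs shared one ASBR VRF, the inversion would fail: replies to that pool could come back through either BNG, and symmetry would be lost.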

Static routing is used between the ASBR and the NAT routers:
  • in VRF Users1_out: a static default route via the virtual IP address of NAT-group1;
  • in VRF Users2_out: a static default route via the virtual IP address of NAT-group2;
  • on the first NAT router: static routes to the wireless user IP networks via the IP address of VRF Users1_out;
  • on the second NAT router: static routes to the wireless user IP networks via the IP address of VRF Users2_out.
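The static routes above can be modeled as small per-node tables with longest-prefix-match lookup. All addresses are hypothetical: inside VLANs 192.0.2.0/25 (NAT-group1) and 192.0.2.128/25 (NAT-group2), wireless users in 10.64.0.0/16. The NAT routers' outside-facing routes toward the border routers are deliberately omitted:

```python
from ipaddress import IPv4Address, IPv4Network

STATIC_ROUTES = {
    "Users1_out": {IPv4Network("0.0.0.0/0"): "192.0.2.1"},     # NAT-group1 virtual IP
    "Users2_out": {IPv4Network("0.0.0.0/0"): "192.0.2.129"},   # NAT-group2 virtual IP
    "NAT-1": {IPv4Network("10.64.0.0/16"): "192.0.2.10"},      # Users1_out address
    "NAT-2": {IPv4Network("10.64.0.0/16"): "192.0.2.140"},     # Users2_out address
}


def next_hop(table: str, dst: str):
    """Longest-prefix match in one static table; None when nothing matches."""
    addr = IPv4Address(dst)
    routes = STATIC_ROUTES[table]
    matches = [net for net in routes if addr in net]
    if not matches:
        return None
    return routes[max(matches, key=lambda net: net.prefixlen)]


print(next_hop("Users1_out", "198.51.100.10"))  # 192.0.2.1
print(next_hop("NAT-1", "10.64.200.5"))          # 192.0.2.10
```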


To distribute the default route, default-information originate is enabled in the OSPF processes of ASBR VRFs Users1_out and Users2_out.

The design with the intermediate L3 node is shown in Figure 4.
Figure 4


Assigning equal OSPF metrics to parallel virtual links distributes wireless user traffic across these links and, as a result, balances traffic across the physical lines that connect the CPU service gateways to the switch stack.

The ASBR CPU is the OSPF autonomous system boundary router: it redistributes routes from other routing segments, the NAT IP address pools, and the Internet.

Routing and balancing traffic between the ASBR CPU and NAT routers


Virtual links are created between the ASBR CPU and the NAT routers, as shown in Figure 5. Default gateway redundancy on the NAT routers can be implemented with the Hot Standby Router Protocol (HSRP) [11].

Two HSRP groups are configured on the NAT router interfaces: the first provides the default gateway for NAT-group1, the second provides the default gateway for NAT-group2, as shown in Figure 5.
Figure 5
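With two HSRP groups, each NAT router can be made active for one group, so both routers carry traffic while each backs up the other. The sketch below models only the election rule (highest priority wins, ties go to the highest interface IP address, per RFC 2281) and ignores preemption and timers; the priorities and addresses are hypothetical:

```python
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class Member:
    router: str
    ip: str
    priority: int = 100  # HSRP default priority


def elect_active(members: list) -> str:
    """Highest priority wins; equal priorities fall back to the highest
    interface IP address. Preemption and timers are not modeled."""
    return max(members, key=lambda m: (m.priority, IPv4Address(m.ip))).router


# Hypothetical priorities: NAT-1 is active for group 1, NAT-2 for group 2.
HSRP_GROUPS = {
    1: [Member("NAT-1", "192.0.2.2", 110), Member("NAT-2", "192.0.2.3")],
    2: [Member("NAT-1", "192.0.2.130"), Member("NAT-2", "192.0.2.131", 110)],
}


def active_routers(groups: dict, down=frozenset()) -> dict:
    return {gid: elect_active([m for m in members if m.router not in down])
            for gid, members in groups.items()}


print(active_routers(HSRP_GROUPS))                  # {1: 'NAT-1', 2: 'NAT-2'}
print(active_routers(HSRP_GROUPS, down={"NAT-1"}))  # {1: 'NAT-2', 2: 'NAT-2'}
```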



Routing between NAT routers and network border routers


In the proposed solution, routing here uses static routes and HSRP on the network border routers (outside-router, see Figure 6). This part of the design is not covered in detail in this article.
Figure 6



Virtual links are created between the NAT routers and the border routers. Default gateway failover on the border routers can be implemented with HSRP or a similar protocol, depending on the capabilities of the equipment. Two HSRP groups are used for this purpose.
The routing scheme is shown in Figure 6.

Diagrams and figures
Figure 1. Wireless user VRFs at the distribution level; summarization of routes to the user IP subnets.


Figure 2. Routing at the distribution level.


Figure 3. Routing between the distribution level and the CPU.


Figure 4. Routing between the BNGs and the ASBR CPU.


Figure 5. Routing between the ASBR CPU and the NAT routers.


Figure 6. Routing at the network edge.


Sources
[1] J. Moy (Ascend Communications), Request for Comments: 2328, “OSPF Version 2”, April 1998.
[2] A. Lindem (Ericsson), A. Roy, S. Mirtorabi (Cisco Systems), Request for Comments: 6549, “OSPFv2 Multi-Instance Extensions”, March 2012.
[3] S. Wadhwa (Alcatel-Lucent), J. Moisand (Juniper Networks), T. Haag (Deutsche Telekom), N. Voigt (Nokia Siemens Networks), T. Taylor, Ed. (Huawei Technologies), Request for Comments: 6320, “Protocol for Access Node Control Mechanism in Broadband Networks”, October 2011.
[4] P. Srisuresh (Jasmine Networks), K. Egevang (Intel Corporation), Request for Comments: 3022, “Traditional NAT”, January 2001.
[5] IEEE 802.11, “Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications”, 1997.
[6] R. Droms (Bucknell University), Request for Comments: 2131, “Dynamic Host Configuration Protocol”, March 1997.
[7] “AP Group VLANs with Wireless LAN Controllers Configuration Example”, www.cisco.com, 2008.
[8] L. Andersson, T. Madsen (Acreo AB), Request for Comments: 4026, “Provider Provisioned Virtual Private Network (VPN) Terminology”.
[9] “Configuring VRF-lite”, Cisco web site [Online]. Available: www.cisco.com
[10] Y. Rekhter (T.J. Watson Research Center, IBM Corp.), T. Li (cisco Systems), Editors, Request for Comments: 1518, “An Architecture for IP Address Allocation with CIDR”, September 1993.
[11] T. Li (Juniper Networks), B. Cole (Juniper Networks), P. Morton (Cisco Systems), D. Li (Cisco Systems), Request for Comments: 2281, “Cisco Hot Standby Router Protocol (HSRP)”, March 1998.
[12] “NAT Examples and Reference”, Cisco web site [Online]. Available: www.cisco.com
[13] “Building public wireless networks” (in Russian), 2008-2010, step.ru/projects/industrys/telecom
[14] Cisco IOS IP Routing: OSPF Command Reference.
[15] Sham-link backdoor routing.




One of the difficulties was designing a solution that involved a large number of nodes, services, and related systems that all had to be integrated, as well as many teams responsible for designing the various systems and services.

A few lessons learned:
- design services end to end, including traffic routing;
- split functional components into separate IP nodes (BNGs, NAT routers, BGP border routers);
- stackable routers greatly simplify the design;
- when using virtual p2p links, remember to configure OSPF correctly on the router interfaces ;)
UPDATE: Added a much more detailed description of the solution; I hope it is clearer now.
The figures have been corrected.

Based on materials from 2008.
You can learn about modern BNGs in telecom operator networks on the Learning Club website or from the information resources of telecom equipment vendors.
