Creating a cloud network: not so simple
Translator’s note: Alois Mayr, an engineer at the Ruxit network monitoring service, wrote an interesting piece about the difficulties newcomers may run into when setting up a network in the cloud, and we have prepared an adapted translation of it.
If your applications run on AWS (Amazon Web Services) or a similar cloud platform (such as 1cloud), then you have, among other things, successfully handed the work of running your network over to a cloud service. Naturally, this can be very valuable for you, primarily because you no longer need to maintain the physical network infrastructure. However, the lack of direct access to the network does not mean that it no longer needs to be monitored.
A bit of history
In traditional application architectures, the network infrastructure was strictly controlled by a team of specialists. Such teams were responsible for replacing overloaded equipment before any problems occurred, finding and fixing weak links in the network, removing performance bottlenecks, monitoring latency during data transfer, and even detecting network security threats. In other words, traditional networking teams looked after all seven layers of the OSI model.
Modern architectures need networking more than ever
In architectures that use cloud technologies the situation is different, and the network plays a much greater role. Imagine a typical cloud-based architecture. You manage a data center with a variable number of machines allocated to it (depending, for example, on the pricing model and on changing CPU requirements). Your data center serves distributed applications that are built, for example, on microservices. In addition, your applications are deployed in, say, Docker containers, which gives your team, working according to the DevOps (development and operations) methodology, some freedom of action. In such situations, network availability matters more than ever. Your network must carry all the communication between microservices. It serves as a virtual “nervous system” for your applications.
Even though system administrators have no direct access to the network, it is still there, doing its work and requiring attention. It is often difficult to find out where your servers are physically located or how they are connected to other nodes on your network. Interconnected virtual machines and services can even reside on the same virtual host; in that case, “network” traffic between them amounts to reading data from memory. It also means that a single physical network quite often carries several virtual networks.
Difficulties in working with networks using cloud services
Because they have no direct access to the network (layers 1-2 of the OSI model), teams that follow DevOps principles find it quite difficult to monitor. They can use the monitoring tools offered by cloud providers, such as Amazon CloudWatch, which reports network metrics such as NetworkIn and NetworkOut, but these metrics may not be enough to identify network problems.
Listed below are some of the main difficulties encountered when using the DevOps methodology to maintain virtual network performance:
- Distribution of network resources between competing processes (for example, the problem known as TCP incast);
- A network infrastructure that keeps changing as instances are launched or stopped;
- Network scalability using elastic network interfaces (ENIs);
- The quality of the internal connections of the data center;
- The quality of connections to private networks located outside the data center.
Network usage monitoring
When monitoring a network, you must be able to adapt to the infrastructure changes mentioned above. In particular, you need to be able to work with virtual network interfaces. Monitoring therefore has to be carried out on the hosts themselves, while continuously tracking changes in the virtual infrastructure. This lets you observe the network connections between processes and the other processes and services they talk to, so that you monitor the actual use of the network, not just network devices.
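As a rough illustration of host-side monitoring, the sketch below counts established TCP connections per remote endpoint from text in the format of Linux's /proc/net/tcp. It rests on several assumptions that are not from the original article: a Linux host, IPv4 only, a little-endian machine, and the hypothetical helper names decode_endpoint and established_by_remote. Attributing each connection to a process would additionally require matching socket inodes against /proc/<pid>/fd, which is left out here.

```python
import socket
import struct
from collections import Counter

TCP_ESTABLISHED = "01"  # hex state code for ESTABLISHED in /proc/net/tcp


def decode_endpoint(hex_endpoint: str) -> tuple:
    """Decode an 'ADDR:PORT' hex pair, e.g. '0100007F:1F90' -> ('127.0.0.1', 8080).

    Assumes the address was printed on a little-endian host.
    """
    hex_ip, hex_port = hex_endpoint.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
    return ip, int(hex_port, 16)


def established_by_remote(proc_net_tcp_text: str) -> Counter:
    """Count established connections per remote (ip, port)."""
    counts = Counter()
    for line in proc_net_tcp_text.splitlines():
        fields = line.split()
        # fields: slot, local address, remote address, state, ...
        # The header row is skipped automatically: its state column is "st".
        if len(fields) < 4 or fields[3] != TCP_ESTABLISHED:
            continue
        counts[decode_endpoint(fields[2])] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as f:  # Linux only
        for (ip, port), n in established_by_remote(f.read()).most_common(5):
            print(f"{ip}:{port}  {n} connection(s)")
```

Running such a collector periodically on every host, rather than once, is what would let it follow instances and interfaces as they appear and disappear.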
Resource monitoring is very important ... and easy!
With this approach to monitoring, your network is no longer regarded as just a collection of network interfaces, routing tables, and security groups. Instead, it is seen as a limited resource consumed by processes and applications. From the point of view of a process, this resource can be monitored, and even quantified, alongside CPU, RAM, and disk. In addition, because the performance of the full application stack is monitored, network problems can be traced all the way up to the application level.
Below are a few key network performance metrics to consider:
- The main indicator of network performance is the volume of data traffic on the network (throughput).
- The network connectivity metric measures the percentage of successfully established TCP connections and indicates service availability. A TCP connection attempt can be refused or can time out, so failed connections are a clear sign of problems between the sender and the receiver.
- When assessing the quality of established TCP connections, you should also pay attention to the retransmission rate. TCP is designed to provide a reliable connection and to detect errors during data transfer. This means that the recipient must acknowledge the data packets sent over the network link; otherwise they are considered lost and are resent by the sender. A high retransmission rate therefore points to weak links in the network and to congestion in its infrastructure.
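The three metrics above can be derived from cumulative counters sampled at two points in time. The following is a minimal sketch under that assumption; the TcpSample structure, its field names, and the network_metrics function are hypothetical and do not correspond to any particular monitoring tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TcpSample:
    """Cumulative counters at one moment (hypothetical field names)."""
    timestamp_s: float
    bytes_in: int
    bytes_out: int
    conn_attempts: int
    conn_failures: int
    segments_sent: int
    segments_retransmitted: int


def _percent(part: int, total: int) -> Optional[float]:
    # No activity in the interval means the ratio is undefined, not zero.
    return 100.0 * part / total if total else None


def network_metrics(prev: TcpSample, curr: TcpSample) -> dict:
    """Compute throughput, connectivity, and retransmission rate
    for the interval between two samples."""
    elapsed = curr.timestamp_s - prev.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")

    traffic = (curr.bytes_in - prev.bytes_in) + (curr.bytes_out - prev.bytes_out)
    attempts = curr.conn_attempts - prev.conn_attempts
    failures = curr.conn_failures - prev.conn_failures
    sent = curr.segments_sent - prev.segments_sent
    retrans = curr.segments_retransmitted - prev.segments_retransmitted

    return {
        "throughput_bps": traffic * 8 / elapsed,  # bits per second
        "connectivity_pct": _percent(attempts - failures, attempts),
        "retransmission_pct": _percent(retrans, sent),
    }
```

For example, 4 failed connection attempts out of 200 in an interval give a connectivity of 98%, and 25 retransmitted segments out of 1000 sent give a retransmission rate of 2.5%.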
Virtualized networks do not lend themselves to traditional management methods. You need to monitor them at least from the point of view of your hosts and processes, so that you have meaningful indicators of network performance.