Istio and Kubernetes in production. Part 2. Tracing
In the previous article, we looked at the basic components of the Istio service mesh, got acquainted with the system, and answered the main questions that usually come up when starting to work with Istio. In this part, we will look at how to organize the collection of tracing information over the network.
The first thing that comes to mind for many developers and system administrators when they hear the words Service Mesh is tracing. Indeed, we add a special proxy server next to each node of the network, and all TCP traffic passes through it. It seems that we should now be able to easily collect information about all network interactions. Unfortunately, in reality there are many nuances to take into account. Let's look at them.
Misconception number one: we can get data about requests traveling through the network for free
In fact, the only thing we get relatively for free is a graph of the nodes of our system connected by arrows, and the rate of data passing between services (in fact, only the number of bytes per unit of time). However, in most cases our services communicate over some application-level protocol, such as HTTP, gRPC or Redis. And, of course, we want to see tracing information for those protocols; we want to see the rate of requests, not the rate of data. We want to understand the latency of requests for our protocol. Finally, we want to see the full path that a request travels from the moment it enters our system until the user receives a response. This problem is not so easy to solve.
First, let's look at how sending tracing spans works in Istio from an architectural point of view. As we remember from the first part, Istio has a separate component for collecting telemetry called Mixer. However, in the current version 1.0.*, spans are sent directly from the proxy servers, that is, from envoy proxy. Envoy proxy supports sending tracing spans via the zipkin protocol out of the box. Other protocols can be connected, but only through a plugin. With Istio we immediately get a compiled and configured envoy proxy that supports only the zipkin protocol. If we want to use, for example, the Jaeger protocol and send tracing spans via UDP, we will have to build our own istio-proxy image. There is support for custom plugins in istio-proxy, but it is still in alpha. Therefore, if we want to avoid a large amount of custom configuration, the range of technologies we can use for storing and receiving tracing spans shrinks. Of the main systems, we can in practice use Zipkin itself, or Jaeger, but then everything has to be sent to it over a zipkin-compatible protocol (which is much less efficient). The zipkin protocol itself involves sending all the tracing information to the collectors over HTTP, which is quite expensive.
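To make this concrete, here is a minimal sketch of how a Jaeger backend could be exposed behind a zipkin-compatible endpoint so that the stock envoy proxy can still send spans to it. The Service name, namespace and selector below are assumptions chosen to match the --zipkinAddress example later in this article; on the Jaeger side, the collector's zipkin-compatible HTTP listener has to be enabled (in Jaeger this is done via the COLLECTOR_ZIPKIN_HTTP_PORT environment variable).

apiVersion: v1
kind: Service
metadata:
  name: tracing-collector    # assumed name; matches the --zipkinAddress example below
  namespace: tracing
spec:
  ports:
  - port: 9411               # standard zipkin HTTP port
    targetPort: 9411
    name: http-zipkin        # port named by protocol so Istio treats it as HTTP
  selector:
    app: jaeger-collector    # assumed label of the Jaeger collector deployment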
As I said, we want to trace application-layer protocols. This means that the proxy servers sitting next to each service must understand exactly what kind of interaction is happening at any given moment. By default, Istio configures the plain TCP type for all ports, which means that no traces will be sent. For traces to be sent, you must, first, enable this option in the main mesh config and, very importantly, name all ports of Kubernetes Service entities according to the protocol used by the service. For example, like this:
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  ports:
  - port: 80
    targetPort: 80
    name: http
  selector:
    app: nginx
You can also use compound names, such as http-magic (Istio will see the http prefix and recognize this port as an HTTP endpoint). The format is proto-extra.
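As for the option in the main mesh config mentioned above, a minimal sketch of what enabling tracing might look like is shown below; the ConfigMap name and namespace are the usual defaults of an Istio installation, but treat them as assumptions for your particular setup.

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio              # assumed default mesh config ConfigMap of the installation
  namespace: istio-system
data:
  mesh: |
    # MeshConfig field that turns on span generation in the sidecar proxies
    enableTracing: true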
In order not to patch a huge number of configurations just to define the protocol, you can use a dirty workaround: patch the Pilot component at the point where it performs the protocol-detection logic. Eventually, of course, you will have to revert to the standard logic and switch to the convention of naming all ports.
To check whether the protocol is detected correctly, you can go into any of the sidecar containers with envoy proxy and query the admin port of the envoy interface at the /config_dump location. In the resulting configuration, look at the field of the service you are interested in; it is used in Istio as an identifier of where the request is going. To customize the value of this parameter in Istio (it is what we will then see in our tracing system), you need to specify the serviceCluster flag when launching the sidecar container. For example, it can be computed from variables obtained via the Kubernetes downward API like this:
--serviceCluster ${POD_NAMESPACE}.$(echo ${POD_NAME} | sed -e 's/-[a-z0-9]*-[a-z0-9]*$//g')
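For completeness, a sketch of how the POD_NAME and POD_NAMESPACE variables used in the expression above could be injected into the sidecar container via the Kubernetes downward API (the surrounding container spec is omitted):

env:
- name: POD_NAME
  valueFrom:
    fieldRef:
      fieldPath: metadata.name       # pod name from the downward API
- name: POD_NAMESPACE
  valueFrom:
    fieldRef:
      fieldPath: metadata.namespace  # pod namespace from the downward API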
A good example for understanding how tracing works in envoy can be found here.
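And a quick way to perform the /config_dump check described above, assuming the default istio-proxy admin port 15000, that curl is available in the sidecar image, and a placeholder pod name:

# <your-pod> is a placeholder; 15000 is the usual istio-proxy admin port
kubectl exec -it <your-pod> -c istio-proxy -- curl -s localhost:15000/config_dump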
The endpoint to which tracing spans are sent must also be specified in the envoy proxy launch flags, for example:
--zipkinAddress tracing-collector.tracing:9411
Misconception number two: we can cheaply get full traces of requests passing through the system out of the box
Unfortunately, this is not the case. The complexity of the implementation depends on how the interaction between your services is already implemented. Why is that?
The point is that, in order for istio-proxy to be able to match requests coming into a service with requests leaving the same service, it is not enough to simply intercept all the traffic. You need some kind of correlation identifier. For HTTP, envoy proxy uses special headers by which it understands exactly which incoming request to the service generated specific requests to other services. The list of such headers:
- x-request-id
- x-b3-traceid
- x-b3-spanid
- x-b3-parentspanid
- x-b3-sampled
- x-b3-flags
- x-ot-span-context
If you have a single point, for example a base client library, where you can add such logic, then everything is fine: you just need to wait for all clients to update to this library. But if you have a very heterogeneous system and there is no unified way in which services call each other over the network, then this will most likely be a big problem. Without adding such logic, all tracing information will be only "single-level". That is, we will get all the inter-service interactions, but they will not be glued together into a single chain describing a request's path through the network.
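A minimal sketch of what such propagation logic could look like in application code (Python is used here purely for illustration; the helper name and the usage line are assumptions, only the header list itself comes from above):

# Headers that envoy uses to stitch spans into a single trace (see the list above).
TRACE_HEADERS = [
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "x-ot-span-context",
]

def forward_trace_headers(incoming_headers: dict) -> dict:
    """Copy tracing headers from an incoming request so they can be attached
    to every outgoing request made while handling it."""
    return {
        name: incoming_headers[name]
        for name in TRACE_HEADERS
        if name in incoming_headers
    }

# Usage sketch inside a request handler of your service:
# outgoing = requests.get(url, headers=forward_trace_headers(dict(request.headers)))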
Conclusion
Istio provides a convenient tool for collecting tracing information over the network, but you have to understand that implementing it will require adapting your system and taking the specifics of the Istio implementation into account. As a result, two main points need to be solved: defining the application-layer protocol (which must be supported by envoy proxy) and setting up the forwarding of information that links requests coming into a service with the requests it makes in turn (using headers, in the case of HTTP). Once these issues are resolved, we get a powerful tool that allows us to transparently collect information from the network even in very heterogeneous systems written in many different languages and frameworks.
In the next article about Service Mesh, we will look at one of the biggest problems of Istio, the high RAM consumption of each sidecar proxy container, and discuss how to deal with it.