Containers, Microservices, and Service Meshes
There are plenty of articles about service meshes on the Internet, and here's another one. Hurray! But why? Because I want to argue that it would have been better if service meshes had appeared 10 years ago, before the emergence of container platforms such as Docker and Kubernetes. I'm not claiming that my point of view is better or worse than anyone else's, but since service meshes are rather complex animals, multiple points of view help to understand them better.
I'll talk about the dotCloud platform, which was built on more than a hundred microservices and supported thousands of applications in containers. I'll explain the problems we ran into while developing and operating it, and how service meshes could have helped (or could not have).
History of dotCloud
I've already written about the history of dotCloud and its architecture choices, but I didn't say much about the network layer. If you don't want to dive into the previous article about dotCloud, here is the short version: it was a platform-as-a-service (PaaS) that let customers run a wide range of applications (Java, PHP, Python ...), with support for a wide range of data services (MongoDB, MySQL, Redis ...) and a Heroku-like workflow: you pushed your code to the platform, it built container images and deployed them.
I'll describe how traffic was routed on the dotCloud platform. Not because it was especially cool (although the system worked well for its time!), but primarily because, with modern tools, such a design can easily be implemented in a short time by a modest team if they need a way to route traffic between a bunch of microservices or a bunch of applications. That way you can compare the options: what you get if you develop everything yourself versus using an existing service mesh. The classic choice: develop or buy.
Traffic routing for hosted applications
dotCloud applications could expose HTTP and TCP endpoints.
HTTP endpoints were dynamically added to the configuration of the Hipache load balancer cluster. This is similar to what Kubernetes Ingress resources and a load balancer like Traefik do today.
Clients connected to HTTP endpoints through their respective domains, provided the domain name pointed to the dotCloud load balancers. Nothing special.
TCP endpoints were associated with a port number, which was then passed to all containers of that stack through environment variables.
Clients could connect to TCP endpoints using the corresponding hostname (something like gateway-X.dotcloud.com) and that port number.
That hostname resolved to a cluster of "nats" servers (not related to NATS) that routed incoming TCP connections to the correct container (or, for load-balanced services, to the correct containers).
If you are familiar with Kubernetes, this will probably remind you of NodePort services.
The dotCloud platform had no equivalent of ClusterIP services: for simplicity, services were accessed the same way from both inside and outside the platform.
It was all organized quite simply: the initial implementations of the HTTP and TCP routing meshes were probably just a few hundred lines of Python each, using simple (I would say naive) algorithms that were refined as the platform grew and new requirements appeared.
Extensive refactoring of existing application code was not required. In particular, 12-factor apps could directly use the addresses obtained through environment variables.
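For example, a 12-factor worker that needs to reach a database exposed as a TCP endpoint might do something like this (a minimal sketch; the variable names are hypothetical, for illustration only):

```python
import os
import socket

# Hypothetical variable names; the platform injected each TCP endpoint's
# host and port into the stack's containers as environment variables.
host = os.environ.get("DOTCLOUD_DB_HOST", "gateway-42.dotcloud.com")
port = int(os.environ.get("DOTCLOUD_DB_PORT", "12345"))

# Connect directly to the endpoint advertised by the platform.
conn = socket.create_connection((host, port), timeout=5)
```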
How does this differ from a modern service mesh?
Limited visibility. We had essentially no metrics for the TCP routing mesh. As for HTTP routing, later versions added detailed HTTP metrics with error codes and response times, but modern service meshes go even further, providing integration with metrics collection systems such as Prometheus.
Visibility is important not only from an operational point of view (to help troubleshoot problems), but also when rolling out new features. Think safe blue-green deployments and canary deployments.
Routing efficiency was also limited. In the dotCloud routing mesh, all traffic had to go through a cluster of dedicated routing nodes. This meant potentially crossing several AZ (availability zone) boundaries and a significant increase in latency. I remember troubleshooting code that made more than a hundred SQL queries per page and opened a new connection to the SQL server for each query. Run locally, the page loaded instantly, but on dotCloud it took a few seconds, because each TCP connection (and the subsequent SQL query) took tens of milliseconds. In this particular case, persistent connections solved the problem.
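Here is a toy illustration of that failure mode, assuming a hypothetical TCP endpoint that answers trivial queries. The point is simply that the connection setup cost (tens of milliseconds through the routing nodes) is paid a hundred times per page unless the connection is reused:

```python
import socket
import time

ADDR = ("gateway-42.dotcloud.com", 12345)  # hypothetical TCP endpoint
N = 100                                    # queries issued by one page

start = time.time()
for _ in range(N):
    # New connection per query: pays the TCP (and routing-hop) setup cost every time.
    with socket.create_connection(ADDR, timeout=5) as s:
        s.sendall(b"SELECT 1;\n")
        s.recv(4096)
print("new connection per query:", time.time() - start)

start = time.time()
# One persistent connection: the setup cost is paid once for all N queries.
with socket.create_connection(ADDR, timeout=5) as s:
    for _ in range(N):
        s.sendall(b"SELECT 1;\n")
        s.recv(4096)
print("one persistent connection:", time.time() - start)
```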
Modern service meshes handle these problems better. First of all, they make sure that connections are routed at the source. The logical flow is the same: client → mesh → service, but now the mesh runs locally rather than on remote nodes, so the client → mesh connection is local and very fast (microseconds instead of milliseconds).
Modern service meshes also implement smarter load-balancing algorithms. By monitoring the health and performance of the backends, they can send more traffic to faster backends, which improves overall performance.
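A minimal sketch of the idea (not how any particular mesh implements it): keep a moving average of each backend's response time and pick backends with probability inversely proportional to it, so faster backends receive more traffic.

```python
import random

class LatencyAwareBalancer:
    """Toy latency-aware picker: faster backends receive more traffic."""

    def __init__(self, backends, alpha=0.2):
        self.ewma = {b: 0.050 for b in backends}  # smoothed latency in seconds
        self.alpha = alpha

    def pick(self):
        # Weight each backend by the inverse of its smoothed latency.
        weights = {b: 1.0 / max(lat, 1e-3) for b, lat in self.ewma.items()}
        r = random.uniform(0, sum(weights.values()))
        for backend, w in weights.items():
            r -= w
            if r <= 0:
                return backend
        return backend  # fallback for floating-point rounding

    def observe(self, backend, seconds):
        # Exponentially weighted moving average of observed response times.
        self.ewma[backend] = (1 - self.alpha) * self.ewma[backend] + self.alpha * seconds
```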
Security is better too. The dotCloud routing mesh ran entirely on EC2 Classic and did not encrypt traffic (on the assumption that if someone managed to sniff EC2 network traffic, you already had much bigger problems). Modern service meshes transparently protect all of our traffic, for example with mutual TLS authentication and encryption.
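As a rough idea of what the mesh does for you transparently, here is what mutual TLS looks like when set up by hand with Python's ssl module (the certificate file names and host name are made up for the example):

```python
import socket
import ssl

# Server side: present a certificate and require one from the client (mutual TLS).
# This context would be used to wrap the listening socket.
server_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
server_ctx.load_cert_chain("server.crt", "server.key")
server_ctx.load_verify_locations("ca.crt")
server_ctx.verify_mode = ssl.CERT_REQUIRED

# Client side: verify the server against the same CA and present our own certificate.
client_ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
client_ctx.load_verify_locations("ca.crt")
client_ctx.load_cert_chain("client.crt", "client.key")

with socket.create_connection(("payments.internal", 8443)) as raw_sock:
    with client_ctx.wrap_socket(raw_sock, server_hostname="payments.internal") as tls:
        tls.sendall(b"ping")
```

A sidecar-based mesh terminates and originates these TLS sessions in its proxies, so application code keeps speaking plain TCP to localhost.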
Traffic Routing for Platform Services
OK, we have discussed traffic between applications, but what about the dotCloud platform itself?
The platform itself consisted of about a hundred microservices responsible for various functions. Some accepted requests from others, and some were background workers that connected to other services but did not accept connections themselves. Either way, each service needed to know the addresses of the endpoints it had to connect to.
Many high-level services could use the routing mesh described above. In fact, many of the hundred-plus dotCloud microservices were deployed as regular applications on the dotCloud platform itself. But a small number of low-level services (in particular, the ones implementing that routing mesh) needed something simpler, with fewer dependencies (since they could not depend on themselves to run: the good old chicken-and-egg problem).
These low-level, essential services were deployed by running containers directly on a few key nodes, bypassing the platform's standard builder, scheduler, and runner services. If you want a comparison with modern container platforms, it is like bootstrapping a control plane by running docker run directly on the nodes instead of delegating that task to Kubernetes. It is quite similar to the concept of static pods that kubeadm or bootkube uses when bootstrapping a standalone cluster.
These services were exposed in a simple and crude way: their names and addresses were listed in a YAML file, and each client had to take a copy of that YAML file when it was deployed.
On the one hand, this was extremely reliable, because it did not require maintaining an external key/value store such as Zookeeper (remember, etcd and Consul did not exist yet). On the other hand, it made it hard to move services around. Every time a service moved, all of its clients had to receive an updated YAML file (and potentially be restarted). Not very convenient!
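The format below is a made-up approximation, but it captures the idea and its main drawback:

```python
import yaml  # PyYAML

# Hypothetical registry file; every client shipped with a copy of it.
REGISTRY = """
services:
  scheduler: {host: 10.1.2.3, port: 4000}
  builder:   {host: 10.1.2.4, port: 4001}
"""

services = yaml.safe_load(REGISTRY)["services"]
scheduler = (services["scheduler"]["host"], services["scheduler"]["port"])

# The drawback: moving "scheduler" to another machine means regenerating this
# file everywhere and restarting every client still holding a stale copy.
```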
Later, we began introducing a new scheme in which each client connected to a local proxy. Instead of an address and a port, a client only needed to know the port number of the service and connect through localhost. The local proxy handled the connection and routed it to the actual backend. Now, when a backend moved to another machine or was scaled, instead of updating all the clients, only these local proxies had to be updated, and no restart was required. (It was also planned to encapsulate the traffic in TLS connections and put another proxy on the receiving side, which would check TLS certificates without involving the receiving service, configured to accept connections only on localhost. More on this later.)
This is very similar to Airbnb's SmartStack, with the significant difference that SmartStack was actually implemented and deployed to production, while dotCloud's internal routing system was shelved when dotCloud turned into Docker.
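A minimal sketch of that local-proxy idea (the port-to-backend mapping is hypothetical, and a real implementation would need error handling, graceful reloads, and a way to update the mapping at runtime):

```python
import socket
import threading

# Hypothetical mapping from a well-known local port to the service's real backend.
BACKENDS = {4000: ("10.1.2.3", 4000)}

def pipe(src, dst):
    # Copy bytes in one direction until the source closes, then close the other side.
    try:
        while chunk := src.recv(4096):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        dst.close()

def serve(local_port):
    listener = socket.create_server(("127.0.0.1", local_port))
    while True:
        client, _ = listener.accept()
        upstream = socket.create_connection(BACKENDS[local_port])
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()

serve(4000)  # clients connect to 127.0.0.1:4000 and never learn the backend address
```

When a backend moves, only this mapping changes; clients keep connecting to the same localhost port.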
I personally consider SmartStack to be one of the predecessors of such systems as Istio, Linkerd and Consul Connect, because they all follow the same pattern:
- Run a proxy on each node.
- Clients connect to the proxy.
- The control plane updates the proxy configuration whenever backends change.
- ... Profit!
Modern implementation of a service mesh
If we needed to implement a similar mesh today, we could use similar principles. For example, configure an internal DNS zone that maps service names to addresses in the 127.0.0.0/8 range. Then run HAProxy on each node of the cluster, accepting connections on each service address (in that 127.0.0.0/8 subnet) and redirecting/balancing the load to the corresponding backends. The HAProxy configuration can be managed by confd, which lets you store the backend information in etcd or Consul and automatically push an updated configuration to HAProxy whenever it changes.
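Here is a toy, confd-flavored version of that loop. The registry is hard-coded to keep the sketch self-contained; in practice it would be watched in etcd or Consul, and the rendered file would still need the usual global/defaults sections.

```python
# Render an HAProxy configuration from a (hypothetical) service registry:
# one loopback VIP per service, plus its real backends.
REGISTRY = {
    "users":    {"vip": "127.0.0.2", "port": 80, "backends": ["10.1.2.3:8000", "10.1.2.4:8000"]},
    "payments": {"vip": "127.0.0.3", "port": 80, "backends": ["10.1.2.5:8000"]},
}

def render(registry):
    lines = []
    for name, svc in registry.items():
        lines += [
            f"frontend fe_{name}",
            f"    bind {svc['vip']}:{svc['port']}",
            "    mode tcp",
            f"    default_backend be_{name}",
            f"backend be_{name}",
            "    mode tcp",
            "    balance roundrobin",
        ]
        lines += [f"    server {name}_{i} {addr} check"
                  for i, addr in enumerate(svc["backends"])]
    return "\n".join(lines) + "\n"

with open("haproxy.cfg", "w") as f:
    f.write(render(REGISTRY))
# ...then reload HAProxy however your init system does it.
```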
This is how Istio works! But with some differences:
- It uses Envoy Proxy instead of HAProxy.
- It stores the backend configuration via the Kubernetes API instead of in etcd or Consul.
- Services are allocated addresses on an internal subnet (Kubernetes ClusterIP addresses) instead of in 127.0.0.0/8.
- It has an optional component (Citadel) for adding mutual TLS authentication between clients and servers.
- It supports additional features such as circuit breaking, distributed tracing, canary deployments, and so on.
Let's take a quick look at some of the differences.
Envoy proxy
Envoy Proxy was written by Lyft (Uber's competitor in the ride-hailing market). It is similar to other proxies in many respects (HAProxy, Nginx, Traefik ...), but Lyft wrote their own because they needed functionality that other proxies lacked, and it seemed more reasonable to build a new one than to extend an existing one.
Envoy can be used on its own. If I have a service that needs to connect to other services, I can configure it to connect to Envoy, and then dynamically configure and reconfigure Envoy with the locations of those other services, while getting a lot of excellent extra functionality, such as visibility. Instead of a custom client library or injecting call tracing into the code, we send traffic to Envoy, and it collects metrics for us.
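For example, here is a hedged sketch of pulling those metrics from Envoy's admin interface (the admin port is whatever you set in the bootstrap config; 9901 is a common choice, not something you can rely on):

```python
import requests

# /stats/prometheus returns Envoy's counters and gauges in Prometheus text format.
stats = requests.get("http://127.0.0.1:9901/stats/prometheus", timeout=2).text

for line in stats.splitlines():
    # Print upstream request counters per cluster as a quick sanity check.
    if "upstream_rq_total" in line and not line.startswith("#"):
        print(line)
```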
But Envoy can also serve as the data plane of a service mesh. In that case, Envoy is configured by the service mesh's control plane.
Control plane
For its control plane, Istio relies on the Kubernetes API. This is not very different from using confd, which relies on etcd or Consul to watch a set of keys in a data store. Istio uses the Kubernetes API to watch a set of Kubernetes resources.
As an aside: I personally found this description of the Kubernetes API useful, which reads:
The Kubernetes API Server is a “dumb server” that offers storage, versioning, validation, updating, and semantics of API resources.
Istio is designed to work with Kubernetes; if you want to use it outside of Kubernetes, you need to run an instance of the Kubernetes API server (and its auxiliary etcd service).
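To make the pattern concrete, here is a hedged sketch of the "watch the Kubernetes API and react" loop using the official Python client; reconfigure_proxies is a made-up placeholder for whatever pushes configuration down to the proxies:

```python
from kubernetes import client, config, watch

config.load_kube_config()  # use config.load_incluster_config() when running in a pod
v1 = client.CoreV1Api()

def reconfigure_proxies(event_type, svc):
    # Placeholder: a real control plane would turn this event into proxy
    # configuration (Envoy clusters/endpoints, HAProxy backends, ...) and push it out.
    print(event_type, svc.metadata.namespace, svc.metadata.name, svc.spec.cluster_ip)

# Watch Service objects cluster-wide and react to every change.
for event in watch.Watch().stream(v1.list_service_for_all_namespaces):
    reconfigure_proxies(event["type"], event["object"])
```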
Service Addresses
Istio relies on the ClusterIP addresses that Kubernetes allocates, so Istio services get an internal address (not in the 127.0.0.0/8 range).
In a Kubernetes cluster without Istio, traffic to the ClusterIP address of a given service is intercepted by kube-proxy and sent to one of that service's backends. If you are interested in the technical details: kube-proxy sets up iptables rules (or IPVS load balancers, depending on how it is configured) to rewrite the destination IP addresses of connections going to a ClusterIP address.
After installing Istio in a Kubernetes cluster, nothing changes until you explicitly enable it for a given service, or even for an entire namespace, by injecting a sidecar container into its pods. This container runs an instance of Envoy and sets up a series of iptables rules to intercept traffic to other services and redirect it to Envoy.
Combined with Kubernetes DNS integration, this means our code can connect using service names and everything "just works." In other words, our code issues a request like http://api/v1/users/4242, api resolves to 10.97.105.48, the iptables rules intercept the connection to 10.97.105.48 and redirect it to the local Envoy proxy, and that local proxy forwards the request to the actual API backend. Whew!
Extra little thingies
Istio also provides end-to-end encryption and authentication via mTLS (mutual TLS). A component called Citadel is responsible for this.
There is also a component called Mixer, which Envoy can query for every request to make a special decision about that request depending on various factors such as headers, backend load, and so on (don't worry: there are many ways to keep Mixer available, and even if it crashes, Envoy continues to work fine as a proxy).
And of course there is visibility: Envoy collects a huge number of metrics while also providing distributed tracing. In a microservices architecture, if a single API request has to travel through microservices A, B, C, and D, distributed tracing assigns a unique identifier to the request as it enters the system and preserves that identifier across the sub-requests to all of these microservices, making it possible to record all the related calls, their latencies, and so on.
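In practice this means each service has to propagate the tracing headers from incoming requests onto its outgoing calls. A minimal sketch using the B3-style header names from Envoy/Istio's Zipkin-compatible tracing (the downstream URL is made up):

```python
import requests

# Headers used for Zipkin/B3-style trace propagation in Envoy/Istio setups.
TRACE_HEADERS = [
    "x-request-id", "x-b3-traceid", "x-b3-spanid",
    "x-b3-parentspanid", "x-b3-sampled", "x-b3-flags",
]

def call_downstream(incoming_headers, url="http://api/v1/users/4242"):
    # Copy the trace headers from the incoming request onto the outgoing call
    # so the mesh can stitch the spans of A, B, C, and D into one trace.
    forwarded = {h: incoming_headers[h] for h in TRACE_HEADERS if h in incoming_headers}
    return requests.get(url, headers=forwarded, timeout=2)
```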
Develop or buy
Istio has a reputation for being a complex system. In contrast, building a routing mesh like the one I described at the beginning of this post is relatively simple with existing tools. So does it make sense to build your own service mesh instead?
If our needs are modest (no visibility, no circuit breaker, no other niceties), then developing our own tool is tempting. But if we are using Kubernetes, it may not even be necessary, because Kubernetes already provides basic primitives for service discovery and load balancing.
But if we have more advanced requirements, then "buying" a service mesh looks like a much better option. (It is not always literally a purchase, since Istio is open source, but we still need to invest engineering time to understand how it works and to deploy and operate it.)
What to choose: Istio, Linkerd or Consul Connect?
So far we have only talked about Istio, but it is not the only service mesh. A popular alternative is Linkerd, and there is also Consul Connect.
What to choose?
Honestly, I don't know. Right now I don't consider myself competent enough to answer this question. There are some interesting articles comparing these tools, and even benchmarks.
One promising approach is to use a tool like SuperGloo. It implements an abstraction layer that simplifies and unifies the APIs exposed by service meshes. Instead of learning the specific (and, in my opinion, relatively complex) APIs of the various service meshes, we can use SuperGloo's simpler constructs and easily switch from one mesh to another, as if we had an intermediate configuration format describing HTTP frontends and backends, from which the actual configuration for Nginx, HAProxy, Traefik, Apache, and so on could be generated.
I've played around a bit with Istio and SuperGloo, and in the next article I want to show how to add Istio or Linkerd to an existing cluster using SuperGloo, and how well the latter does its job, that is, whether it lets you switch from one service mesh to another without rewriting configurations.