Kubernetes in production: services

    Six months ago we finished migrating all of our stateless services to Kubernetes. At first glance the task looks simple: deploy a cluster, write application specs and off you go. But because we are obsessed with keeping our service stable, we had to dig into how k8s actually works right away and test various failure scenarios. Most of my questions concerned everything related to the network. One of these "slippery" spots is how services (Services) work in Kubernetes.


    The documentation tells us:


    • roll out the application
    • set up liveness / readiness probes
    • create a service
    • and then everything just works: load balancing, failover handling, and so on.

    But in practice everything is more complicated. Let's see how it actually works.


    A bit of theory


    In what follows I assume the reader is already familiar with how Kubernetes is put together and with its terminology; let's just recall what a service is.


    A Service is a k8s entity that describes a set of pods and the ways to access them.


    For example, we launched our application:


    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: webapp
    spec:
      selector:
        matchLabels: 
          app: webapp
      replicas: 2
      template:
        metadata: 
          labels: 
            app: webapp
        spec:
          containers:
          - name: webapp
            image: defaultxz/webapp
            command: ["/webapp", "0.0.0.0:80"]
            ports:
            - containerPort: 80
            readinessProbe:
              httpGet: {path: /, port: 80}
              initialDelaySeconds: 1
              periodSeconds: 1

    $ kubectl get pods -l app=webapp
    NAME                      READY     STATUS    RESTARTS   AGE
    webapp-5d5d96f786-b2jxb   1/1       Running   0          3h
    webapp-5d5d96f786-rt6j7   1/1       Running   0          3h

    Now, in order to access it, we need to create a service in which we define which pods we want to reach (selector) and on which ports:


    kind: Service
    apiVersion: v1
    metadata:
      name: webapp
    spec:
      selector:
        app: webapp
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80

    $ kubectl get svc webapp
    NAME      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
    webapp    ClusterIP   10.97.149.77   <none>        80/TCP    1d

    Now we can access our service from any cluster machine:


    curl -i http://10.97.149.77
    HTTP/1.1 200 OK
    Date: Mon, 24 Sep 2018 11:55:14 GMT
    Content-Length: 2
    Content-Type: text/plain; charset=utf-8

    How it all works



    Very roughly, what happens is this:


    • you ran kubectl apply with the Deployment spec
    • magic happens, the details of which are not important in this context
    • as a result, application pods end up running on some of the nodes
    • at a set interval, kubelet (the k8s agent on each node) runs the liveness / readiness probes of all pods running on its node and sends the results to the apiserver (the interface to the k8s brain)
    • kube-proxy on each node receives notifications from the apiserver about all changes to services and to the pods behind them (you can look at the resulting Endpoints object yourself, see the sketch right after this list)
    • kube-proxy reflects all these changes in the configuration of the underlying subsystem (iptables, ipvs)
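
    The object kube-proxy watches is easy to inspect: the service's Endpoints contain the pod IPs that currently pass the readiness probe. A quick way to check (output abridged; the pod IPs are the ones from this example):


    $ kubectl get endpoints webapp
    NAME      ENDPOINTS                       AGE
    webapp    10.244.0.10:80,10.244.0.11:80   1d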

    For simplicity, let's look at the default proxying mode, iptables. For our virtual IP 10.97.149.77 we have:


    -A KUBE-SERVICES -d 10.97.149.77/32 -p tcp -m comment --comment "default/webapp: cluster IP" -m tcp --dport 80 -j KUBE-SVC-BL7FHTIPVYJBLWZN

    Traffic goes to the chain KUBE-SVC-BL7FHTIPVYJBLWZN, where it is distributed between two other chains:


    -A KUBE-SVC-BL7FHTIPVYJBLWZN -m comment --comment "default/webapp:" -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-UPKHDYQWGW4MVMBS
    -A KUBE-SVC-BL7FHTIPVYJBLWZN -m comment --comment "default/webapp:" -j KUBE-SEP-FFCBJRUPEN3YPZQT

    and these are our pods:


    -A KUBE-SEP-UPKHDYQWGW4MVMBS -p tcp -m comment --comment "default/webapp:" -m tcp -j DNAT --to-destination 10.244.0.10:80
    -A KUBE-SEP-FFCBJRUPEN3YPZQT -p tcp -m comment --comment "default/webapp:" -m tcp -j DNAT --to-destination 10.244.0.11:80
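
    If you want to look at these rules on your own cluster, you can dump them on any node; the chain names are generated, so it is easiest to grep by the service name (this is just one way to pull them out):


    $ sudo iptables-save -t nat | grep webapp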

    Testing the failure of one of the pods


    My test webapp can switch into a "spew errors" mode; to trigger it you pull the URL "/err".


    The results of ab -c 50 -n 20000, with "/err" pulled on one of the pods in the middle of the run:


    Complete requests:      20000
    Failed requests:        3719

    The point here is not the specific number of errors (it will vary with the load) but the fact that there are any at all. The "bad" pod was eventually taken out of balancing, but while the switchover was happening, clients of the service were getting errors. The reason for the errors is easy to explain: the readiness probe is run by kubelet once a second, plus a bit more time is needed to propagate the information that the pod failed the probe.
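
    The detection window can be narrowed a bit by tightening the probe settings, at the cost of extra probe load and a higher chance of false positives; it does not remove the propagation delay, so it only reduces the errors, it does not eliminate them. A hypothetical variant of the probe from the Deployment above (failureThreshold defaults to 3):


    readinessProbe:
      httpGet: {path: /, port: 80}
      initialDelaySeconds: 1
      periodSeconds: 1
      timeoutSeconds: 1
      failureThreshold: 1   # mark the pod NotReady after the first failed probe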


    Will the IPVS backend for kube-proxy (experimental) help?


    Not really! It solves the problem of proxying performance and offers a choice of balancing algorithms, but it does not solve the problem of handling failures.
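
    For reference: IPVS mode is enabled with the kube-proxy flag --proxy-mode=ipvs, and the resulting virtual server table can be inspected with ipvsadm. For our service it would look roughly like this (a sketch; it is still plain L4 balancing):


    $ sudo ipvsadm -Ln
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  10.97.149.77:80 rr
      -> 10.244.0.10:80               Masq    1      0          0
      -> 10.244.0.11:80               Masq    1      0          0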


    What to do


    This problem can only be solved by a balancer that can retry requests. In other words, for HTTP we need an L7 balancer. Such balancers are already widely used with Kubernetes, either in the form of an ingress (it was meant as an entry point into the cluster, but by and large it does exactly what is needed) or as a separate layer, a service mesh such as istio.
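
    For example, if traffic already goes through the ingress-nginx controller, retries to the next upstream can be enabled per Ingress with annotations. A hypothetical Ingress for our webapp (the host is made up, and the annotation names should be checked against your controller version):


    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: webapp
      annotations:
        nginx.ingress.kubernetes.io/proxy-next-upstream: "error timeout http_502"
        nginx.ingress.kubernetes.io/proxy-next-upstream-tries: "2"
    spec:
      rules:
      - host: webapp.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: webapp
                port:
                  number: 80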


    In our production we use neither ingress nor a service mesh, because of the added complexity. Such abstractions, in my opinion, help when you frequently need to configure a large number of services. But you "pay" for them with controllability and simplicity of the infrastructure, and you spend extra time figuring out how to configure retries and timeouts for a particular service.


    How we do it


    We use headless k8s services. Such services have no virtual IP, so kube-proxy and iptables are not involved in their operation. For each such service you can get the list of live pods either through DNS or through the API.
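
    A headless service is declared by setting clusterIP to None; DNS for its name then returns the pod IPs directly. A sketch for the webapp above (the name webapp-headless is made up, and the lookup output is illustrative and abridged):


    kind: Service
    apiVersion: v1
    metadata:
      name: webapp-headless
    spec:
      clusterIP: None        # headless: no virtual IP, kube-proxy and iptables stay out of the way
      selector:
        app: webapp
      ports:
      - protocol: TCP
        port: 80
        targetPort: 80

    $ nslookup webapp-headless.default.svc.cluster.local
    Name:      webapp-headless.default.svc.cluster.local
    Address 1: 10.244.0.10
    Address 2: 10.244.0.11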


    For applications that talk to other services, we add a sidecar container with envoy. Envoy periodically gets the up-to-date list of pods for all the services it needs via DNS and, most importantly, can retry a request against another pod when one fails. It could instead be run as a DaemonSet on each node, but then if that instance failed, every application using it would stop working. Since this proxy's resource consumption is quite small, we decided to use it as a sidecar container.
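
    As a very rough sketch of what the sidecar's static config can look like (not our exact production config; field names follow the current Envoy v3 API and may need adjusting for your version): the application talks to Envoy on localhost, Envoy resolves the headless service name via DNS and retries failed requests on another pod.


    static_resources:
      listeners:
      - address:
          socket_address: {address: 127.0.0.1, port_value: 9001}
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: webapp_egress
              http_filters:
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
              route_config:
                virtual_hosts:
                - name: webapp
                  domains: ["*"]
                  routes:
                  - match: {prefix: "/"}
                    route:
                      cluster: webapp
                      retry_policy:              # retry failed requests on another pod
                        retry_on: "5xx,connect-failure"
                        num_retries: 2
      clusters:
      - name: webapp
        type: STRICT_DNS                         # re-resolve the headless service name periodically
        dns_refresh_rate: 1s
        connect_timeout: 0.25s
        lb_policy: ROUND_ROBIN
        load_assignment:
          cluster_name: webapp
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address: {address: webapp-headless.default.svc.cluster.local, port_value: 80}

    The application then simply sends its outgoing requests to 127.0.0.1:9001 instead of the service address.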


    This is essentially exactly what istio does, but in our case the balance has shifted towards simplicity (no need to learn istio and run into its bugs). This balance may well change, and we will start using something like istio.


    Kubernetes has definitely taken root here at okmeter.io, and we believe it will keep spreading. Support for monitoring k8s in our service is on the way, stay tuned!

