Launching a RabbitMQ cluster in Kubernetes

When an application is organized as microservices, a substantial amount of work falls on the mechanisms that integrate those microservices with each other. Moreover, this integration must be fault tolerant and highly available.

In our solutions we use several kinds of integration: via Kafka, via gRPC, and via RabbitMQ.

In this article we will share our experience of clustering RabbitMQ with its nodes running in Kubernetes.


Prior to RabbitMQ 3.7, clustering it in K8s was not a trivial task and involved many hacks and not-so-pretty workarounds. Version 3.6 relied on the autocluster plugin from the RabbitMQ Community. In 3.7, the Kubernetes Peer Discovery Backend appeared: it is built into the base RabbitMQ distribution and requires no separate building or installation.

We will describe the final configuration in full, commenting along the way on what is happening.

In theory


The plugin has a repository on GitHub with an example of basic usage.
That example is not intended for production, which is clearly stated in its description, and moreover some of its settings go against the logic of production use. The example also does not mention storage persistence at all, so in any abnormal situation our cluster would be left with nothing.

In practice


Now let us describe what we ran into and how RabbitMQ was installed and configured.

We describe the configuration of all parts of RabbitMQ as a service in K8s. Let us clarify right away that we install RabbitMQ in K8s as a StatefulSet, so that one RabbitMQ instance always runs on each node of the K8s cluster (one node in the classic cluster configuration). We will also install the RabbitMQ management panel in K8s and expose it outside the cluster.

Rights and Roles:


rabbitmq_rbac.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq 
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
subjects:
- kind: ServiceAccount
  name: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader

The access rights for RabbitMQ are taken from the example as-is; no changes are required. We create a ServiceAccount for our cluster and grant it read access to K8s Endpoints.
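
For reference, applying this manifest is a plain kubectl apply; a minimal sketch, assuming everything in this article lives in the namespace our-namespace:

kubectl create namespace our-namespace
kubectl -n our-namespace apply -f rabbitmq_rbac.yaml

The remaining manifests below are applied the same way.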

Persistent storage:


rabbitmq_pv.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: rabbitmq-data-sigma
  labels:
    type: local
  annotations:
    volume.alpha.kubernetes.io/storage-class: rabbitmq-data-sigma
spec:
  storageClassName: rabbitmq-data-sigma
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
  hostPath:
    path: "/opt/rabbitmq-data-sigma"

Here we took the simplest option for persistent storage, hostPath (an ordinary directory on each K8s node), but you can use any of the many persistent volume types supported in K8s.

rabbitmq_pvc.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rabbitmq-data
spec:
  storageClassName: rabbitmq-data-sigma
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

We create a PersistentVolumeClaim for the volume created in the previous step. This claim will then be used in the StatefulSet as the persistent data store.

Services:


rabbitmq_service.yaml
kind: Service
apiVersion: v1
metadata:
  name: rabbitmq-internal
  labels:
    app: rabbitmq
spec:
  clusterIP: None
  ports:
   - name: http
     protocol: TCP
     port: 15672
   - name: amqp
     protocol: TCP
     port: 5672
  selector:
    app: rabbitmq

We create an internal headless service through which the Peer Discovery plugin will operate.

rabbitmq_service_ext.yaml
kind: Service
apiVersion: v1
metadata:
  name: rabbitmq
  labels:
    app: rabbitmq
    type: LoadBalancer
spec:
  type: NodePort
  ports:
   - name: http
     protocol: TCP
     port: 15672
     targetPort: 15672
     nodePort: 31673
   - name: amqp
     protocol: TCP
     port: 5672
     targetPort: 5672
     nodePort: 30673
  selector:
    app: rabbitmq

For applications in K8s to work with our cluster, we create a load-balancing service.

Since we need access to the RabbitMQ cluster from outside K8s, we expose a NodePort. RabbitMQ will be reachable on any K8s cluster node on ports 31673 and 30673. There is no real need for this in actual operation; it is purely a matter of convenience when using the RabbitMQ management UI.

When a service of type NodePort is created in K8s, a ClusterIP service is also implicitly created to back it. Therefore, applications inside K8s that need to work with our RabbitMQ can reach the cluster at amqp://rabbitmq:5672.
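
Inside K8s, the simplest way to hand this address to an application is through an environment variable; a hypothetical fragment of an application Deployment (the variable name, user, and password are placeholders, not something created by the manifests above):

        env:
          - name: AMQP_URL
            value: "amqp://user:password@rabbitmq:5672/"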

Configuration:


rabbitmq_configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
data:
  enabled_plugins: |
      [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      cluster_formation.k8s.port = 443
      ### cluster_formation.k8s.address_type = ip
      cluster_formation.k8s.address_type = hostname
      cluster_formation.node_cleanup.interval = 10
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      queue_master_locator=min-masters
      cluster_formation.randomized_startup_delay_range.min = 0
      cluster_formation.randomized_startup_delay_range.max = 2
      cluster_formation.k8s.service_name = rabbitmq-internal
      cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.our-namespace.svc.cluster.local

We create the RabbitMQ configuration files. This is where the main magic lives.

enabled_plugins: |
  [rabbitmq_management,rabbitmq_peer_discovery_k8s].

We add the required plugins to the list of enabled plugins. Now we can use automatic Peer Discovery in K8s.

cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s

We set this plugin as the backend for peer discovery.

cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
cluster_formation.k8s.port = 443

We specify the address and port at which the Kubernetes API server can be reached. You could put an IP address here directly, but it is cleaner to do it this way.

A service named kubernetes that points to the K8s API server is normally created in the default namespace. Depending on how K8s was installed, the namespace, service name and port may differ. If something is different in your particular installation, adjust the values here accordingly.

For example, we ran into the fact that in some clusters the service listens on port 443 and in others on 6443. That something is wrong can be seen in the RabbitMQ startup logs, which clearly show the connection endpoint configured here.
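
One way to check which address and port apply in your cluster is to look at the service itself, for example:

kubectl -n default get service kubernetes

The port shown there is the one to put into cluster_formation.k8s.port.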

### cluster_formation.k8s.address_type = ip
cluster_formation.k8s.address_type = hostname

By default, the example addressed RabbitMQ cluster nodes by IP address. But every time a pod restarts it gets a new IP. Surprise! The cluster dies in agony.

We change the addressing to hostname. A StatefulSet guarantees that hostnames stay stable for the whole lifetime of the StatefulSet, which suits us completely.

cluster_formation.node_cleanup.interval = 10
cluster_formation.node_cleanup.only_log_warning = true

Since we assume that a lost node will recover sooner or later, we disable the cluster's automatic removal of unreachable nodes. In that case, as soon as the node comes back online it rejoins the cluster without losing its previous state.

cluster_partition_handling = autoheal

This parameter defines what the cluster does when quorum is lost. Here you should simply read the documentation on the topic and decide which option best fits your specific use case.

queue_master_locator=min-masters

This determines how the master node is chosen for new queues. With this setting, the node with the fewest queues becomes the master, so queues are distributed evenly across the cluster nodes.

cluster_formation.k8s.service_name = rabbitmq-internal

We set the name of the K8s headless service (created earlier) through which the RabbitMQ nodes communicate with each other.

cluster_formation.k8s.hostname_suffix = .rabbitmq-internal.our-namespace.svc.cluster.local

This is important for hostname-based addressing to work in the cluster. The FQDN of a K8s pod is formed as a short name (rabbitmq-0, rabbitmq-1) plus a suffix (the domain part). Here we specify that suffix. In K8s it looks like .<service name>.<namespace name>.svc.cluster.local

kube-dns, without any additional configuration, resolves names like rabbitmq-0.rabbitmq-internal.our-namespace.svc.cluster.local to the IP address of the specific pod, which is what makes all the hostname-based clustering magic possible.
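
Once the StatefulSet pods described below are running, this is easy to verify from inside the cluster, for example with a throwaway busybox pod (the pod name dns-check is arbitrary):

kubectl -n our-namespace run dns-check --rm -it --restart=Never --image=busybox -- \
  nslookup rabbitmq-0.rabbitmq-internal.our-namespace.svc.cluster.local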

RabbitMQ StatefulSet configuration:


rabbitmq_statefulset.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: rabbitmq
spec:
  serviceName: rabbitmq-internal
  replicas: 3
  template:
    metadata:
      labels:
        app: rabbitmq
      annotations:
        scheduler.alpha.kubernetes.io/affinity: >
            {
              "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [{
                  "labelSelector": {
                    "matchExpressions": [{
                      "key": "app",
                      "operator": "In",
                      "values": ["rabbitmq"]
                    }]
                  },
                  "topologyKey": "kubernetes.io/hostname"
                }]
              }
            }
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:        
      - name: rabbitmq-k8s
        image: rabbitmq:3.7
        volumeMounts:
          - name: config-volume
            mountPath: /etc/rabbitmq
          - name: rabbitmq-data
            mountPath: /var/lib/rabbitmq/mnesia
        ports:
          - name: http
            protocol: TCP
            containerPort: 15672
          - name: amqp
            protocol: TCP
            containerPort: 5672
        livenessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 60
          periodSeconds: 10
          timeoutSeconds: 10
        readinessProbe:
          exec:
            command: ["rabbitmqctl", "status"]
          initialDelaySeconds: 10
          periodSeconds: 10
          timeoutSeconds: 10
        imagePullPolicy: Always
        env:
          - name: MY_POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: RABBITMQ_NODENAME
            value: "rabbit@$(HOSTNAME).rabbitmq-internal.$(NAMESPACE).svc.cluster.local"
          - name: K8S_SERVICE_NAME
            value: "rabbitmq-internal"
          - name: RABBITMQ_ERLANG_COOKIE
            value: "mycookie"
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
            - key: rabbitmq.conf
              path: rabbitmq.conf
            - key: enabled_plugins
              path: enabled_plugins
        - name: rabbitmq-data
          persistentVolumeClaim:
            claimName: rabbitmq-data

And now the StatefulSet itself. Let us note the interesting points.

serviceName: rabbitmq-internal

We specify the name of the headless service through which the pods of the StatefulSet communicate.

replicas: 3

We set the number of replicas in the cluster. In our case it equals the number of K8s worker nodes.

annotations:
        scheduler.alpha.kubernetes.io/affinity: >
            {
              "podAntiAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": [{
                  "labelSelector": {
                    "matchExpressions": [{
                      "key": "app",
                      "operator": "In",
                      "values": ["rabbitmq"]
                    }]
                  },
                  "topologyKey": "kubernetes.io/hostname"
                }]
              }
            }

When one of the K8s nodes goes down, the StatefulSet tries to maintain the number of instances in the set and may create several pods on the same K8s node. This behavior is completely undesirable and, in essence, pointless. Therefore we define an anti-affinity rule for the pods of the StatefulSet. The rule is made strict (Required) so that kube-scheduler cannot violate it when scheduling pods.

The idea is simple: the scheduler is forbidden from placing (within the namespace) more than one pod with the app: rabbitmq label on any single node. Nodes are distinguished by the value of the kubernetes.io/hostname label. Now, if for some reason the number of healthy K8s nodes drops below the required number of replicas in the StatefulSet, new replicas will not be created until a free node appears again.
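
As a side note, newer K8s versions let you express the same rule through the spec.affinity field of the pod template instead of the alpha annotation; a minimal equivalent sketch, placed at the same level as serviceAccountName and assuming the same app: rabbitmq label:

      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values: ["rabbitmq"]
            topologyKey: kubernetes.io/hostname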

serviceAccountName: rabbitmq

We specify the ServiceAccount under which our pods run.

image: rabbitmq:3.7

The RabbitMQ image is completely standard and is taken from Docker Hub; it requires no rebuilding or manual tweaking.

- name: rabbitmq-data
    mountPath: /var/lib/rabbitmq/mnesia

RabbitMQ stores its persistent data in /var/lib/rabbitmq/mnesia. Here we mount our PersistentVolumeClaim into that folder, so that when a pod or a node, or even the entire StatefulSet, is restarted, the data (both internal, including the assembled cluster state, and user data) stays safe and sound. There are examples where the whole /var/lib/rabbitmq/ folder is made persistent. We concluded that this is not the best idea, because then it also starts remembering everything set in the RabbitMQ configs; that is, to change anything in the configuration file you would have to wipe the persistent storage, which is very inconvenient in operation.

          - name: HOSTNAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: RABBITMQ_NODENAME
            value: "rabbit@$(HOSTNAME).rabbitmq-internal.$(NAMESPACE).svc.cluster.local"

With this set of environment variables we, first, tell RabbitMQ to use the FQDN as the identifier of cluster members and, second, set the format of that name. The format was described earlier when we walked through the config.

- name: K8S_SERVICE_NAME
            value: "rabbitmq-internal"

The name of the headless service used for communication between cluster members.

- name: RABBITMQ_ERLANG_COOKIE
            value: "mycookie"

The contents of the Erlang cookie must be identical on all nodes of the cluster; set your own value here. A node with a different cookie will not be able to join the cluster.
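
In a real deployment it is better not to keep the cookie in plain text in the manifest. A minimal sketch of moving it into a Secret (the Secret name rabbitmq-secret and the key erlang-cookie are our own naming, not part of the example):

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-secret
type: Opaque
stringData:
  erlang-cookie: "mycookie"

and, in the StatefulSet, referencing it instead of the literal value:

          - name: RABBITMQ_ERLANG_COOKIE
            valueFrom:
              secretKeyRef:
                name: rabbitmq-secret
                key: erlang-cookie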

volumes:
        - name: rabbitmq-data
          persistentVolumeClaim:
            claimName: rabbitmq-data

We define the mounted volume based on the previously created PersistentVolumeClaim.
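
Alternatively, if your storage backend supports dynamic provisioning, a StatefulSet can request a separate volume for each pod via volumeClaimTemplates instead of one shared claim; a rough sketch (the storage class is assumed here to be dynamically provisioned):

  volumeClaimTemplates:
  - metadata:
      name: rabbitmq-data
    spec:
      storageClassName: rabbitmq-data-sigma
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi

In that case the explicit volumes entry above is no longer needed; the volumeMount named rabbitmq-data binds to the per-pod claim.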

At this point the setup in K8s is complete. The result is a RabbitMQ cluster that distributes queues evenly across its nodes and is resilient to problems in the runtime environment.
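
Once all three pods are up, you can check that they have actually formed a single cluster, for example:

kubectl -n our-namespace exec rabbitmq-0 -- rabbitmqctl cluster_status

The output should list all three nodes as running.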


If one of the cluster nodes becomes unavailable, the queues hosted on it stop being accessible, while everything else continues to work. As soon as the node returns, it rejoins the cluster, and the queues for which it was the master become operational again with all their data intact (provided the persistent storage is not damaged, of course). All of these processes are fully automatic and require no intervention.

Bonus: setting up HA


One of our projects had an extra nuance: the requirements called for full mirroring of all data stored in the cluster. Everything had to keep working, from the application's point of view, as long as at least one cluster node was operational. This has nothing to do with K8s; we describe it simply as a mini how-to.

To enable full HA, you need to create a Policy in the RabbitMQ dashboard on the Admin -> Policies tab. The name is arbitrary, the Pattern is left empty (all queues), and in Definitions you add two parameters: ha-mode: all and ha-sync-mode: automatic.
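
The same policy can also be set from the command line instead of the UI, for example via kubectl exec (the policy name ha-all is our own choice; the pattern ".*" matches all queues):

kubectl -n our-namespace exec rabbitmq-0 -- \
  rabbitmqctl set_policy ha-all ".*" '{"ha-mode":"all","ha-sync-mode":"automatic"}'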


After that, all queues created in the cluster will be in High Availability mode: when the master node becomes unavailable, one of the slaves is automatically chosen as the new master, and data arriving in a queue is mirrored to all nodes of the cluster. Which is exactly what we wanted.


You can read more about HA in RabbitMQ in the official documentation.


Good luck!
