Backing Up and Restoring Kubernetes Resources with Heptio Ark

Original author: Björn Wenzel

You have probably had to rebuild a Kubernetes cluster after a failure. Did you have a sound backup strategy, one that does not cost you several days of manual work? Yes, you can back up etcd, but what if only part of the cluster fails, or you use persistent volumes such as AWS EBS?


In such cases, the easiest option is the Heptio Ark utility.


With Heptio Ark, you can back up the entire cluster, individual namespaces, or specific resource types, and you can run backups on a schedule. For me, the main advantage of Heptio Ark is its integration with cloud providers such as AWS, Azure, and Google Cloud: when it creates a backup, it also takes snapshots of the persistent volumes in use.


Let's see how to install the utility, how to create one-off and scheduled backups, and how to restore from them.


Backing up persistent volumes will be covered in a separate post.


Installation


You will find installation instructions here: examples/README.md. The installation creates several custom resource definitions, RBAC (role-based access control) rules that allow Heptio Ark to perform backups, and a deployment. By default, they live in the heptio-ark namespace.


Important! After a successful installation, you need to configure Heptio Ark so that the server knows which cloud provider to use and where to store the backups. The configuration looks like this:


apiVersion: ark.heptio.com/v1
kind: Config
metadata:
  namespace: heptio-ark
  name: default
backupStorageProvider:
  name: aws
  bucket: heptio-backup-bucket
  config:
    region: eu-central-1
backupSyncPeriod: 30m
gcSyncPeriod: 30m
scheduleSyncPeriod: 1m
restoreOnlyMode: false

You can apply it with the following command:


$ kubectl apply -f heptio.yaml

Heptio Ark now knows which bucket to back up to. The backup storage location must be accessible from the Ark server pods, so you can use an instance profile with access to this bucket, or Kube2IAM for dynamic, pod-level instance profiles.


Finally, to work with backups, schedules, and restores, download the Heptio Ark CLI from GitHub.


Almost all commands can also be expressed as custom resources written in YAML or JSON.
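As an illustration of the declarative route, here is a minimal sketch of a Backup custom resource that should be roughly equivalent to the namespace-scoped backup created in the next section. The file name nginx-simple-backup.yaml is hypothetical, and the field names (includedNamespaces, ttl) reflect my understanding of the ark.heptio.com/v1 API, so check them against the Ark version you run.

```shell
# Write a Backup manifest to a local file (file name is a hypothetical choice).
# Intended to mirror: ark backup create nginx-simple --include-namespaces webserver
cat > nginx-simple-backup.yaml <<'EOF'
apiVersion: ark.heptio.com/v1
kind: Backup
metadata:
  name: nginx-simple
  namespace: heptio-ark      # the namespace where the Ark server runs
spec:
  includedNamespaces:
    - webserver              # back up only this namespace
  ttl: 720h0m0s              # keep the backup for 30 days
EOF

# Against a cluster with Ark installed, you would then apply it with:
#   kubectl apply -f nginx-simple-backup.yaml
echo "wrote nginx-simple-backup.yaml"
```

Keeping such manifests in version control is one reason to prefer this form over ad hoc CLI calls.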


Backup


For this small example, I created a simple NGINX deployment with a service in front of it in the webserver namespace:


$ kubectl get all
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/nginx   1         1         1            1           28s
NAME                  DESIRED   CURRENT   READY     AGE
rs/nginx-66f5756f9b   1         1         1         28s
NAME                        READY     STATUS    RESTARTS   AGE
po/nginx-66f5756f9b-c88ck   1/1       Running   0          28s
NAME        TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
svc/nginx   ClusterIP   10.32.0.183   <none>        80/TCP    28s

Let's make a backup from the Heptio Ark CLI:


$ ark backup create nginx-simple --include-namespaces webserver

This command backs up only the webserver namespace. Without the --include-namespaces parameter, Heptio Ark creates a full backup of all resources in the Kubernetes cluster. The backup takes some time, and the result is saved to the configured S3 bucket (heptio-backup-bucket). To view the status and the list of all backups, run the following command:


$ ark backup get
NAME                            STATUS      CREATED                          EXPIRES   SELECTOR
nginx-simple                    Completed   2018-07-08 17:35:09 +0200 CEST   29d       

As you can see, the backup has completed.
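If you want to script this check rather than read the table by eye, the following sketch parses the output shown above. The helpers backup_status and wait_for_backup are hypothetical names of my own, and they assume the column layout printed by ark backup get in this post (NAME first, STATUS second).

```shell
# backup_status NAME: print the STATUS column for one backup, taken from
# the table that `ark backup get` prints (column 1 = NAME, column 2 = STATUS).
backup_status() {
  ark backup get | awk -v name="$1" '$1 == name { print $2 }'
}

# wait_for_backup NAME [TRIES]: poll every 10 seconds until the backup
# reports Completed; give up with a non-zero status after TRIES attempts.
wait_for_backup() {
  name="$1"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    [ "$(backup_status "$name")" = "Completed" ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

# Usage (requires the ark CLI and a configured cluster):
#   wait_for_backup nginx-simple && echo "backup finished"
```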


Backup Recovery


Let's delete the webserver namespace:


$ kubectl delete ns webserver

Now let's restore the namespace after this “accidental” deletion, again using the Heptio Ark CLI:


$ ark restore create --from-backup nginx-simple
Restore request "nginx-simple-20180708173924" submitted successfully.
Run `ark restore describe nginx-simple-20180708173924` for more details.

You should see that the namespace and all its resources (deployment, replica set, pod, and service) have been restored:


$ kubectl get all
NAME           DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/nginx   1         1         1            1           20s
NAME                  DESIRED   CURRENT   READY     AGE
rs/nginx-66f5756f9b   1         1         1         20s
NAME                        READY     STATUS    RESTARTS   AGE
po/nginx-66f5756f9b-9mjvg   1/1       Running   0          20s
NAME        TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
svc/nginx   ClusterIP   10.32.0.77   <none>        80/TCP    20s

Backup structure


To view the backup structure, download the archive from the S3 bucket or use the Heptio Ark command:


$ ark backup download nginx-simple
Backup nginx-simple has been successfully downloaded to $PWD/nginx-simple-data.tar.gz


In the webserver.json file for our namespace, we see an ordinary Namespace resource:


{
  "apiVersion":"v1",
  "kind":"Namespace",
  "metadata": {
    "annotations": {
      "kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"Namespace\",\"metadata\":{\"annotations\":{},\"name\":\"webserver\",\"namespace\":\"\"}}\n"
    },
    "creationTimestamp":"2018-07-08T15:26:44Z",
    "name":"webserver",
    "resourceVersion":"3364",
    "selfLink":"/api/v1/namespaces/webserver",
    "uid":"52698ae7-82c3-11e8-8529-0645eb60c3f4"
  },
  "spec": {
    "finalizers":["kubernetes"]
  },
  "status": {
    "phase":"Active"
  }
}
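To look at a file like this without unpacking the whole archive, a small helper along the following lines can be used. The function name show_backup_file is hypothetical; it relies only on standard tar options and makes no assumption about the directory layout inside the archive beyond matching on the file name.

```shell
# show_backup_file ARCHIVE NAME: print every file in the gzipped tar
# archive whose base name is NAME, preceded by its path in the archive.
show_backup_file() {
  archive="$1"
  name="$2"
  tar -tzf "$archive" | while IFS= read -r path; do
    case "$path" in
      */"$name"|"$name")
        echo "== $path"
        tar -xzOf "$archive" "$path"   # -O writes the file to stdout
        ;;
    esac
  done
}

# Usage, after `ark backup download nginx-simple`:
#   show_backup_file nginx-simple-data.tar.gz webserver.json
```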

If we do not need a full recovery, we can restore only part of the backup with the Heptio Ark CLI, or take the relevant files directly from the backup archive and restore them via kubectl.
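A partial restore through the CLI might look like the sketch below. The wrapper restore_only is a hypothetical helper; the --include-resources flag is, to my knowledge, accepted by ark restore create, but confirm it with ark restore create --help for your version.

```shell
# restore_only BACKUP RESOURCES: restore only the listed resource types
# from a backup. RESOURCES is a comma-separated list, for example
# "services" or "deployments,services".
restore_only() {
  ark restore create --from-backup "$1" --include-resources "$2"
}

# Usage (requires the ark CLI and a configured cluster):
#   restore_only nginx-simple services
```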



Scheduled backup


Heptio Ark can also run backups on a schedule. We specify which resources and namespaces to include in or exclude from the backup, and when to back up:


$ ark schedule create nginx-schedule --schedule="* 10 * * *" --include-namespaces webserver
Schedule "nginx-schedule" created successfully.

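In this case, backups run every day at 10 o'clock and include only the webserver namespace. Note that in standard cron syntax the first field is the minute, so "* 10 * * *" matches every minute from 10:00 to 10:59; for a single backup at exactly 10:00, use "0 10 * * *". In the Heptio Ark CLI, we can see that the schedule has appeared and that Heptio Ark has already created the first backup: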


$ ark schedule get
NAME             STATUS    CREATED                          SCHEDULE       BACKUP TTL   LAST BACKUP   SELECTOR
nginx-schedule   Enabled   2018-07-08 17:49:00 +0200 CEST   * 10 * * *     720h0m0s     25s ago       
$ ark backup get
NAME                            STATUS      CREATED                          EXPIRES   SELECTOR
nginx-schedule-20180708154900   Completed   2018-07-08 17:49:00 +0200 CEST   29d       
nginx-simple                    Completed   2018-07-08 17:35:09 +0200 CEST   29d       
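To make the schedule expression more concrete, here is a small sketch of how the minute and hour fields of a five-field cron expression are matched against a clock time. The helper cron_fires is hypothetical and deliberately simplified: it understands only "*" and plain numbers in the first two fields and ignores the remaining three. It is not how Ark itself parses schedules.

```shell
# cron_fires EXPR HOUR MINUTE: succeed if the minute and hour fields of the
# cron expression EXPR match the given time. Only "*" and plain numbers are
# supported; day, month, and weekday fields are ignored in this sketch.
cron_fires() {
  # A here-document is used so that "*" is not expanded as a file glob.
  read -r min hour _rest <<EOF
$1
EOF
  { [ "$min" = "*" ] || [ "$min" -eq "$3" ]; } &&
    { [ "$hour" = "*" ] || [ "$hour" -eq "$2" ]; }
}

cron_fires "* 10 * * *" 10 30 && echo "'* 10 * * *' fires at 10:30"
cron_fires "0 10 * * *" 10 30 || echo "'0 10 * * *' does not fire at 10:30"
cron_fires "0 10 * * *" 10 0 && echo "'0 10 * * *' fires at 10:00"
```

The first expression matches any minute within the 10 o'clock hour, while the second matches only 10:00.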

The output shows that scheduled backups are deleted after 720 hours, that is, after 30 days. When you create a backup or a schedule, you can specify the lifetime of the backup (its TTL). After this period, the backup is deleted from the backup storage, in our case the S3 bucket on AWS.
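As a sketch of setting a shorter lifetime, the hypothetical wrapper below converts a number of days into the hour-based duration format seen in the BACKUP TTL column and passes it via the --ttl flag. I believe ark backup create accepts --ttl, but verify this with ark backup create --help.

```shell
# backup_with_ttl_days NAME NAMESPACE DAYS: create a backup of one namespace
# that expires after DAYS days. The TTL is passed as a duration in hours,
# e.g. 7 days becomes 168h0m0s.
backup_with_ttl_days() {
  name="$1"
  namespace="$2"
  days="$3"
  ark backup create "$name" --include-namespaces "$namespace" --ttl "$((days * 24))h0m0s"
}

# Usage (requires the ark CLI and a configured cluster):
#   backup_with_ttl_days nginx-weekly webserver 7
```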

