The etcd-restore-operator can restore an etcd cluster on Kubernetes from backup.
The overall workflow is:
- Create the etcd-restore-operator
- Create an EtcdRestore Custom Resource which triggers a restore request that specifies:
  - the etcd cluster spec
  - how to access the backup
- The etcd-restore-operator will restore a new cluster from the backup
- The etcd-operator takes over the management of the restored cluster
Note that currently the etcd-restore-operator only supports restoring from backups saved on S3.
Prerequisites
- Set up RBAC and deploy an etcd operator. See the Install Guide.
- Have an etcd backup saved on S3. See the etcd-backup-operator README as one way to save a backup to S3.
Note: This demo uses the default namespace.
- Make sure the example-etcd-cluster EtcdCluster CR exists:
  kubectl get etcdcluster example-etcd-cluster
- Kill the etcd pods to simulate disaster failure:
  kubectl delete pod -l app=etcd,etcd_cluster=example-etcd-cluster --force --grace-period=0
- Create a deployment of the etcd-restore-operator:
  kubectl create -f example/etcd-restore-operator/deployment.yaml
- Verify that the etcd-restore-operator pod is running:
  $ kubectl get pods
  NAME                                     READY   STATUS    RESTARTS   AGE
  etcd-restore-operator-4203122180-npn3g   1/1     Running   0          7s
- Verify that the etcd-restore-operator creates the EtcdRestore CRD:
  $ kubectl get crd
  NAME                                    KIND
  etcdrestores.etcd.database.coreos.com   CustomResourceDefinition.v1beta1.apiextensions.k8s.io
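The operator registers the CRD shortly after its pod starts, so a script that creates the EtcdRestore CR straight away may run too early. The following is a minimal sketch of a polling check, not part of the operator itself: the helper name wait_for_crd, the retry count, and the sleep interval are all assumptions chosen for illustration.

```shell
# Sketch: poll until the EtcdRestore CRD is registered.
# wait_for_crd and its default retry count and interval are hypothetical choices.
wait_for_crd() {
  crd_name="${1:-etcdrestores.etcd.database.coreos.com}"
  attempts="${2:-30}"   # number of polls before giving up
  interval="${3:-2}"    # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # kubectl get exits non-zero while the CRD does not exist yet
    if kubectl get crd "$crd_name" >/dev/null 2>&1; then
      echo "CRD $crd_name is registered"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for CRD $crd_name" >&2
  return 1
}
```

Calling wait_for_crd with no arguments checks for etcdrestores.etcd.database.coreos.com and returns non-zero if it does not appear within the retry budget.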
Create a Kubernetes secret that contains AWS credentials and config. This is used by the etcd-restore-operator to retrieve the backup from S3.
- Verify that the local aws config and credentials files exist:
  $ cat $AWS_DIR/credentials
  [default]
  aws_access_key_id = XXX
  aws_secret_access_key = XXX

  $ cat $AWS_DIR/config
  [default]
  region = <region>
- Create the secret aws:
  kubectl create secret generic aws --from-file=$AWS_DIR/credentials --from-file=$AWS_DIR/config
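A missing file or key in the secret typically only shows up later, when the restore fails to reach S3. As a rough sketch, the files can be checked before the secret is created. The helper name check_aws_dir is hypothetical, and it only looks for the keys shown in the sample files above (aws_access_key_id, aws_secret_access_key, region).

```shell
# Sketch: sanity-check the AWS files before creating the secret.
# check_aws_dir is a hypothetical helper name, not part of any tool.
check_aws_dir() {
  dir="$1"
  # Both files must exist and be readable
  for f in credentials config; do
    if [ ! -r "$dir/$f" ]; then
      echo "missing or unreadable: $dir/$f" >&2
      return 1
    fi
  done
  # The credentials file must define both keys
  for key in aws_access_key_id aws_secret_access_key; do
    if ! grep -q "^[[:space:]]*$key[[:space:]]*=" "$dir/credentials"; then
      echo "$key not found in $dir/credentials" >&2
      return 1
    fi
  done
  # The config file must define a region
  if ! grep -q "^[[:space:]]*region[[:space:]]*=" "$dir/config"; then
    echo "region not found in $dir/config" >&2
    return 1
  fi
  return 0
}
```

One way to use it is to run check_aws_dir "$AWS_DIR" first and only run the kubectl create secret command when it returns zero.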
Create the EtcdRestore CR:
Note: This example uses the Kubernetes secret "aws" and the S3 path "mybucket/etcd.backup".
sed -e 's|<full-s3-path>|mybucket/etcd.backup|g' \
-e 's|<aws-secret>|aws|g' \
example/etcd-restore-operator/restore_cr.yaml \
| kubectl create -f -
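To inspect the rendered manifest before it reaches the cluster, the same substitution can be wrapped in a small function that only prints the result. This is a sketch under assumptions: render_restore_cr is a hypothetical helper name, and it reads the template from standard input.

```shell
# Sketch: render the restore CR template without applying it.
# render_restore_cr is a hypothetical helper; it reads the template on stdin.
render_restore_cr() {
  s3_path="$1"      # e.g. mybucket/etcd.backup
  aws_secret="$2"   # e.g. aws
  # "|" is the sed delimiter so the "/" in the S3 path needs no escaping
  sed -e "s|<full-s3-path>|$s3_path|g" \
      -e "s|<aws-secret>|$aws_secret|g"
}
```

For example, render_restore_cr mybucket/etcd.backup aws < example/etcd-restore-operator/restore_cr.yaml prints the manifest for review, and the same output can then be piped to kubectl create -f -. Values containing "|" or "&" would need escaping, since sed treats both specially in the replacement.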
- Check the status section of the EtcdRestore CR:
  $ kubectl get etcdrestore example-etcd-cluster -o yaml
  apiVersion: etcd.database.coreos.com/v1beta2
  kind: EtcdRestore
  ...
  status:
    succeeded: true
- Verify the EtcdCluster CR for the restored cluster:
  $ kubectl get etcdcluster
  NAME                   KIND
  example-etcd-cluster   EtcdCluster.v1beta2.etcd.database.coreos.com
- Verify that the etcd-operator scales the cluster to the desired size:
  $ kubectl get pods
  NAME                                     READY   STATUS    RESTARTS   AGE
  etcd-operator-2486363115-ltc17           1/1     Running   0          1h
  etcd-restore-operator-4203122180-npn3g   1/1     Running   0          30m
  example-etcd-cluster-795649v9kq          1/1     Running   1          3m
  example-etcd-cluster-jtp447ggnq          1/1     Running   1          4m
  example-etcd-cluster-psw7sf2hhr          1/1     Running   1          4m
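The three checks above can also be scripted. The sketch below polls the succeeded field in the EtcdRestore status and then counts Running pods using the label selector from the disaster-simulation step. The function names, retry defaults, and the default size of 3 (taken from the sample output) are assumptions for illustration, not operator behavior.

```shell
# Sketch: wait for the restore to succeed, then for the cluster to reach its size.
# Function names, retry counts, and the default size of 3 are illustrative assumptions.
restore_succeeded() {
  # True once the status section of the EtcdRestore CR reports succeeded: true
  [ "$(kubectl get etcdrestore "$1" -o jsonpath='{.status.succeeded}' 2>/dev/null)" = "true" ]
}

running_members() {
  # Counts pods of the cluster whose STATUS column (third field) is Running
  kubectl get pods -l "app=etcd,etcd_cluster=$1" --no-headers 2>/dev/null \
    | awk '$3 == "Running" { n++ } END { print n + 0 }'
}

wait_for_restore() {
  name="$1"             # EtcdRestore / EtcdCluster name
  want="${2:-3}"        # desired number of Running members
  attempts="${3:-60}"   # number of polls before giving up
  interval="${4:-5}"    # seconds between polls
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if restore_succeeded "$name" && [ "$(running_members "$name")" -ge "$want" ]; then
      echo "restore of $name complete"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $name" >&2
  return 1
}
```

Running wait_for_restore example-etcd-cluster 3 blocks until both conditions hold or the retry budget runs out, which makes it usable as a gate before the cleanup step.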
Delete the etcd-restore-operator deployment and service, and the EtcdRestore CR.
Note: Deleting the EtcdRestore CR won't delete the EtcdCluster CR.
kubectl delete etcdrestore example-etcd-cluster
kubectl delete -f example/etcd-restore-operator/deployment.yaml