Etcd Migration Procedure

Overview

This document provides a structured procedure for migrating an etcd pod from one Kubernetes node to another, typically performed during node maintenance activities such as draining. The steps ensure that etcd data remains intact, consistent, and available throughout the migration process.

Before initiating the migration, take a snapshot of the etcd data to safeguard against potential data loss. Refer to the Disaster Recovery documentation for detailed instructions on creating etcd snapshots.
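
As a minimal illustration only (the Disaster Recovery documentation remains the authoritative procedure), a snapshot can be taken with etcdctl from inside one of the etcd pods. The pod name and output path below are examples, and additional flags (for example, client certificates) may be required depending on your configuration:

Create etcd Snapshot (Example)
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db

Copy the snapshot file off the pod (for example, with kubectl cp) so that it is not lost if the pod restarts.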

Drain the etcd Node

In a typical three-node Kubernetes cluster hosting three etcd replicas, first verify the status of the etcd pods by executing the following command:

Verify etcd Pods
kubectl get pods -n puls8 -l app=etcd -o wide
Sample Output
NAME            READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
puls8-etcd-0    1/1     Running   0          4m9s    10.244.1.212   worker-1   <none>           <none>
puls8-etcd-1    1/1     Running   0          5m16s   10.244.2.219   worker-2   <none>           <none>
puls8-etcd-2    1/1     Running   0          6m28s   10.244.3.203   worker-0   <none>           <none>

To verify the existing etcd key-value data, execute the following commands from any running etcd pod:

Retrieve etcd Key-Value Pairs
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
ETCDCTL_API=3 etcdctl get --prefix ""

In this scenario, the worker-0 node will be drained, and its etcd pod (puls8-etcd-2) will be migrated to an available node (worker-4).
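
Before draining, you may also want to record the current etcd member list so that it can be compared after the migration. A minimal sketch, run from inside the same pod as above:

Check etcd Members (Optional)
ETCDCTL_API=3 etcdctl member list -w table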

Initiate the drain operation using the following command:

Drain Node
kubectl drain worker-0 --ignore-daemonsets --delete-emptydir-data
Sample Output
node/worker-0 cordoned
Warning: ignoring DaemonSet-managed Pods:
  kube-system/kube-flannel-ds-pbm7r,
  kube-system/kube-proxy-jgjs4,
  mayastor/mayastor-agent-ha-node-jkd4c,
  mayastor/mayastor-csi-node-mb89n,
  mayastor/mayastor-io-engine-q2n28,
  mayastor/mayastor-promethues-prometheus-node-exporter-v6mfs,
  mayastor/mayastor-promtail-6vgvm,
  monitoring/node-exporter-fz247

Evicting pod puls8/puls8-etcd-2
Evicting pod mayastor/mayastor-agent-core-7c594ff676-2ph69
Evicting pod mayastor/mayastor-operator-diskpool-c8ddb588-cgr29

pod/puls8-etcd-2 evicted
pod/mayastor-agent-core-7c594ff676-2ph69 evicted
pod/mayastor-operator-diskpool-c8ddb588-cgr29 evicted

node/worker-0 drained
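
If required, confirm that scheduling has been disabled on the drained node; it should report a Ready,SchedulingDisabled status:

Confirm Node Status (Optional)
kubectl get node worker-0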

Migrate etcd to a New Node

After draining the original node, Kubernetes will automatically reschedule the etcd pod onto an available node (worker-4 in this case).

Initially, the pod may enter a CrashLoopBackOff state due to a bootstrap conflict. This occurs because the etcd member is already registered in the cluster, and it attempts to re-bootstrap upon startup.

Confirm the pod status with the following command:

Check etcd Pod Status
kubectl get pods -n puls8 -l app=etcd -o wide
Sample Output
NAME           READY   STATUS             RESTARTS      AGE   IP             NODE       NOMINATED NODE   READINESS GATES
puls8-etcd-0   1/1     Running            0             35m   10.244.1.212   worker-1   <none>           <none>
puls8-etcd-1   1/1     Running            0             36m   10.244.2.219   worker-2   <none>           <none>
puls8-etcd-2   0/1     CrashLoopBackOff   5 (44s ago)   10m   10.244.0.121   worker-4   <none>           <none>
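
To confirm that the crash is caused by the bootstrap conflict, inspect the logs of the previous container instance. The pod name below is taken from the sample output above, and the exact error message varies by etcd version:

Inspect etcd Pod Logs (Optional)
kubectl logs puls8-etcd-2 -n puls8 --previous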

To resolve the issue, update the etcd StatefulSet by changing the ETCD_INITIAL_CLUSTER_STATE environment variable from new to existing.

Edit etcd StatefulSet
kubectl edit sts puls8-etcd -n puls8
Updated Environment Variable
- name: ETCD_INITIAL_CLUSTER_STATE
  value: existing

This change instructs the etcd instance to join the existing cluster instead of initializing a new one.
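
Alternatively, the same change can be applied non-interactively with kubectl set env. This is a sketch that assumes the StatefulSet name shown above; with the default RollingUpdate strategy, the change triggers a rolling restart of the etcd pods:

Update Cluster State Non-Interactively (Alternative)
kubectl set env sts/puls8-etcd -n puls8 ETCD_INITIAL_CLUSTER_STATE=existing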

Validate etcd Key-Value Integrity

After the pod is successfully running on the new node, validate that all key-value pairs are intact by executing the following commands:

Validate etcd Key-Value Pairs
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
ETCDCTL_API=3 etcdctl get --prefix ""

This validation step is critical to ensure no data loss has occurred during the migration process. Compare the output with the snapshot taken before migration.
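
You may also want to confirm that all three members report as healthy after the migration. A minimal sketch, run from inside any etcd pod and assuming a recent etcdctl that supports the --cluster flag:

Verify Cluster Health (Optional)
ETCDCTL_API=3 etcdctl endpoint status --cluster -w table
ETCDCTL_API=3 etcdctl endpoint health --cluster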
