Etcd Migration Procedure

Overview

This document provides a structured procedure for migrating an etcd pod from one Kubernetes node to another, typically performed during node maintenance activities such as draining. The steps ensure that etcd data remains intact, consistent, and available throughout the migration process.

Before initiating the migration, take a snapshot of the etcd data to safeguard against potential data loss. Refer to the Disaster Recovery documentation for detailed instructions on creating etcd snapshots.
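
As a minimal illustration only (the Disaster Recovery documentation remains the authoritative procedure), a snapshot can be taken with etcdctl from inside one of the etcd pods. The pod name and output path below are examples, and additional flags (for example, client certificates) may be required depending on your configuration:

Create etcd Snapshot (Example)
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
ETCDCTL_API=3 etcdctl snapshot save /tmp/etcd-backup.db

Copy the snapshot file off the pod (for example, with kubectl cp) so that it is not lost if the pod restarts.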

Drain the etcd Node

In a typical three-node Kubernetes cluster hosting three etcd replicas, first verify the status of the etcd pods by executing the following command:

Verify etcd Pods
kubectl get pods -n puls8 -l app=etcd -o wide
Sample Output
NAME            READY   STATUS    RESTARTS   AGE     IP             NODE       NOMINATED NODE   READINESS GATES
puls8-etcd-0    1/1     Running   0          4m9s    10.244.1.212   worker-1   <none>           <none>
puls8-etcd-1    1/1     Running   0          5m16s   10.244.2.219   worker-2   <none>           <none>
puls8-etcd-2    1/1     Running   0          6m28s   10.244.3.203   worker-0   <none>           <none>

To verify the existing etcd key-value data, execute the following commands from any running etcd pod:

Retrieve etcd Key-Value Pairs
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
ETCDCTL_API=3 etcdctl get --prefix ""

In this scenario, the worker-0 node will be drained, and its etcd pod (puls8-etcd-2) will be migrated to an available node (worker-4).
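
Before draining, you may also want to record the current etcd member list so that it can be compared after the migration. A minimal sketch, run from inside the same pod as above:

Check etcd Members (Optional)
ETCDCTL_API=3 etcdctl member list -w table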

Initiate the drain operation using the following command:

Drain Node
kubectl drain worker-0 --ignore-daemonsets --delete-emptydir-data
Sample Output
node/worker-0 cordoned
Warning: ignoring DaemonSet-managed Pods:
  kube-system/kube-flannel-ds-pbm7r,
  kube-system/kube-proxy-jgjs4,
  mayastor/mayastor-agent-ha-node-jkd4c,
  mayastor/mayastor-csi-node-mb89n,
  mayastor/mayastor-io-engine-q2n28,
  mayastor/mayastor-promethues-prometheus-node-exporter-v6mfs,
  mayastor/mayastor-promtail-6vgvm,
  monitoring/node-exporter-fz247

Evicting pod puls8/puls8-etcd-2
Evicting pod mayastor/mayastor-agent-core-7c594ff676-2ph69
Evicting pod mayastor/mayastor-operator-diskpool-c8ddb588-cgr29

pod/puls8-etcd-2 evicted
pod/mayastor-agent-core-7c594ff676-2ph69 evicted
pod/mayastor-operator-diskpool-c8ddb588-cgr29 evicted

node/worker-0 drained
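
If required, confirm that scheduling has been disabled on the drained node; it should report a Ready,SchedulingDisabled status:

Confirm Node Status (Optional)
kubectl get node worker-0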

Migrate etcd to a New Node

After draining the original node, Kubernetes will automatically reschedule the etcd pod onto an available node (worker-4 in this case).

Initially, the pod may enter a CrashLoopBackOff state due to a bootstrap conflict. This occurs because the etcd member is already registered in the cluster, and it attempts to re-bootstrap upon startup.

Confirm the pod status with the following command:

Check etcd Pod Status
kubectl get pods -n puls8 -l app=etcd -o wide
Sample Output
NAME           READY   STATUS             RESTARTS      AGE   IP             NODE       NOMINATED NODE   READINESS GATES
puls8-etcd-0   1/1     Running            0             35m   10.244.1.212   worker-1   <none>           <none>
puls8-etcd-1   1/1     Running            0             36m   10.244.2.219   worker-2   <none>           <none>
puls8-etcd-2   0/1     CrashLoopBackOff   5 (44s ago)   10m   10.244.0.121   worker-4   <none>           <none>
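
To confirm that the crash is caused by the bootstrap conflict, inspect the logs of the previous container instance. The pod name below is taken from the sample output above, and the exact error message varies by etcd version:

Inspect etcd Pod Logs (Optional)
kubectl logs puls8-etcd-2 -n puls8 --previous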

To resolve the issue, update the etcd StatefulSet by changing the ETCD_INITIAL_CLUSTER_STATE environment variable from new to existing.

Edit etcd StatefulSet
kubectl edit sts puls8-etcd -n puls8
Updated Environment Variable
- name: ETCD_INITIAL_CLUSTER_STATE
  value: existing

This change instructs the etcd instance to join the existing cluster instead of initializing a new one.
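
Alternatively, the same change can be applied non-interactively with kubectl set env. This is a sketch that assumes the StatefulSet name shown above; with the default RollingUpdate strategy, the change triggers a rolling restart of the etcd pods:

Update Cluster State Non-Interactively (Alternative)
kubectl set env sts/puls8-etcd -n puls8 ETCD_INITIAL_CLUSTER_STATE=existing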

Validate etcd Key-Value Integrity

After the pod is successfully running on the new node, validate that all key-value pairs are intact by executing the following commands:

Validate etcd Key-Value Pairs
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
ETCDCTL_API=3 etcdctl get --prefix ""

This validation step is critical to ensure no data loss has occurred during the migration process. Compare the output with the snapshot taken before migration.
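
You may also want to confirm that all three members report as healthy after the migration. A minimal sketch, run from inside any etcd pod and assuming a recent etcdctl that supports the --cluster flag:

Verify Cluster Health (Optional)
ETCDCTL_API=3 etcdctl endpoint status --cluster -w table
ETCDCTL_API=3 etcdctl endpoint health --cluster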
