Etcd Migration Procedure
Overview
This document provides a structured procedure for migrating an etcd pod from one Kubernetes node to another, typically performed during node maintenance activities such as draining. The steps ensure that etcd data remains intact, consistent, and available throughout the migration process.
Before initiating the migration, take a snapshot of the etcd data to safeguard against potential data loss. Refer to the Disaster Recovery documentation for detailed instructions on creating etcd snapshots.
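For example, a snapshot can be taken from one of the running etcd pods and copied to the local machine. This is a minimal sketch; the snapshot path is illustrative, and clusters with TLS enabled will need the appropriate etcdctl certificate flags:
kubectl exec -n puls8 puls8-etcd-0 -- etcdctl snapshot save /tmp/etcd-backup.db
kubectl cp puls8/puls8-etcd-0:/tmp/etcd-backup.db ./etcd-backup.db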
Drain the etcd Node
In a typical three-node Kubernetes cluster hosting three etcd replicas, first verify the status of the etcd pods by executing the following command:
kubectl get pods -n puls8 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
puls8-etcd-0 1/1 Running 0 4m9s 10.244.1.212 worker-1 <none> <none>
puls8-etcd-1 1/1 Running 0 5m16s 10.244.2.219 worker-2 <none> <none>
puls8-etcd-2 1/1 Running 0 6m28s 10.244.3.203 worker-0 <none> <none>
To verify the existing etcd key-value data, open a shell in any running etcd pod and list all keys:
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
export ETCDCTL_API=3
etcdctl get --prefix ""
In this scenario, the worker-0 node will be drained, and the etcd pod will be migrated to an available node (worker-4).
Initiate the drain operation using the following command:
kubectl drain worker-0 --ignore-daemonsets
node/worker-0 cordoned
Warning: ignoring DaemonSet-managed Pods:
kube-system/kube-flannel-ds-pbm7r,
kube-system/kube-proxy-jgjs4,
mayastor/mayastor-agent-ha-node-jkd4c,
mayastor/mayastor-csi-node-mb89n,
mayastor/mayastor-io-engine-q2n28,
mayastor/mayastor-promethues-prometheus-node-exporter-v6mfs,
mayastor/mayastor-promtail-6vgvm,
monitoring/node-exporter-fz247
Evicting pod mayastor/puls8-etcd-2
Evicting pod mayastor/mayastor-agent-core-7c594ff676-2ph69
Evicting pod mayastor/mayastor-operator-diskpool-c8ddb588-cgr29
pod/puls8-etcd-2 evicted
pod/mayastor-agent-core-7c594ff676-2ph69 evicted
pod/mayastor-operator-diskpool-c8ddb588-cgr29 evicted
node/worker-0 drained
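Before proceeding, optionally confirm that the node is cordoned:
kubectl get nodes worker-0
A drained node reports a status of Ready,SchedulingDisabled, which prevents new pods from being scheduled onto it.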
Migrate etcd to a New Node
After draining the original node, Kubernetes automatically reschedules the etcd pod onto an available node (worker-4 in this case).
Initially, the pod may enter a CrashLoopBackOff state due to a bootstrap conflict. This occurs because the etcd member is already registered in the cluster, and the restarted pod attempts to re-bootstrap as a new member upon startup.
Confirm the pod status with the following command:
kubectl get pods -n puls8 -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
puls8-etcd-0 1/1 Running 0 35m 10.244.1.212 worker-1 <none> <none>
puls8-etcd-1 1/1 Running 0 36m 10.244.2.219 worker-2 <none> <none>
puls8-etcd-2 0/1 CrashLoopBackOff 5 (44s ago) 10m 10.244.0.121 worker-4 <none> <none>
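To confirm that the crash is caused by the bootstrap conflict, inspect the logs of the failed container:
kubectl logs puls8-etcd-2 -n puls8 --previous
The exact message depends on the etcd version, but it typically indicates that the member has already been bootstrapped.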
To resolve the issue, update the etcd StatefulSet by changing the cluster initialization state from new to existing. This instructs the etcd instance to join the existing cluster instead of initializing a new one.
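A minimal sketch of this change, assuming the StatefulSet exposes the state through the ETCD_INITIAL_CLUSTER_STATE environment variable (as Bitnami-based etcd charts do; verify the variable name in your StatefulSet before applying):
kubectl set env statefulset/puls8-etcd -n puls8 ETCD_INITIAL_CLUSTER_STATE=existing
Alternatively, edit the StatefulSet directly with kubectl edit statefulset puls8-etcd -n puls8. Once the change is applied, the crashing pod is recreated with the new setting and should join the cluster as an existing member.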
Validate etcd Key-Value Integrity
After the pod is successfully running on the new node, validate that all key-value pairs are intact by executing the following commands:
kubectl exec -it puls8-etcd-0 -n puls8 -- bash
export ETCDCTL_API=3
etcdctl get --prefix ""
This validation step is critical to ensure no data loss has occurred during the migration process. Compare the output with the snapshot taken before migration.
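For a quick, coarse consistency check, the number of keys can be counted and compared before and after the migration. This is a hedged example and complements, rather than replaces, a comparison against the pre-migration snapshot:
kubectl exec -n puls8 puls8-etcd-0 -- sh -c 'etcdctl get --prefix "" --keys-only | grep -cv "^$"'
The count should match the value recorded before draining the node.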