Installing DataCore Puls8 on GKE

Overview

This document outlines the steps required to install and configure Replicated PV Mayastor on Google Kubernetes Engine (GKE). It covers the key prerequisites, the commands for setting up local SSDs and creating disk pools, and considerations for recovering from node failures. Pairing DataCore Puls8 with GKE local SSDs provides high-performance, resilient storage in Kubernetes environments.

GKE with Local SSDs

Local SSDs in GKE are physically attached to the virtual machine (VM) instances and provide high Input/Output Operations Per Second (IOPS) and low latency. However, they are ephemeral. Data is lost if the node is deleted, restarted, or rescheduled.

To overcome the limitations of ephemeral storage, DataCore Puls8 with Mayastor provides:

  • Data Replication: Replicates data across nodes to prevent data loss during node failure.
  • High Performance: Utilizes the native performance of local SSDs for latency-sensitive workloads.
  • Cloud-Native Resilience: Ensures storage resilience and availability in a Kubernetes-native way.

Additional local SSDs must be attached during cluster creation. You cannot add SSDs to an existing node pool.

Requirements

Node Image

Replicated PV Mayastor requires GKE nodes to be provisioned with the ubuntu_containerd image. Ensure this image type is selected during cluster creation.

Hardware and Node Configuration

  • A minimum of three worker nodes is required.
  • The number of worker nodes must be greater than or equal to the replication factor used for synchronous replication.

Adding Local SSDs (Block Device Mode)

Use the following command to create a GKE cluster with local SSDs in block mode:

Cluster Creation with Block Device Local SSDs
gcloud container clusters create <CLUSTER_NAME> \
  --machine-type <MACHINE_TYPE> \
  --num-nodes <NUMBER_OF_NODES> \
  --zone <ZONE> \
  --local-nvme-ssd-block count=1 \
  --image-type ubuntu_containerd
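
Once the cluster is created, configure kubectl access to it. This step is not shown above but is typically needed before the remaining commands; it uses the same placeholders as the cluster creation command.

Fetch Cluster Credentials
gcloud container clusters get-credentials <CLUSTER_NAME> --zone <ZONE>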

Enable HugePages

HugePages of size 2MiB must be enabled on each storage node. Ensure at least 1024 HugePages (2GiB) are reserved for Mayastor IO Engine pods.

SSH into each GKE node and configure HugePages as described below; refer to Google's GKE documentation for how to connect to nodes over SSH.
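
A minimal sketch of the HugePages configuration on a node is shown below. The values follow the 1024 x 2MiB requirement above; whether a kubelet restart is needed for the node to advertise the hugepages resource depends on the node image, so that step is included as an assumption.

Configure 2MiB HugePages on a Node (Sketch)
# Reserve 1024 x 2MiB HugePages immediately
sudo sysctl -w vm.nr_hugepages=1024

# Persist the setting across reboots of this node
echo "vm.nr_hugepages = 1024" | sudo tee -a /etc/sysctl.conf

# Verify the reservation
grep HugePages_Total /proc/meminfo

# Restart kubelet so the node reports the hugepages-2Mi resource (may be required)
sudo systemctl restart kubelet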

Load Kernel Modules

SSH into each GKE node and load the required nvme_tcp kernel module:

Load nvme_tcp Kernel Module
modprobe nvme_tcp
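
To keep the module loaded across reboots of the same node, a modules-load entry can be added. This is an optional sketch; a recreated node loses it anyway (see the node failure handling section below).

Persist the nvme_tcp Module (Optional Sketch)
# Load the module automatically on every boot of this node
echo nvme_tcp | sudo tee /etc/modules-load.d/nvme_tcp.conf

# Confirm the module is currently loaded
lsmod | grep nvme_tcp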

Prepare the Cluster

Refer to the DataCore Puls8 Prerequisites documentation for steps to prepare the cluster environment.

Configure ETCD and Loki Storage Classes

Use the GKE standard-rwo StorageClass for persistent volumes used by ETCD and Loki.
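
The standard-rwo StorageClass is available by default on recent GKE clusters; its presence can be confirmed with:

Verify the standard-rwo StorageClass
kubectl get storageclass standard-rwo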

Install DataCore Puls8 on GKE

Install DataCore Puls8 using Helm:

Helm Installation Command
helm install openebs --namespace openebs openebs/openebs \
  --create-namespace \
  --set openebs-crds.csi.volumeSnapshots.enabled=false \
  --set mayastor.etcd.localpvScConfig.enabled=false \
  --set mayastor.etcd.persistence.enabled=true \
  --set mayastor.etcd.persistence.storageClass=standard-rwo \
  --set mayastor.loki-stack.localpvScConfig.enabled=false \
  --set mayastor.loki-stack.loki.persistence.enabled=true \
  --set mayastor.loki-stack.loki.persistence.storageClassName=standard-rwo

GKE pre-installs the volume snapshot CRDs. The chart's own snapshot CRDs are therefore disabled in the Helm command above to avoid resource conflicts during installation.
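
To confirm that the snapshot CRDs are indeed already present on the cluster:

Check for Pre-Installed Volume Snapshot CRDs
kubectl get crd | grep volumesnapshot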

Verify Installation

Use the following command to verify that all pods are running:

Check the Status of DataCore Puls8 Components
kubectl get pods -n openebs
Sample Output
NAME                                              READY   STATUS    RESTARTS   AGE
openebs-agent-core-674f784df5-7szbm               2/2     Running   0          11m
openebs-agent-ha-node-nnkmv                       1/1     Running   0          11m
openebs-agent-ha-node-pvcrr                       1/1     Running   0          11m
openebs-agent-ha-node-rqkkk                       1/1     Running   0          11m
openebs-api-rest-79556897c8-b824j                 1/1     Running   0          11m
openebs-csi-controller-b5c47d49-5t5zd             6/6     Running   0          11m
openebs-csi-node-flq49                            2/2     Running   0          11m
openebs-csi-node-k8d7h                            2/2     Running   0          11m
openebs-csi-node-v7jfh                            2/2     Running   0          11m
openebs-etcd-0                                    1/1     Running   0          11m
openebs-etcd-1                                    1/1     Running   0          11m
openebs-etcd-2                                    1/1     Running   0          11m
openebs-io-engine-7t6tf                           2/2     Running   0          11m
openebs-io-engine-9df6r                           2/2     Running   0          11m
openebs-io-engine-rqph4                           2/2     Running   0          11m
openebs-localpv-provisioner-6ddf7c7978-4fkvs      1/1     Running   0          11m
openebs-loki-0                                    1/1     Running   0          11m
openebs-lvm-localpv-controller-7b6d6b4665-fk78q   5/5     Running   0          11m
openebs-lvm-localpv-node-mcch4                    2/2     Running   0          11m
openebs-lvm-localpv-node-pdt88                    2/2     Running   0          11m
openebs-lvm-localpv-node-r9jn2                    2/2     Running   0          11m
openebs-nats-0                                    3/3     Running   0          11m
openebs-nats-1                                    3/3     Running   0          11m
openebs-nats-2                                    3/3     Running   0          11m
openebs-obs-callhome-854bc967-5f879               2/2     Running   0          11m
openebs-operator-diskpool-5586b65c-cwpr8          1/1     Running   0          11m
openebs-promtail-2vrzk                            1/1     Running   0          11m
openebs-promtail-mwxk8                            1/1     Running   0          11m
openebs-promtail-w7b8k                            1/1     Running   0          11m
openebs-zfs-localpv-controller-f78f7467c-blr7q    5/5     Running   0          11m
openebs-zfs-localpv-node-h46m5                    2/2     Running   0          11m
openebs-zfs-localpv-node-svfgq                    2/2     Running   0          11m
openebs-zfs-localpv-node-wm9ks                    2/2     Running   0          11m
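
Optionally, wait until all components report Ready before proceeding; this uses standard kubectl and is not part of the original steps.

Wait for All Pods to Become Ready
kubectl wait --namespace openebs --for=condition=Ready pod --all --timeout=300s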

Create and Configure Disk Pools

List Available Block Devices

Use the kubectl puls8 mayastor plugin to list available block devices:

List Block Devices on a Specific Node
kubectl puls8 mayastor get block-devices gke-gke-ssd-default-pool-<NODE_ID>
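
The node name to substitute into the command above can be taken from the standard node listing:

List Cluster Nodes
kubectl get nodes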

Identify SSD Block Size

Run the following commands on the node to determine the block size:

Show Disk Details
fdisk -l /dev/nvme1n1
Sample Output
Disk /dev/nvme1n1: 375 GiB, 402653184000 bytes, 98304000 sectors
Disk model: nvme_card0
Units: sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

Display Block Sizes
lsblk -o NAME,PHY-SEC
Sample Output
NAME         PHY-SEC
nvme0n1          512
├─nvme0n1p1      512
├─nvme0n1p14     512
└─nvme0n1p15     512
nvme1n1         4096
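
In this example, nvme1n1 reports a 4096-byte physical sector size, which is the value passed as blk_size in the pool definition below. The physical block size can also be read directly from the device identified above:

Confirm the Physical Block Size
sudo blockdev --getpbsz /dev/nvme1n1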

Create a Pool YAML

Below is a sample pool.yaml to define a DiskPool:

pool.yaml - DiskPool Definition
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
  name: pool-1
  namespace: mayastor
spec:
  node: <NODE_NAME>
  disks:
    - "aio:////dev/disk/by-id/google-local-nvme-ssd-0?blk_size=4096"
Command
kubectl apply -f pool.yaml
Sample Output
diskpool.openebs.io/pool-1 created
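
The state of the new pool can be checked with the DiskPool listing (the same command is used again in the node failure section below). Repeat the DiskPool definition for each worker node; the examples later in this document assume pools pool-1 through pool-3, one per node.

Verify the DiskPool
kubectl get dsp -n mayastor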

Storage Configuration and Application Deployment
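
Volumes are provisioned through a Mayastor StorageClass that references the pools created above. The manifest below is an illustrative sketch only: the StorageClass name is a placeholder and the parameters (three synchronous replicas over NVMe-oF) follow upstream Mayastor conventions; refer to the DataCore Puls8 documentation for the full list of supported parameters.

Example 3-Replica StorageClass (Sketch)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-3-replicas
parameters:
  protocol: nvmf
  repl: "3"
provisioner: io.openebs.csi-mayastor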

Node Failure Handling

GKE nodes are part of a managed instance group. When a node is terminated, GKE automatically creates a replacement node with a new, empty local SSD; any data on the old SSD is lost.

In case of node failure:

  • The associated pool becomes Unknown.
  • The Mayastor volume enters a Degraded state due to replica loss.
View Pool States after a Node Failure
kubectl get dsp -n mayastor
Sample Output
NAME     NODE                                           STATE     POOL_STATUS   CAPACITY       USED         AVAILABLE
pool-1   gke-gke-local-ssd-default-pool-dd2b0b02-08cs   Created   Online        402258919424   5368709120   396890210304
pool-2   gke-gke-local-ssd-default-pool-dd2b0b02-n6wq   Created   Online        402258919424   5368709120   396890210304
pool-3   gke-gke-local-ssd-default-pool-dd2b0b02-8twd   Created   Unknown       0              0            0
View Volumes and their Status
kubectl puls8 mayastor get volumes
Sample Output
ID                                    REPLICAS  TARGET-NODE                                   ACCESSIBILITY  STATUS    SIZE  THIN-PROVISIONED  ALLOCATED  SNAPSHOTS  SOURCE 
fa486a03-d806-4b5c-a534-5666900853a2  3         gke-gke-local-ssd-default-pool-dd2b0b02-08cs  nvmf           Degraded  5GiB  false             5GiB       0          <none>

Recovery Steps

Reapply node-level configurations:

  1. Reconfigure HugePages on the replacement node.
  2. Load the nvme_tcp kernel module.
  3. Create a new DiskPool with a new name, as shown below.
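
A hypothetical pool-4.yaml mirrors the earlier pool.yaml, targeting the replacement node; the node name below is a placeholder.

pool-4.yaml - DiskPool for the Replacement Node (Sketch)
apiVersion: "openebs.io/v1beta2"
kind: DiskPool
metadata:
  name: pool-4
  namespace: mayastor
spec:
  node: <NEW_NODE_NAME>
  disks:
    - "aio:////dev/disk/by-id/google-local-nvme-ssd-0?blk_size=4096"
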
Apply Updated Pool Configuration after Node Recovery
kubectl apply -f pool-4.yaml
Sample Output
diskpool.openebs.io/pool-4 created

Once the new pool is created, the replica is rebuilt and the degraded volume returns to the Online state.
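
The rebuild progress and final volume state can be followed with the same volume listing used earlier; the volume returns to Online once all replicas are healthy.

Verify the Volume State after the Rebuild
kubectl puls8 mayastor get volumes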

Benefits of Using DataCore Puls8 with GKE

  • High Availability: Data is synchronously replicated across multiple nodes, ensuring availability even if a node or disk fails.
  • Performance Optimization: DataCore Puls8 leverages GKE’s local SSDs, which offer high IOPS and low latency, making them ideal for performance-intensive applications such as databases and analytics.
  • Fast Recovery: In the event of node failure, replica rebuilding ensures rapid restoration of full redundancy with minimal administrative intervention.
  • Kubernetes-Native Operations: Seamless integration with Kubernetes through CSI drivers and custom resources allows declarative, consistent, and scalable storage operations.
