High Availability

Overview

High Availability (HA) is a critical requirement for modern stateful applications that demand uninterrupted access to persistent storage. Replicated PV Mayastor introduces a robust HA architecture designed to ensure continuous data access and minimal downtime in the event of storage node or volume target failures.

This document outlines the HA capabilities, focusing on the nexus switch-over mechanism that enables automatic failover of volume targets. It explains the architecture, operational flow, configuration requirements, and considerations necessary for achieving reliable and efficient failover handling.

Nexus Switch-Over

DataCore Puls8 improves volume target HA through nexus switch-over. In the event of a target failure, it rapidly detects the failure and provisions a new nexus instance to maintain I/O operations with minimal interruption.

The HA mechanism comprises two key components:

HA Node Agent: Runs on every CSI node.
Cluster Agent: Runs alongside the core agent (agent-core).

The HA Node Agent continuously monitors the I/O path between applications and their associated targets. If it detects a failure in this path, it notifies the Cluster Agent, which then initiates the creation of a new target on a healthy node. After the new target is successfully created, the Node Agent establishes a new connection between the application and this target.

This process typically completes within seconds, ensuring seamless recovery and negligible downtime.

The volume must have more than one replica to enable the switch-over functionality.
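The detect, report, re-create, and reconnect sequence described above can be sketched as a small in-memory model. This is a conceptual illustration only. The classes Volume, ClusterAgent, and NodeAgent and their methods are hypothetical and do not correspond to Puls8 source code or APIs. Node health is simulated with a dictionary rather than real I/O path monitoring. The more-than-one-replica requirement is encoded as a precondition.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Volume:
    name: str
    replicas: int
    target_node: str  # node currently hosting the volume target (nexus)


@dataclass
class ClusterAgent:
    # Simulated health of the io-engine on each node (True = healthy).
    node_health: Dict[str, bool]

    def replace_target(self, volume: Volume) -> str:
        """Create a replacement target on a healthy node other than the failed one."""
        if volume.replicas < 2:
            raise RuntimeError(f"{volume.name}: switch-over needs more than one replica")
        for node, healthy in self.node_health.items():
            if healthy and node != volume.target_node:
                volume.target_node = node  # stands in for provisioning a new nexus
                return node
        raise RuntimeError(f"{volume.name}: no healthy node for a new target")


@dataclass
class NodeAgent:
    # Simulated per-node agent that watches one application's I/O path.
    cluster: ClusterAgent
    connected_to: Optional[str] = None

    def report_path_failure(self, volume: Volume) -> str:
        """Notify the cluster agent, then reconnect the application to the new target."""
        new_node = self.cluster.replace_target(volume)
        self.connected_to = new_node  # stands in for connecting to the new target
        return new_node
```

For example, with node_health={"node-a": False, "node-b": True} and a two-replica volume whose target is on node-a, report_path_failure returns "node-b" and updates both the volume's target and the agent's connection. A single-replica volume raises an error instead, which mirrors the requirement stated above.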

Configuration Requirements for Optimal HA

To ensure optimal functioning of HA in Replicated PV Mayastor environments, consider the following configuration guidelines:

Applications scheduled on nodes labeled openebs.io/engine=mayastor preferably have their nexus co-located on the same node. However, if the io-engine pod on that node becomes unhealthy, the nexus may be created on another healthy node.
The following kernel parameter is required to enable HA:

Set the kernel parameter to enable NVMe multipath functionality for HA support

nvme_core.multipath=Y

Without this configuration, volume target failover is not supported.
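To confirm the parameter on a node before relying on failover, the node can be inspected directly. The Python sketch below is a hypothetical helper, not a Puls8 tool. It parses a kernel command line (such as the contents of /proc/cmdline) for nvme_core.multipath and reads the live value from /sys/module/nvme_core/parameters/multipath, where the kernel reports a boolean module parameter as Y or N. It assumes a Linux node; if the sysfs entry is missing, the running kernel does not currently expose the parameter (for example, nvme_core is not loaded or the kernel does not provide it). How the parameter is added to the boot configuration depends on the distribution and is not shown.

```python
from pathlib import Path
from typing import Optional

# sysfs location of the nvme_core "multipath" module parameter on Linux.
SYSFS_PARAM = Path("/sys/module/nvme_core/parameters/multipath")


def cmdline_multipath(cmdline: str) -> Optional[str]:
    """Return the nvme_core.multipath value set on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == "nvme_core.multipath":
            value = val  # if the parameter is repeated, keep the last value seen
    return value


def multipath_enabled(param_file: Path = SYSFS_PARAM) -> Optional[bool]:
    """Report the live setting: True for "Y", False otherwise, None if not exposed."""
    try:
        return param_file.read_text().strip() == "Y"
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        print("kernel command line:", cmdline_multipath(proc_cmdline.read_text()))
    print("running kernel:", multipath_enabled())
```

Checking both sources is deliberate: the command line shows what was requested at boot, while sysfs shows what the running kernel actually applied.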

HA is not supported on RHEL 10 and its derivative distributions, because these platforms have deprecated the nvme_core.multipath kernel parameter that HA requires. If you plan to deploy HA, use an operating system that supports nvme_core.multipath.
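The nexus co-location guideline for nodes labeled openebs.io/engine=mayastor can be expressed as a small placement rule. The sketch below is a hypothetical model: choose_nexus_node and its inputs are illustrative names, not a Puls8 API, and the actual control plane may weigh additional factors when placing a nexus. It only captures the behavior stated in this section: prefer the application's node, and fall back to another healthy io-engine node when the local one is unhealthy.

```python
from typing import Dict, Optional

ENGINE_LABEL_KEY = "openebs.io/engine"
ENGINE_LABEL_VALUE = "mayastor"


def choose_nexus_node(
    app_node: str,
    node_labels: Dict[str, Dict[str, str]],
    io_engine_healthy: Dict[str, bool],
) -> Optional[str]:
    """Prefer the application's node; otherwise pick another healthy io-engine node."""
    labeled = node_labels.get(app_node, {}).get(ENGINE_LABEL_KEY) == ENGINE_LABEL_VALUE
    if labeled and io_engine_healthy.get(app_node, False):
        return app_node  # co-locate the nexus with the application
    for node, healthy in io_engine_healthy.items():
        if healthy and node != app_node:
            return node  # fall back to an alternative healthy node
    return None  # no healthy io-engine available in this model
```

Keeping the nexus local when possible avoids an extra network hop between the application and its volume target; the fallback trades that locality for availability.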

Disabling HA (Not Recommended)

It is strongly recommended to keep the HA feature enabled for production environments.

By default, HA is enabled in DataCore Puls8. If you need to disable it, set the following parameter during Helm installation:

Helm parameter to disable the High Availability feature during installation

--set=openebs.mayastor.agents.ha.enabled=false

Benefits of HA

Minimizes downtime by enabling fast failover of volume targets.
Ensures uninterrupted I/O for stateful workloads during infrastructure failures.
Automatically recovers from storage path failures within seconds.
Improves application resilience without requiring manual intervention.
