High Availability
Overview
High Availability (HA) is a critical requirement for modern stateful applications that demand uninterrupted access to persistent storage. Replicated PV Mayastor introduces a robust HA architecture designed to ensure continuous data access and minimal downtime in the event of storage node or volume target failures.
This document outlines the HA capabilities, focusing on the nexus switch-over mechanism that enables automatic failover of volume targets. It explains the architecture, operational flow, configuration requirements, and considerations necessary for achieving reliable and efficient failover handling.
Nexus Switch-Over
DataCore Puls8 improves volume target HA through nexus switch-over. In the event of a target failure, it rapidly detects the failure and provisions a new nexus instance to maintain I/O operations with minimal interruption.
The HA mechanism comprises two key components:
- HA Node Agent: Runs on every CSI node.
- Cluster Agent: Runs alongside the core agent (agent-core).
The HA Node Agent continuously monitors the I/O path between applications and their associated targets. If it detects a failure in this path, it notifies the Cluster Agent, which then initiates the creation of a new target on a healthy node. After the new target is successfully created, the Node Agent establishes a new connection between the application and this target.
This process typically completes within seconds, ensuring seamless recovery and negligible downtime.
The volume must have more than one replica to enable the switch-over functionality.
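The switch-over flow described above can be sketched as a small simulation. This is an illustrative model only, not the product's implementation: the class names (Volume, ClusterAgent, NodeAgent), their methods, and the rule of picking the first healthy node alphabetically are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical volume record used only for this sketch."""
    name: str
    replica_count: int
    target_node: str


@dataclass
class ClusterAgent:
    """Hypothetical stand-in for the Cluster Agent that places new targets."""
    healthy_nodes: set

    def create_target(self, volume: Volume, failed_node: str) -> str:
        # Switch-over needs more than one replica on the volume.
        if volume.replica_count < 2:
            raise RuntimeError(f"{volume.name}: switch-over needs more than one replica")
        # Never place the new target on the node that just failed.
        candidates = sorted(self.healthy_nodes - {failed_node})
        if not candidates:
            raise RuntimeError(f"{volume.name}: no healthy node available")
        # Assumption: pick the first candidate; real placement logic may differ.
        volume.target_node = candidates[0]
        return volume.target_node


@dataclass
class NodeAgent:
    """Hypothetical stand-in for the HA Node Agent on a CSI node."""
    cluster_agent: ClusterAgent
    connections: dict = field(default_factory=dict)

    def on_path_failure(self, volume: Volume) -> str:
        # 1. Report the failed path to the Cluster Agent.
        failed_node = volume.target_node
        # 2. The Cluster Agent creates a new target on a healthy node.
        new_node = self.cluster_agent.create_target(volume, failed_node)
        # 3. Reconnect the application to the new target.
        self.connections[volume.name] = new_node
        return new_node


if __name__ == "__main__":
    cluster = ClusterAgent(healthy_nodes={"node-a", "node-b", "node-c"})
    agent = NodeAgent(cluster)
    vol = Volume("pvc-demo", replica_count=3, target_node="node-a")
    print("new target node:", agent.on_path_failure(vol))
```

In this model a single-replica volume raises an error instead of failing over, which mirrors the replica requirement stated above.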
Configuration Requirements for Optimal HA
To ensure optimal functioning of HA in Replicated PV Mayastor environments, consider the following configuration guidelines:
- Applications scheduled on nodes with the label openebs.io/engine=mayastor will have their Nexus preferably co-located on the same node. However, if the io-engine pod on that node becomes unhealthy, the Nexus may be created on an alternative healthy node.
- The following kernel parameter is required to enable HA:
  nvme_core.multipath=Y
  Without this configuration, volume target failover is not supported.
HA is not supported on RHEL 10 and its derivative distributions, as these platforms have deprecated the nvme_core.multipath kernel parameter required for enabling HA functionality. If you are planning to deploy HA in your environment, ensure that you are using an operating system that supports nvme_core.multipath. Attempting to enable HA on unsupported platforms may lead to unexpected behaviour and failure of volume failover mechanisms.
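Before relying on failover, it can help to confirm that the multipath parameter is active on each node. The following is a minimal sketch, assuming the running kernel exposes the setting through the standard sysfs module-parameter file /sys/module/nvme_core/parameters/multipath. The function name nvme_multipath_enabled is hypothetical and is not part of any product tooling.

```python
from pathlib import Path

# Assumed location of the parameter on Linux kernels that load nvme_core.
MULTIPATH_PARAM = Path("/sys/module/nvme_core/parameters/multipath")


def nvme_multipath_enabled(param_path: Path = MULTIPATH_PARAM) -> bool:
    """Return True only if the kernel reports nvme_core.multipath as Y."""
    try:
        value = param_path.read_text().strip()
    except OSError:
        # The module is not loaded, or this kernel does not expose the parameter.
        return False
    return value == "Y"


if __name__ == "__main__":
    if nvme_multipath_enabled():
        print("nvme_core.multipath is enabled")
    else:
        print("nvme_core.multipath is disabled or unavailable on this node")
```

A missing file is treated the same as a disabled setting, since in both cases volume target failover would not be supported on that node.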
Disabling HA (Not Recommended)
It is strongly recommended to keep the HA feature enabled for production environments.
By default, HA is enabled in DataCore Puls8. However, if there is a need to disable this feature, it can be done during Helm installation using the following parameter:
--set=openebs.mayastor.agents.ha.enabled=false
Benefits of HA
- Minimizes downtime by enabling fast failover of volume targets.
- Ensures uninterrupted I/O for stateful workloads during infrastructure failures.
- Automatically recovers from storage path failures within seconds.
- Improves application resilience without requiring manual intervention.