Automatic Self-Healing
Overview
Automatic Self-Healing helps maintain high availability and redundancy for 3-Way Virtual Disks when an active storage source becomes unavailable. This page explains how the feature works, when it is triggered, and how to enable, disable, and configure it.
Automatic Self-Healing Trigger Conditions
The advanced mirror capabilities of Dynamic Data Resiliency can self-heal 3-Way Virtual Disks either manually or automatically; the automatic mode requires no administrator intervention. Self-healing reduces recovery time by maintaining data redundancy, which avoids full recoveries. Automatic self-healing is enabled by default.
Automatic self-healing will act under the following conditions:
- A DataCore Server stops or restarts unexpectedly (crash) or is gracefully shut down (with drivers stopped).
- Virtualization (the DataCore Executive Service) is stopped on a DataCore Server.
- The back-end storage fails or becomes unavailable.
How It Works
Automatic self-healing restores high availability to a host when one of the active storage sources in a 3-Way Virtual Disk fails or becomes unavailable. When DataCore SANsymphony detects the failure of an active storage source that is served to a host, a configurable delay timer starts. When the delay expires, the failed storage source is automatically replaced by the additional copy, which is promoted to active status, and host paths to it are enabled.
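The sequence described above (failure detected, delay timer started, additional copy promoted) can be pictured with a small toy model. This is only an illustrative sketch, not SANsymphony code: the class `ThreeWayDisk`, its methods, and the server labels are hypothetical, and clearing the timer when a source recovers is an assumption inferred from the delay's stated purpose of riding out temporary outages.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ThreeWayDisk:
    """Toy model: two active storage sources plus one additional copy."""
    active: List[str] = field(default_factory=lambda: ["server-A", "server-B"])
    additional: Optional[str] = "server-C"
    # Maps a failed active source to the time its failure was detected.
    failed_at: Dict[str, float] = field(default_factory=dict)

    def report_failure(self, source: str, now: float) -> None:
        # Detecting a failed active source starts its delay timer.
        if source in self.active and source not in self.failed_at:
            self.failed_at[source] = now

    def report_recovery(self, source: str) -> None:
        # Assumption: a source that returns before the delay expires
        # clears its pending timer.
        self.failed_at.pop(source, None)

    def tick(self, now: float, delay_seconds: float) -> None:
        # When the delay has elapsed, the additional copy replaces the
        # failed source and becomes active.
        for source, detected in list(self.failed_at.items()):
            if self.additional is not None and now - detected >= delay_seconds:
                self.active[self.active.index(source)] = self.additional
                self.additional = None
                del self.failed_at[source]


disk = ThreeWayDisk()
disk.report_failure("server-A", now=0)
disk.tick(now=480, delay_seconds=480)  # default delay of eight minutes
print(disk.active)  # ['server-C', 'server-B']
```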
When the virtual disk is healthy, data writes to all copies of the virtual disk are synchronous, meaning that a write I/O is completed to the host when mirrored to all storage sources. DataCore SANsymphony attempts to maintain up-to-date data on all storage sources. The host has access to the data copies from, at most, two DataCore Servers.
Redundancy can also be restored manually for 3-Way Virtual Disks; see Manual Redundancy Restoration.
Recommended Precautions
When using automatic self-healing, it is important to reduce the risks in case of simultaneous loss of mirror and communication paths. In the event that all mirror and management communication paths are lost, host access could continue to both storage sources with unpredictable results. The following precautions must be taken:
- Mirror paths and management communication paths should be on independent communication infrastructures.
- A witness is required to determine host accessibility to servers in the event that mirror and management communications paths become unavailable.
- If a witness is configured, servers that are unable to contact the witness will deny host access to the virtual disk on that server. See Witness for more information.
- If a witness is not configured, after the virtual disk is healed, inactive storage sources will stay in a log recovery pending state. To recover mirror synchronization, the administrator will have to split all storage sources, determine which server has the best data, then re-mirror the virtual disks as appropriate. Full recoveries are required to synchronize data on the new mirrors.
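The witness rule in the precautions above can be summarized as a small decision function. This is a simplified sketch of the behavior as described on this page, not the product's actual logic; the function name and parameters are hypothetical.

```python
def host_access_allowed(paths_available: bool,
                        witness_configured: bool,
                        witness_reachable: bool) -> bool:
    """Decide whether one server keeps serving the virtual disk to hosts.

    paths_available: mirror and management communication paths are up.
    """
    if paths_available:
        # Normal operation: the witness is not needed as a tiebreaker.
        return True
    if witness_configured:
        # A server that cannot contact the witness denies host access.
        return witness_reachable
    # No witness: access could continue on both sides, which is the
    # risky case the precautions are meant to avoid.
    return True
```

With a witness configured, a server that loses its paths and cannot reach the witness stops serving the disk, so at most one side continues.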
Enabling/Disabling Automatic Self-Healing
Automatic self-healing and the self-healing delay timer are server group settings. The setting is enabled by default.
The self-healing delay timer can be set to a value to allow for possible outages of a temporary nature. In the console, the delay value may be set from a minimum of one minute to a maximum of 1,000 minutes. The default is eight minutes.
Using the DataCore Cmdlet Set-DcsServerGroupProperties, the minimum self-healing delay value that can be set is 5 seconds. Because the console displays the value in minutes, a delay of 5 seconds appears in the console as zero.
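The limits above can be illustrated with a short sketch. The helper names are hypothetical and do not correspond to any DataCore API. Truncation to whole minutes is an assumption consistent with a 5-second delay showing as zero, and no upper bound is modeled for the cmdlet because this page does not state one.

```python
CONSOLE_MIN_MINUTES = 1
CONSOLE_MAX_MINUTES = 1000
DEFAULT_MINUTES = 8
CMDLET_MIN_SECONDS = 5


def console_delay_seconds(minutes: int = DEFAULT_MINUTES) -> int:
    """Validate a delay entered in the console and return it in seconds."""
    if not CONSOLE_MIN_MINUTES <= minutes <= CONSOLE_MAX_MINUTES:
        raise ValueError("console delay must be between 1 and 1,000 minutes")
    return minutes * 60


def cmdlet_delay_seconds(seconds: int) -> int:
    """Validate a delay set through the cmdlet (minimum of 5 seconds)."""
    if seconds < CMDLET_MIN_SECONDS:
        raise ValueError("cmdlet delay must be at least 5 seconds")
    return seconds


def displayed_minutes(seconds: int) -> int:
    """Assumption: the console truncates the delay to whole minutes."""
    return seconds // 60


print(console_delay_seconds())                     # 480
print(displayed_minutes(cmdlet_delay_seconds(5)))  # 0
```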
- A Witness must be configured for use with automatic self-healing in the event that virtual disks lose all mirror access between storage sources and management communication paths. See Witness.
- If the setting is disabled after the timer has started but before the restoration process has begun, self-healing is canceled and the timer is stopped. If the setting is disabled after the restoration process has begun, the process is not stopped.
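The effect of disabling the setting at different points can be expressed as a simple state transition. This is a sketch under the behavior described above; the state names and the function are hypothetical.

```python
from enum import Enum


class HealState(Enum):
    IDLE = "idle"                    # no failure pending
    TIMER_RUNNING = "timer running"  # failure detected, delay not yet elapsed
    RESTORING = "restoring"          # restoration process has begun


def state_after_disable(state: HealState) -> HealState:
    """Return the state after automatic self-healing is disabled."""
    if state is HealState.TIMER_RUNNING:
        # Self-healing is canceled and the timer is stopped.
        return HealState.IDLE
    # A restoration already in progress is not stopped; idle stays idle.
    return state
```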
To enable and set the delay:
- In the Task Details page > Settings tab, under Advanced Settings, select the Automatic Self-healing Enabled check box. This enables the delay timer settings.
- In Self-healing delay, keep the default value or enter the number of minutes required.
- Click Apply.
To disable:
- In the Task Details page > Settings tab, under Advanced Settings, clear the Automatic Self-healing Enabled check box. This disables the delay timer settings.
- Click Apply.