Automatic Self-Healing

The advanced mirror capabilities of Dynamic Data Resiliency can restore redundancy to a 3-copy virtual disk either manually or automatically, with no administrator intervention in the automatic case. Maintaining data redundancy in this way reduces recovery time and avoids full recoveries. Automatic self-healing is enabled by default.

Automatic self-healing will act under the following conditions:

  • A DataCore Server stops or restarts unexpectedly (crash) or is gracefully shut down (with drivers stopped).
  • Virtualization (the DataCore Executive Service) is stopped on a DataCore Server.
  • The back-end storage fails or becomes unavailable.

How It Works

Automatic self-healing restores high availability to a host when one of the active storage sources in a 3-copy virtual disk fails or becomes unavailable. When DataCore SANsymphony detects a failure of an active storage source that is served to a host, a configurable timer starts. When the delay time is reached, the failed storage source is automatically replaced by the additional copy, which is promoted to active status, and host paths to it are enabled.
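The timer-and-promotion sequence described above can be pictured with a small model. This is only an illustrative sketch, not SANsymphony code or its API: the class name VirtualDisk3Copy, the field delay_s, and the methods report_failure and tick are hypothetical names chosen for this example, and time is passed in as plain seconds.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VirtualDisk3Copy:
    """Toy model: two active storage sources plus one additional copy."""
    active: List[str]                 # the two sources currently served to the host
    standby: Optional[str]            # the additional (third) copy, if still available
    delay_s: float                    # hypothetical name for the configurable delay
    failed_at: Dict[str, float] = field(default_factory=dict)

    def report_failure(self, source: str, now: float) -> None:
        """Start the timer the first time an active source is seen to fail."""
        if source in self.active and source not in self.failed_at:
            self.failed_at[source] = now

    def tick(self, now: float) -> List[str]:
        """Promote the additional copy once the delay has elapsed."""
        for source, started in list(self.failed_at.items()):
            if self.standby is not None and now - started >= self.delay_s:
                slot = self.active.index(source)
                self.active[slot] = self.standby   # promoted to active status
                self.standby = None                # no further spare copy remains
                del self.failed_at[source]
        return self.active


vd = VirtualDisk3Copy(active=["ServerA", "ServerB"], standby="ServerC", delay_s=60)
vd.report_failure("ServerB", now=0)
print(vd.tick(now=30))   # delay not yet reached, ServerB still listed
print(vd.tick(now=60))   # delay reached, ServerC takes ServerB's place
```

The model deliberately omits what happens if the failed source returns before the delay is reached, because that behavior is not described here.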

When the virtual disk is healthy, data writes to all copies of the virtual disk are synchronous, meaning that a write I/O is reported as complete to the host only after it has been mirrored to all storage sources. DataCore SANsymphony attempts to maintain up-to-date data on all storage sources. The host has access to the data copies from, at most, two DataCore Servers.

Self-healing acts only after the virtual disk has received writes that could not be mirrored, which changes its status to Log recovery pending. For example, if a server is stopped and its storage source goes offline while no writes are being made to the virtual disk, self-healing does not occur. Once writes arrive that cannot be mirrored to the offline storage source, self-healing acts after the configured delay, measured from that time.
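The dependence on writes can be made concrete with a short model. Again, this is a hedged sketch rather than product code: SelfHealTrigger, source_went_offline, host_write, and heal_due are hypothetical names, and the only status strings used are "Healthy" (assumed for illustration) and "Log recovery pending" (taken from the text above).

```python
from typing import Optional

HEALTHY = "Healthy"                          # assumed label for the normal state
LOG_RECOVERY_PENDING = "Log recovery pending"


class SelfHealTrigger:
    """Toy model: the delay is counted from the first write that cannot be mirrored."""

    def __init__(self, delay_s: float) -> None:
        self.delay_s = delay_s
        self.status = HEALTHY
        self.source_online = True
        self.first_unmirrored_write: Optional[float] = None

    def source_went_offline(self) -> None:
        # Going offline alone changes nothing: no write has been missed yet.
        self.source_online = False

    def host_write(self, now: float) -> None:
        # The first write that cannot reach the offline source starts the clock.
        if not self.source_online and self.first_unmirrored_write is None:
            self.first_unmirrored_write = now
            self.status = LOG_RECOVERY_PENDING

    def heal_due(self, now: float) -> bool:
        if self.first_unmirrored_write is None:
            return False
        return now - self.first_unmirrored_write >= self.delay_s


trigger = SelfHealTrigger(delay_s=60)
trigger.source_went_offline()
print(trigger.status, trigger.heal_due(now=1000))   # idle disk: nothing happens
trigger.host_write(now=1000)
print(trigger.status, trigger.heal_due(now=1060))   # delay counted from the write
```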

Redundancy can also be restored manually for 3-copy virtual disks; see Manual Redundancy Restoration.

Recommended Precautions

When using automatic self-healing, it is important to reduce the risks that arise if mirror paths and management communication paths are lost at the same time. If all mirror and management communication paths are lost, host access could continue to both storage sources with unpredictable results. The following precautions must be taken:

  • Mirror paths and management communication paths should be on independent communication infrastructures.

  • A witness is required to determine host accessibility to servers in the event that mirror and management communication paths become unavailable.

    • If a witness is configured, servers that are unable to contact the witness will deny host access to the virtual disk on that server. See Witness for more information.
    • If a witness is not configured, after the virtual disk is healed, inactive storage sources will stay in a Log recovery pending state. To recover mirror synchronization, the administrator will have to split all storage sources, determine which server has the best data, then remirror the virtual disks as appropriate. Full recoveries are required to synchronize data on the new mirrors.
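The role of the witness in the precautions above can be summarized as a small decision function. This is a simplified sketch under stated assumptions, not the product's arbitration logic: allow_host_access and its parameters are hypothetical names, and the sketch assumes that a server able to reach the witness keeps serving hosts, which may be simpler than the real behavior described under Witness.

```python
def allow_host_access(peer_reachable: bool,
                      witness_configured: bool,
                      witness_reachable: bool) -> bool:
    """Decide whether one server keeps serving a virtual disk to hosts.

    peer_reachable is False when all mirror and management communication
    paths to the partner server have been lost.
    """
    if peer_reachable:
        return True                  # normal operation, nothing to arbitrate
    if witness_configured:
        return witness_reachable     # cannot contact the witness: deny host access
    # No witness: the server cannot tell isolation from partner failure,
    # so it keeps serving hosts and both sides may accept writes.
    return True


# Both servers lose contact with each other; only server 1 can reach the witness.
print(allow_host_access(False, True, True))    # server 1 keeps serving
print(allow_host_access(False, True, False))   # server 2 denies host access
print(allow_host_access(False, False, False))  # no witness: access continues
```

The last case is the one the precautions are meant to prevent: with no witness, both servers continue to accept host I/O and the copies can diverge.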