Dynamic Data Resiliency

In this topic:

About Dynamic Data Resiliency

Creating 3-copy virtual disks with dynamic data resiliency

Automatic self-healing

Manual redundancy restoration

 

About Dynamic Data Resiliency

The Dynamic Data Resiliency feature extends protection (resilience) against failure scenarios by adding a data copy to mirrored virtual disks, which increases availability and reduces periods without redundancy. The added copy receives data from the active virtual disk storage source, but the front-end paths from the added copy to the host are disabled and access is not allowed until the copy is promoted to active status. If a server or storage source fails, high availability to the host can be restored manually or automatically using the additional storage source. The feature also lets administrators move mirrored data to perform maintenance simply and efficiently, without temporarily compromising redundancy and without the need for shared storage sources.

A mirrored virtual disk (composed of two data copies) loses data redundancy when one of the storage sources becomes unavailable, which can leave an application exposed to complete loss of access to its data if the second storage source also fails. In this case, restoring high availability requires time to determine the problem, address the cause, take corrective action, and then rebuild the failed copy with a log or full recovery as appropriate. With a mirrored virtual disk composed of two data copies, losing both storage server nodes results in complete loss of access to the data.

Benefits

The Dynamic Data Resiliency feature provides the following solutions:

o           Scale out architecture - the number of data copies for a virtual disk can be increased by adding DataCore Servers with a storage pool to the server group configuration.

·            Extended data redundancy or "data resiliency" for mirrored virtual disks is achieved by adding an extra data copy from another DataCore Server storage source to create three synchronous copies of the data, referred to as a 3-copy virtual disk.

o           Self-healing capabilities - the additional data copy allows any of the storage sources in a mirrored virtual disk to move from one DataCore Server  to another and dynamically transition the workload.

·            Mirrored virtual disks provide host access from two DataCore Servers. Virtual disks with more than two copies can maintain, either automatically or manually, two active storage paths to the host if a storage source fails on any one of the DataCore Servers. In this case, no full mirror recovery is required. See Automatic Self-healing.

·            High availability to the host can be maintained with a 3-copy virtual disk when a server must be taken offline for server maintenance or when a server must be permanently removed from the configuration. See Maintenance Mode (Evacuate/Redistribute).

o           Load balancing - the additional data copy allows storage sources in a synchronously mirrored virtual disk to be dynamically redistributed among other servers in the server group while maintaining host access and without the need for shared storage access.

Workload can be redistributed:

·            Among other servers in the group when servers are added or removed from the group.

·            To other servers with more storage resources as pools on overloaded servers reach capacity.

·            To other servers prior to and after maintenance.

Important Notes

o           Dynamic data resiliency requires a minimum of three servers each with an available pool of storage in the same server group in order to create three data copies of the virtual disk on different servers.  See Creating 3-Copy Virtual Disks.

o           All servers with storage sources in the same multi-copy virtual disk must have mirror ports with connections to the other servers in the same virtual disk. Additionally, those servers must have front-end ports with connections to the same hosts to which the virtual disk is served or could be served. An adequate number of ports to create redundant paths is recommended.

o           A multi-copy virtual disk has front-end paths between the host and all storage sources. However, the host can access the virtual disk through no more than two DataCore Servers at a time. Storage sources that are accessible to the host, or that would be accessible if the virtual disk were served, are considered "active" storage sources. The storage source that has host access disabled is considered the "inactive" storage source. Host access may change among the storage sources in the virtual disk to maintain high availability as various failure conditions or operations are handled.

o           Automatic self-healing is enabled by default. See Automatic Self-healing.

o           When automatic self-healing is enabled, it is very important to reduce the risk posed by a simultaneous loss of mirror paths and management communication paths. If all mirror and management communication paths are lost, host access could continue to both storage sources with unpredictable results. The following precautions must be taken:

·            Mirror paths and management communication paths should be on independent communication infrastructures.

·            A witness must be configured to determine host accessibility to servers in the event that mirror and management communication paths become unavailable. If a witness is configured, a server that is unable to contact the witness will deny host access to the virtual disk on that server. See Witness for more information.
Note: If a witness is not configured, after the virtual disk is healed, inactive storage sources will stay in a log recovery pending state. To recover mirror synchronization, the administrator will have to split all storage sources, determine which server has the best data, then remirror the virtual disks as appropriate. Full recoveries are required to synchronize data on the new mirrors.

o           All storage sources in a multi-copy virtual disk are displayed in the Virtual Disk Details page>Info tab and in the Summary when the virtual disk is selected in the Virtual Disk List.

o           A Shared Multi-port Array license is not required for dynamic data resiliency, nor are shared pools. Dynamic data resiliency is not supported for dual virtual disks.

o           Most features are supported with multi-copy virtual disks. Deviations from the usual behavior of 2-copy (mirrored) virtual disks are noted below:

·            The Evacuate and Redistribute operations alter host access between storage sources in the 3-copy virtual disk to maintain high availability while performing maintenance procedures on a server. During these operations, the three storage sources remain associated with the virtual disk; only the paths are adjusted for the new storage source. See Maintenance Mode (Evacuate/Redistribute) for more information.

·            The Create Virtual Disk wizard and virtual disk templates do not currently support 3-copy virtual disks. See Creating 3-Copy Virtual Disks for instructions.

·            The Move operation cannot be performed on the inactive storage source in a 3-copy virtual disk, but the storage source can be replaced. See Replacing/Moving a Storage Source in a Virtual Disk. The inactive storage source in a 3-copy virtual disk also cannot be moved using the Evacuate or Redistribute operation. See Maintenance Mode.

·            When creating a replication, the server for the destination virtual disk cannot be the inactive storage source in a 3-copy virtual disk since host access is disabled.

·            The inactive storage source in a 3-copy virtual disk must be split and unserved in order to perform the purge operation when the source disk for the inactive storage source is affected.

·            The inactive storage source in a 3-copy virtual disk cannot be forced online. The inactive storage source will subsequently perform a full recovery after the active storage sources have been restored.

·            Mirror recovery of the inactive storage source in the 3-copy virtual disk is linked to an active storage source and cannot be paused.

·            3-copy virtual disks are not currently supported for use as VVOLs.

·            When the Auto select option is selected as the Preferred Server setting for a host, the software will select one of the active storage sources as the Preferred Server and the selection can change. To ensure that a particular server is selected, it must be expressly selected at the host level as a preferred server or as preferred paths at the virtual disk level.
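The witness precaution described in the notes above can be illustrated with a short sketch. This is a simplified model, not product code: the `ServerView` class and the `allows_host_access` function are hypothetical names, and the rule that a server with working mirror or management paths keeps host access without consulting the witness is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerView:
    """What one server can currently reach (hypothetical model, not a product API)."""
    mirror_paths_up: bool
    management_paths_up: bool
    witness_configured: bool
    witness_reachable: bool


def allows_host_access(view: ServerView) -> bool:
    """Return True if this server would keep serving the virtual disk."""
    isolated = not view.mirror_paths_up and not view.management_paths_up
    if not isolated:
        # Assumption: partner servers are still reachable, so the witness
        # is not consulted in this simplified model.
        return True
    if view.witness_configured:
        # Isolated server: continue only if the witness can be contacted.
        return view.witness_reachable
    # No witness: every isolated server keeps serving, which is the
    # unpredictable case the precautions are meant to avoid.
    return True
```

In this model, an isolated server with a configured but unreachable witness is the only case that denies host access, which mirrors the behavior the note describes.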

Creating 3-Copy Virtual Disks

3-copy virtual disks can be created these ways:

o           In the host context, using the Create Virtual Disk operation to create and serve a virtual disk with a redundancy level of three. This creates three data copies of the virtual disk. See Quickly Creating and Serving Virtual Disks for instructions.

o           Creating a mirrored virtual disk (with 2 data copies) using the wizard and adding another mirror. See Adding Mirrors for instructions.

o           Using the Create Another operation to create more virtual disks using a 3-copy virtual disk as the basis. See Create Another for instructions.

The virtual disk will appear in the DataCore Servers Panel under each server that is included as a storage source. Storage source details are displayed in the Virtual Disk Details page under the Info tab and in the Virtual Disks List under the Virtual Disk Summary.

Removing Mirrors

Mirrors can be removed from multi-copy virtual disks using the Split and Unserve operation. See Splitting Virtual Disks to Remove Mirrors.

 

Automatic Self-healing

The advanced mirror capabilities of the Dynamic Data Resiliency feature allow a 3-copy virtual disk to be healed either manually or automatically (without intervention). Healing reduces recovery time because data redundancy is maintained and full recoveries are avoided. The feature includes the ability to automatically restore high availability to a host when one of the active storage sources in a 3-copy virtual disk fails or becomes unavailable. When the software detects a failure of an active storage source that is served to a host, a configurable timer starts. When the delay time is reached, the additional copy is promoted to active status in place of the failed storage source, and host paths to it are enabled.

When the virtual disk is healthy, data writes to all copies of the virtual disk are synchronous, meaning that a write I/O is reported to the host as complete only after it has been mirrored to all storage sources. The software attempts to maintain up-to-date data on all storage sources. The host has access to the data copies from, at most, two DataCore Servers.
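As a rough illustration of the healthy-state behavior, the sketch below models a 3-copy virtual disk as three in-memory dictionaries. The `ThreeCopyDisk` class and its methods are illustrative names only and do not correspond to any DataCore interface. The model covers only the healthy case, in which every write reaches all three copies before completion is reported.

```python
from typing import Dict, List


class ThreeCopyDisk:
    """Toy model of a healthy 3-copy virtual disk (names are illustrative)."""

    def __init__(self, servers: List[str], active: List[str]) -> None:
        if len(servers) != 3 or len(active) > 2 or not set(active) <= set(servers):
            raise ValueError("need three copies and at most two active servers")
        self.copies: Dict[str, Dict[int, bytes]] = {name: {} for name in servers}
        self.active = set(active)

    def write(self, block: int, data: bytes) -> bool:
        # Mirror the write to every copy, active or inactive, before completing.
        for copy in self.copies.values():
            copy[block] = data
        return True  # completion is reported to the host only at this point

    def read(self, server: str, block: int) -> bytes:
        # Host access is enabled only on the active servers.
        if server not in self.active:
            raise PermissionError(f"host access to {server} is disabled")
        return self.copies[server][block]
```

The inactive copy holds the same data as the active copies, which is what allows it to be promoted later without a full recovery.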

Automatic self-healing, when enabled, will act under the following conditions:

o           A DataCore Server stops or restarts unexpectedly (crash) or is shut down gracefully (with drivers stopped).

o           Virtualization (DCSX service) is stopped on a DataCore Server.

o           The back-end storage fails or becomes unavailable.

Self-healing acts only after the virtual disk has received writes that change its status to Log recovery pending. For example, when a server is stopped with the storage source offline and there are no writes to that virtual disk, self-healing will not happen. Once writes are received and cannot be mirrored to the offline storage source, self-healing will act after the set delay has elapsed from that time.
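The trigger and delay behavior described above can be sketched as a small state model. This is a minimal illustration under stated assumptions: the `SelfHealingMonitor` class, its method names, and the use of plain numeric timestamps in seconds are all hypothetical and are not part of the product.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_DELAY_S = 8 * 60  # the console default of eight minutes, in seconds


@dataclass
class SelfHealingMonitor:
    """Toy model: the timer starts at the first write that cannot be mirrored."""
    delay_s: float = DEFAULT_DELAY_S
    source_offline: bool = False
    pending_since: Optional[float] = None  # time of the first unmirrored write
    healed: bool = False

    def on_source_offline(self) -> None:
        # Going offline alone does not start the timer.
        self.source_offline = True

    def on_write(self, now: float) -> None:
        # The first write that misses the offline copy puts the virtual disk
        # into "Log recovery pending" and starts the delay.
        if self.source_offline and self.pending_since is None:
            self.pending_since = now

    def tick(self, now: float) -> bool:
        """Return True at the moment the inactive copy would be promoted."""
        if self.healed or self.pending_since is None:
            return False
        if now - self.pending_since >= self.delay_s:
            self.healed = True
            return True
        return False
```

With no writes, `tick` never returns True no matter how long the storage source stays offline, which matches the example of a stopped server whose virtual disk receives no I/O.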

Redundancy can also be restored manually for 3-copy virtual disks, see Manual Redundancy Restoration.

Enabling/Disabling Automatic Self-healing

Automatic self-healing and the self-healing delay timer are server group settings. Automatic self-healing is enabled by default.

The self-healing delay timer can be set to a value to allow for possible outages of a temporary nature. In the console, the delay value may be set from a minimum of one minute to a maximum of 1,000 minutes. The default is eight minutes.

Using the DataCore Cmdlet Set-DcsServerGroupProperties, the minimum self-healing delay value that can be set is five seconds. Because the console displays the value in minutes, a delay of five seconds is shown as zero.
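The limits above can be summarized in a short sketch. The function names are hypothetical, and two points are assumptions rather than documented behavior: that the 1,000-minute maximum also applies when the value is set through the cmdlet, and that the console truncates to whole minutes (which is consistent with five seconds appearing as zero).

```python
CONSOLE_MIN_S = 1 * 60       # console minimum: one minute
MAX_S = 1000 * 60            # console maximum: 1,000 minutes
CMDLET_MIN_S = 5             # minimum via Set-DcsServerGroupProperties
DEFAULT_S = 8 * 60           # default: eight minutes


def is_valid_delay(seconds: int, via_cmdlet: bool = False) -> bool:
    """Check a self-healing delay against the documented limits."""
    minimum = CMDLET_MIN_S if via_cmdlet else CONSOLE_MIN_S
    # Assumption: the same upper limit applies to both interfaces.
    return minimum <= seconds <= MAX_S


def console_minutes(seconds: int) -> int:
    """Value as the console would show it, assuming whole-minute truncation."""
    return seconds // 60
```

For example, `is_valid_delay(5)` is False for the console but `is_valid_delay(5, via_cmdlet=True)` is True, and `console_minutes(5)` evaluates to 0.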

Important Notes

o           A Witness must be configured for use with automatic self-healing in the event that virtual disks lose all mirror access between storage sources and management communication paths.  See Witness.

o           If the setting is disabled after the timer has started, but before the restoration process has begun, self-healing is cancelled and the timer is stopped. If the setting is disabled after the restoration process has begun, the process is not stopped.
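The effect of disabling the setting at different points can be sketched as a transition between three phases. The `Phase` names and the `phase_after_disable` function are hypothetical and serve only to illustrate the note above.

```python
from enum import Enum


class Phase(Enum):
    IDLE = "idle"
    TIMER_RUNNING = "timer running"
    RESTORING = "restoration in progress"


def phase_after_disable(phase: Phase) -> Phase:
    """Phase of a pending self-heal after the setting is disabled."""
    if phase is Phase.TIMER_RUNNING:
        # Timer started but restoration not begun: cancel and stop the timer.
        return Phase.IDLE
    # A restoration that has already begun continues; idle stays idle.
    return phase
```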

To enable and set the delay:

1            In the Server Group Details page>Settings tab, under Advanced Settings, select the Automatic Self-healing Enabled check box. This enables the delay timer settings.

2           In Self-healing delay, keep the default value or enter the number of minutes required.

3           Click Apply.

To disable:

1            In the Server Group Details page>Settings tab, under Advanced Settings, clear the Automatic Self-healing Enabled check box. (This disables the delay timer settings.)

2           Click Apply.

Manual Redundancy Restoration

Redundancy to hosts can be restored manually instead of automatically. Redundancy can be restored at the server group or virtual disk level.

To manually restore redundancy for 3-copy virtual disks:

In the Virtual Disks List, select one or more 3-copy virtual disks, right-click and select Restore Redundancy from the context menu.

The operation is also available from context menus in the DataCore Servers Panel, Hosts Panel, and in Virtual Disks tabs in Details pages.

To manually restore redundancy for a server group:

In the DataCore Servers Panel, right-click on the server group and click Restore Redundancy from the context menu.

Notes on Automatic Self-healing and Manual Redundancy Restoration

o           These operations require one active storage source and the inactive storage source in a 3-copy virtual disk to be healthy and online so that the active paths can be swapped to the inactive storage source. The operations are performed only on virtual disks that meet these requirements.

o           These operations can be performed without a full recovery provided that the inactive storage source in the virtual disk is healthy and up-to-date.

o           Automatic self-healing is only performed for 3-copy virtual disks that are served to hosts; virtual disks that are not served to hosts can be manually restored if needed.

o           An inactive storage source in a 3-copy virtual disk cannot be promoted to an active storage source when used as the source of a replication.

o           After the operation is performed, if the failed storage source has permanently failed or is unavailable, that storage source may be split to remove it and a new mirror added to make the 3-copy virtual disk fully functional again.

o           Snapshots and rollbacks do not get moved during this operation, but remain on the server where they were created.

o           Mirror recovery can be monitored in the Virtual Disks List in the Summary, or in the Virtual Disk Details page in the Info tab.
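The eligibility conditions in the notes above can be gathered into a small checklist sketch. The `StorageSource` class and the three functions are hypothetical names introduced for illustration, and the model is a simplification that covers only the conditions stated in these notes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StorageSource:
    """Hypothetical description of one data copy of a 3-copy virtual disk."""
    server: str
    active: bool
    healthy_online: bool
    up_to_date: bool = True
    replication_source: bool = False


def can_restore_redundancy(sources: List[StorageSource]) -> bool:
    """One healthy active copy plus a healthy, promotable inactive copy."""
    inactive = [s for s in sources if not s.active]
    if len(sources) != 3 or len(inactive) != 1:
        return False
    spare = inactive[0]
    has_healthy_active = any(s.active and s.healthy_online for s in sources)
    # An inactive copy used as the source of a replication cannot be promoted.
    return has_healthy_active and spare.healthy_online and not spare.replication_source


def avoids_full_recovery(spare: StorageSource) -> bool:
    """No full recovery is needed if the inactive copy is healthy and current."""
    return spare.healthy_online and spare.up_to_date


def auto_heal_applies(sources: List[StorageSource], served_to_host: bool) -> bool:
    """Automatic self-healing acts only on virtual disks served to hosts."""
    return served_to_host and can_restore_redundancy(sources)
```

A virtual disk that is not served to a host fails `auto_heal_applies` but can still pass `can_restore_redundancy`, which corresponds to restoring it manually.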