Mirror Recovery
In this topic:
Monitoring mirror recovery and data status
Pausing and resuming mirror recovery for virtual disks
Controlling mirror recovery speed for the server group
Forcing online mirrored virtual disks
Also see:
Virtual Disks for information about virtual disks and write caching
Mirroring
A mirrored virtual disk is created from storage sources (disk pool or pass-through disk) from two DataCore Servers in the same server group. A 3-copy virtual disk is created from storage sources (disk pool or pass-through disk) from three DataCore Servers in the same server group. Virtual disks with multiple storage sources (or data copies) are referred to as multi-copy virtual disks. A multi-copy virtual disk has front-end paths between the host and all storage sources. However, host access is available to no more than two DataCore Servers at a time; these are referred to as the "active" storage sources in the multi-copy virtual disk. The storage source with disabled host access is referred to as the "inactive" storage source. See Dynamic Data Resiliency for more information.
Data is "synchronously" mirrored between the storage sources in a virtual disk, meaning that the write acknowledgement is not returned to the requesting storage host until the data has been written to all DataCore Servers. Mirroring is the process of keeping the virtual disk data identical on all servers.
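The acknowledgement rule described above can be illustrated with a minimal sketch. The StorageSource class and the mirrored_write function are hypothetical names used only for illustration; they are not part of SANsymphony software, and the sketch covers only the case where all storage sources are healthy.

```python
class StorageSource:
    """Hypothetical stand-in for one data copy on a DataCore Server."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block address -> data

    def write(self, address, data):
        self.blocks[address] = data


def mirrored_write(sources, address, data):
    """Return the acknowledgement only after every copy holds the data."""
    for source in sources:
        source.write(address, data)
    # Reaching this line means all storage sources have been written.
    return True


server1 = StorageSource("Server1")
server2 = StorageSource("Server2")
acknowledged = mirrored_write([server1, server2], address=0, data=b"payload")
```

In this sketch the host sees the acknowledgement only after the loop has finished, which is why both copies are identical at the moment the write is reported as complete.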
Mirror Recovery
Mirror recovery is the resynchronization of virtual disk data between storage sources in a multi-copy virtual disk. Virtual disks need mirror recovery when an event prevents data from being processed on a DataCore Server, in which case the data on that storage source is no longer "up-to-date". If another active storage source remains that can receive host data, the data changes are recorded in a log on that server. Once the issue is resolved, recovery begins to resynchronize the data between the storage sources so that when recovery is complete, all copies of the virtual disk data on DataCore Servers are identical.
Mirror recovery is either full or log-based.
Log Recoveries
A log recovery resynchronizes only the data that has changed since mirroring stopped to the storage source that is no longer "up-to-date". A log recovery takes considerably less time than a full recovery.
For example, a log recovery could happen if I/O is running on Server1 and the mirror connection to Server2 goes down, leading to what is commonly referred to as "mirror down" on Server1. A log recovery is needed from Server1 to Server2 to synchronize a mirrored virtual disk.
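The difference between the two recovery types can be sketched with a simplified model. The function names and the use of list positions as "blocks" are assumptions made for illustration; the actual log format and copy granularity used by the software are not shown here.

```python
def log_recovery(source, target, changed):
    """Copy only the blocks recorded in the change log."""
    for address in changed:
        target[address] = source[address]
    return len(changed)


def full_recovery(source, target):
    """Copy every block, regardless of whether it changed."""
    for address, data in enumerate(source):
        target[address] = data
    return len(source)


server1 = [b"a", b"b", b"c", b"d"]  # copy that stays up-to-date
server2 = list(server1)             # identical until the mirror goes down

# While Server2 is unreachable, Server1 records which blocks change.
changed = set()
for address, data in [(1, b"B"), (3, b"D")]:
    server1[address] = data
    changed.add(address)

# When the mirror connection returns, only the logged blocks are copied.
copied = log_recovery(server1, server2, changed)
```

Here the log recovery copies two blocks, whereas a full recovery of the same disk would copy all four. The saving grows with the size of the virtual disk.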
When virtualization is stopped on a server, SANsymphony software flushes the DataCore Server cache and all write operations are forced to the active mirror partner, which starts to log data changes in memory. When virtualization is restarted, a log recovery will begin and the log is used to resynchronize the virtual disk data.
The log used for recovery is held in memory on the server with active host access as long as virtualization is running.
If virtualization is stopped on the active mirror and a logstore pool is assigned, the log is saved to the "logstore" on that server. When virtualization is restarted, the logstore will be used to perform a logged recovery. If that log recovery is interrupted for any reason while virtualization is running, the log recovery will resume and include any new data changes that may have occurred in the interim.
If a logstore pool is not assigned on the remaining server with host access, the log will be lost and those virtual disks that require log recoveries will undergo full recoveries to resynchronize the data. Assigning a logstore on all servers is important to prevent full recoveries in this scenario. See Logstore for more information.
Full Recoveries
A full recovery synchronizes all storage allocation units (SAUs) in the virtual disk storage source regardless of which SAUs may have changed. Full recovery may be required when virtualization cannot be stopped cleanly or SANsymphony software cannot flush the cache to the storage source before a server is stopped. The next time virtualization is started on that server, a full recovery may be performed to ensure that all host I/O that was held in the DataCore Server cache or not successfully destaged will be copied from the up-to-date storage source. If a full recovery is interrupted while it is running (for example, one node is stopped or the mirror paths become unavailable), the recovery will restart from the beginning once the problem has been resolved.
If there was no host I/O in the server cache at the time when the DataCore Server was unexpectedly shut down, SANsymphony will detect this and a full recovery will not be needed.
For example, a full recovery could happen if I/O is running on Server1 and Server2 loses power and becomes temporarily unavailable. A full recovery from Server1 to Server2 would be required when Server2 becomes available again to synchronize a mirrored virtual disk.
If both servers were stopped uncleanly at the same time (for example, by a power failure), virtualization will have to be manually restarted on both servers and one storage source (the "last known good") will have to be chosen to initiate the full recovery. That storage source will have to be forced online in order for the full recovery to begin. See Forcing Mirror Recovery for more information.
When an event occurs that prevents the processing of data on a storage source, the virtual disk is automatically put in write-through mode until the issue is resolved and the virtual disk data has been fully synchronized.
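The conditions that lead to a log or a full recovery can be summarized in a simplified decision sketch. The function and parameter names are hypothetical, and the model is deliberately conservative: wherever the text above says a full recovery "may" be required, the sketch assumes it is. It does not model an unexpected restart of the logging server or an unavailable logstore.

```python
def recovery_type(stopped_cleanly, cache_had_host_io,
                  partner_stopped, partner_has_logstore):
    """Return "log" or "full" for a storage source that needs recovery.

    stopped_cleanly / cache_had_host_io describe the server that went down.
    partner_stopped / partner_has_logstore describe the active mirror
    partner that was logging data changes in memory.
    """
    # Host I/O left in cache after an unclean stop must be recopied in full.
    if not stopped_cleanly and cache_had_host_io:
        return "full"
    # The in-memory log survives a stop of virtualization on the partner
    # only if a logstore pool is assigned there.
    if partner_stopped and not partner_has_logstore:
        return "full"
    return "log"


# Server2 loses power with host I/O in cache: full recovery.
power_loss = recovery_type(False, True, False, True)
# Clean stop, partner later stopped without a logstore: log is lost.
log_lost = recovery_type(True, False, True, False)
# Clean stop, partner later stopped with a logstore assigned: log recovery.
log_kept = recovery_type(True, False, True, True)
```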
Recovery Priority
Mirror recovery can be prioritized in the storage profile for each dual and mirrored virtual disk. See Storage Profiles for more information.
Recovery Priority Settings (from highest to lowest priority)
- Critical
- High
- Regular (default setting)
- Low
Recovery Behavior
- Recovery is allocated based on the number of virtual disks needing recovery and the resources available.
- Generally, virtual disks of higher priority should recover before virtual disks of lower priority. However, virtual disks of lower priority may begin recovery if they are ready to recover before virtual disks of higher priority that are not ready to recover. Virtual disks of a lower priority may complete recovery before virtual disks of a higher priority if the former contain much less data to be recovered.
- The number of virtual disks in recovery is affected by two settings: the Mirror Recovery Speed server group setting, which controls the overall mirror recovery traffic for all servers in a server group, and the Auto-recovery virtual disk setting, which suspends automatic recoveries for selected virtual disks.
- Mirror recoveries occur on dual virtual disks only if they are configured for replication and the active replication server becomes unavailable.
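The ordering behavior described above can be sketched as follows. The data layout, the virtual disk names, and the slots parameter (the number of recoveries that may start) are assumptions for illustration only; the software's actual scheduling also depends on available resources and is not reproduced here.

```python
PRIORITY_ORDER = ["Critical", "High", "Regular", "Low"]  # highest to lowest


def next_to_recover(disks, slots):
    """Pick up to `slots` ready virtual disks, highest priority first."""
    ready = [disk for disk in disks if disk["ready"]]
    ready.sort(key=lambda disk: PRIORITY_ORDER.index(disk["priority"]))
    return [disk["name"] for disk in ready[:slots]]


disks = [
    {"name": "vd-archive", "priority": "Low", "ready": True},
    {"name": "vd-db", "priority": "Critical", "ready": False},
    {"name": "vd-mail", "priority": "High", "ready": True},
    {"name": "vd-files", "priority": "Regular", "ready": True},
]

# vd-db has the highest priority but is not ready, so lower-priority
# virtual disks that are ready begin recovery first.
first_batch = next_to_recover(disks, slots=2)
```

In this example first_batch contains vd-mail and vd-files. Once vd-db becomes ready, it would be chosen ahead of both.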
Logstore
Full recoveries require considerably more time than logged recoveries to synchronize data. The logstore feature can optimize mirror recoveries by eliminating full mirror recoveries under certain circumstances.
Data changes from hosts are logged to an in-memory bitmap on the active mirror partner under certain conditions, such as when a write operation on a mirror path fails (known as a "Mirror Down" condition) or an I/O operation to the back-end storage fails (known as a "Local Down" condition). If virtualization is then stopped on that active mirror partner, the bitmap is saved on that server as long as a logstore pool is assigned for it. The logstore is saved immediately before virtualization is stopped due to either a Stop DataCore Server command, or a "controlled" Windows operating system shutdown or restart.
When virtualization is running again on both servers, the logstore with saved data changes is restored on the server where it was saved. This allows a logged recovery to be performed from that server on those virtual disks requiring recovery, instead of a full recovery.
Without a logstore, the in-memory bitmap is lost if virtualization is stopped on the active mirror partner and full recoveries would be performed on those virtual disks that require mirror recovery.
A logstore is a hidden virtual disk and cannot be viewed in the configuration. The logstore pool is assigned per server, and can be displayed, set, or disabled in the DataCore Server Details page>Settings tab. See Setting a Logstore Pool below.
- The best practice is to create a dedicated logstore pool for each server in the server group. In configurations where a server has a single pool and that pool fails, the logstore will not be saved and full recoveries would be necessary. When multiple pools exist on the server and the logstore pool fails, change the logstore to another pool before stopping the server. (A shared pool can also be assigned as a logstore pool.)
- When a logstore pool is assigned, data changes are saved to the logstore at the time when virtualization is stopped. Data changes are not saved to the logstore if the server logging data changes restarts unexpectedly or the logstore is unavailable for any reason (for instance due to a physical disk or pool failure). In these cases, full recoveries will be performed for those virtual disks that require mirror recovery.
- The logstore is created with a logical size of 256 GB, which will be added to the oversubscription size of the pool, although the size in use may be much less. The logstore actually uses a maximum of 128 MB per virtual disk. Ensure there is enough free space in the pool to accommodate all multi-copy virtual disks. If the logstore runs out of space, the logstore will not be created and an alert will be posted. In this case, full recoveries will be performed for those virtual disks that require mirror recovery.
- Initially, by default, if multi-copy virtual disks exist on a server and a logstore pool is not specifically set, the pool with the most available space will be automatically assigned as the logstore pool.
- If the logstore pool is deleted, the pool with the most available space will be automatically assigned as the new logstore pool, unless the logstore pool is set to No Logstore Pool in the DataCore Server settings. See Setting a Logstore Pool.
- The logstore can be changed or removed while the server is stopped. In the case of a logstore change, full recoveries will be performed for those virtual disks that require mirror recovery since the new logstore was not saved before the server was stopped.
- If the logstore is changed while the pool is disconnected or the logstore is unavailable, it is possible that when the pool is brought back, the original logstore will become an auto-generated virtual disk named "Logstore <server name>", and will be visible in the configuration. This logstore will be invalid and will not be used.
- Logstores are not created unless multi-copy virtual disks exist on the server.
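As a rough sizing aid, the figures above (a 256 GB logical size and a maximum of 128 MB per virtual disk) can be turned into a worked calculation. The function names are hypothetical, and the sketch assumes binary units (1 GB = 1024 MB), which the text does not specify.

```python
LOGSTORE_LOGICAL_SIZE_GB = 256       # from the text above
MAX_LOG_PER_VIRTUAL_DISK_MB = 128    # from the text above
MB_PER_GB = 1024                     # assumption: binary units


def worst_case_logstore_mb(multi_copy_virtual_disks):
    """Upper bound on pool space the logstore could use on one server."""
    return multi_copy_virtual_disks * MAX_LOG_PER_VIRTUAL_DISK_MB


def disks_covered_by_logical_size():
    """How many per-disk maximums fit in the logical logstore size."""
    return LOGSTORE_LOGICAL_SIZE_GB * MB_PER_GB // MAX_LOG_PER_VIRTUAL_DISK_MB


needed_mb = worst_case_logstore_mb(40)     # 40 multi-copy virtual disks
covered = disks_covered_by_logical_size()
```

For 40 multi-copy virtual disks the worst case is 5120 MB (5 GB) of free pool space, and the logical size corresponds to 2048 per-disk maximums.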
Setting a Logstore Pool
To save data changes to a logstore on a server, a logstore pool must be assigned for that server.
The initial default setting is Auto Select Logstore, which allows the software to select the logstore pool based on the pool with the most available space. Once a different selection has been made, Auto Select Logstore is not available for selection by the user.
The software automatically resets to Auto Select Logstore when a previously selected logstore pool is later deleted from the configuration and no other logstore pool has been assigned. In this case, the logstore is automatically assigned until the user chooses another pool.
The selection No Logstore Pool disables the feature by preventing a logstore from being saved on the server. This setting will not optimize mirror recoveries.
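The three kinds of selection described above can be modeled in a short sketch. The function name and the dictionary of pools are hypothetical; the sketch only mirrors the documented rules (most available space for Auto Select Logstore, fallback to automatic selection when the chosen pool is deleted, and no logstore for No Logstore Pool).

```python
AUTO_SELECT = "Auto Select Logstore"
NO_LOGSTORE = "No Logstore Pool"


def effective_logstore_pool(setting, pools):
    """Return the pool that would hold the logstore, or None.

    pools maps pool name -> available space (any consistent unit).
    """
    if setting == NO_LOGSTORE:
        return None                      # feature disabled
    if setting in pools:
        return setting                   # explicitly selected pool exists
    if not pools:
        return None                      # nothing to select from
    # Auto select, or the selected pool was deleted:
    # use the pool with the most available space.
    return max(pools, key=pools.get)


pools = {"Pool A": 500, "Pool B": 1200}
auto_choice = effective_logstore_pool(AUTO_SELECT, pools)
after_delete = effective_logstore_pool("Pool C", pools)  # Pool C was deleted
```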
To set the logstore pool:
- In the DataCore Server Details page>Settings tab, expand General Settings.
- In Logstore pool, use the pull-down menu to select the pool to use for the logstore.
- Click Apply. Set the logstore pool for each server in the server group.
Monitoring Mirror Recovery
Mirror recovery can be monitored in the following places:
- In Virtual Disks List, in the Status column. When the virtual disk is selected from the list, individual status for storage sources in a mirrored virtual disk can be seen in the Data status field in the Virtual Disk Summary area.
- In the Virtual Disk Details page under the icon. Individual status for storage sources in a mirrored virtual disk can be seen in the Data status field in the Info tab.
- In the Status column in the Virtual Disks tab of any details page.
See Virtual Disk Status for descriptions of general overall virtual disk status, as well as the status of individual storage sources that comprise a virtual disk.
Pausing/Resuming Mirror Recovery for Virtual Disks
Mirror recoveries can be paused and resumed to mitigate the traffic caused by recovery I/O. The Pause/Resume Recovery feature enables virtual disks in recovery to be paused to optimize host performance when necessary. Recovery can be resumed at a time when the I/O has a lesser impact on performance. The recovery can remain paused for any length of time. When recovery is resumed, it is resumed from the point where it was paused.
Auto-recovery can be disabled for selected virtual disks to prevent automatic mirror recovery. When disabled, mirror recovery will not begin until it is manually started and will only start the current recovery. Each subsequent recovery must be manually started until the auto-recovery setting is enabled. The setting takes immediate effect. When auto-recovery is disabled while in recovery, that recovery will be paused. When auto-recovery is enabled, a paused recovery will be automatically resumed. Auto-recovery is enabled by default.
- Pausing recovery does not affect host access. Host I/O sent to the DataCore Server will be processed as usual. When mirror recovery is paused, host I/O continues to be mirrored as usual, as long as all storage sources of the mirror and the required mirror paths are available, even if the virtual disk status is not up-to-date.
- When auto-recovery for a virtual disk is disabled and that virtual disk is split, the default setting (enabled) will be applied on both sides.
- When a virtual disk is manually paused or has auto-recovery disabled, and mirror recovery is needed for the virtual disk, the virtual disk status will reflect Log recovery paused or Full recovery paused.
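The interaction between manual pause/resume and the auto-recovery setting can be sketched as a small state model. The MirrorRecovery class and its attributes are hypothetical and exist only to illustrate the rules above; only the "Log recovery paused" and "Full recovery paused" status strings come from this topic.

```python
class MirrorRecovery:
    """Hypothetical model of one virtual disk in mirror recovery."""

    def __init__(self, total_units, kind="Log"):
        self.total_units = total_units
        self.done_units = 0
        self.kind = kind              # "Log" or "Full"
        self.auto_recovery = True     # enabled by default
        self.paused = False

    def pause(self):
        self.paused = True

    def resume(self):
        # Progress is kept, so recovery continues from where it was paused.
        self.paused = False

    def set_auto_recovery(self, enabled):
        self.auto_recovery = enabled
        # Disabling pauses a running recovery; enabling resumes a paused one.
        self.paused = not enabled

    def step(self, units):
        if not self.paused:
            self.done_units = min(self.total_units, self.done_units + units)

    def status(self):
        if self.paused and self.done_units < self.total_units:
            return f"{self.kind} recovery paused"
        return "not paused"           # placeholder, not a product status


recovery = MirrorRecovery(total_units=100)
recovery.step(30)
recovery.pause()
recovery.step(30)                     # ignored while paused
paused_status = recovery.status()
recovery.resume()
recovery.step(30)                     # continues from 30, not from 0
```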
To pause or resume the current recovery of virtual disks:
Also follow these steps to resume a current recovery that is paused because auto-recovery is disabled.
- In the Virtual Disk Details page>Info tab, click Pause or Resume to the right of the recovery progress bar.
Alternatively, this command can be performed for virtual disks or virtual disk groups from a context menu in a panel or the Virtual Disks List. The setting is also available in the Virtual Disk Details page>Settings tab.
- If paused, the virtual disk status will reflect that the recovery is paused. The virtual disk status is displayed on Virtual Disks Lists or the Virtual Disk Details page.
To enable or disable auto-recovery for virtual disks:
- In a panel or list, right-click on the virtual disks or virtual disk groups and point to Recovery, then select Enable Auto-recovery or Disable Auto-recovery.
(Alternatively, the option can be initiated from the Virtual Disk details page in the Settings tab.)
- If auto-recovery is enabled, "recovery enabled" will be appended to the status. The virtual disk status is displayed on Virtual Disks Lists or the Virtual Disk Details page.
Controlling Mirror Recovery Speed for the Server Group
The Mirror Recovery Speed setting is a global server group setting that controls the overall speed used for mirror recovery traffic for all servers in a server group. This setting can increase, decrease, or suspend mirror recoveries for all servers in the group. The throttling feature is useful when throughput would be better spent performing activities other than mirror recovery, such as accelerating the processing of heavy I/O from hosts. The mirror recovery speed can be increased when host I/O is light in order to speed up mirror recoveries when it will not affect the processing of host I/O.
- Increasing the mirror recovery speed may decrease the transfer of I/O from hosts.
- Setting the mirror recovery speed to No Recovery will suspend all mirror recoveries for all DataCore Servers in the server group.
- The system distributes recovery in the optimal manner based on the recovery setting in the storage profile of virtual disks. Virtual disks with higher priority recovery are given more of the throughput than those with lower priority. If the recovery setting is reduced while recoveries are in progress, virtual disks that are already in recovery will remain in recovery until complete, but the throughput for that virtual disk will be reduced accordingly.
- Changing the speed of mirror recovery may also change the number of simultaneous recoveries that occur on each server in the server group.
A slider bar is used to change the setting. Five speed settings are indicated on the bar: No Recovery, Normal recovery (default setting), Maximum recovery, and a midrange setting between each pair of adjacent labeled settings. For instance, to double the normal throughput rate, set the recovery to Maximum recovery. To decrease the normal throughput rate by 50%, set the speed to the midpoint between No Recovery and Normal recovery. Set the recovery to No Recovery to suspend the recovery for all servers in the server group.
For example, if all virtual disks are in recovery using the normal speed and assuming that all recoveries will finish in four hours, then if the recovery speed is increased to the maximum setting, all virtual disk recoveries will finish in two hours.
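The arithmetic in the example above can be written as a short calculation. The setting labels used as dictionary keys and the function name are illustrative; the factors are taken from the description of the slider, and the midpoint between Normal and Maximum is omitted because its throughput is not stated in this topic.

```python
# Throughput relative to the Normal setting, as described above.
SPEED_FACTORS = {
    "No Recovery": 0.0,
    "Midpoint (No Recovery to Normal)": 0.5,
    "Normal": 1.0,
    "Maximum": 2.0,
}


def estimated_hours(hours_at_normal, setting):
    """Estimate recovery time at a given setting; None means suspended."""
    factor = SPEED_FACTORS[setting]
    if factor == 0:
        return None                   # recoveries do not progress
    return hours_at_normal / factor


at_maximum = estimated_hours(4, "Maximum")
at_midpoint = estimated_hours(4, "Midpoint (No Recovery to Normal)")
```

With a four-hour estimate at the Normal setting, the sketch gives two hours at Maximum and eight hours at the lower midpoint. Real recovery times also depend on host I/O load, so these figures are only proportional estimates.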
To change mirror recovery speed for all servers in a server group:
- In the Server Group Details page>Settings tab, expand Advanced Settings.
- Use the Mirror recovery speed slider bar to change the setting.
- Click Apply. The setting takes effect immediately.
Forcing Mirrored Virtual Disks Online
A multi-copy virtual disk with a virtual disk status of Unknown or Double failure indicates that both servers with active storage sources to the host have suffered simultaneous or concurrent failures and that the storage sources are no longer synchronized, which could put data at risk. When this occurs, the host will not have access to the virtual disk.
When necessary, the Force online operation can be performed to force recovery for virtual disks using the selected storage source. When recovery is forced, data initialization (re-synchronization) occurs from the DataCore Server whose storage source is forced online and the data on that storage source is mirrored to the mirror partner (if the server is operational).
- Forcing a storage source to recover involves risks and should be used with extreme caution. Recovering from the incorrect storage source will result in data loss. Before forcing the storage source online, check the Windows Event Logs on the host to determine which DataCore Server was actively servicing I/O requests at the time the failure occurred. Also, check the allocated storage size on both DataCore Servers before making the decision. In some cases, it may not be correct to force online the storage source on the last server to go down; the server with the most SAUs allocated may determine which DataCore Server should be used to recover. A safe alternative to forcing online is to split the mirror and carefully examine the storage sources, then remirror with the optimal storage source.
If unsure of how to proceed, contact DataCore Technical Support for assistance.
- When mirror paths are unavailable, virtual disks cannot be forced online. In this case, the virtual disk can be split to gain access to storage sources, although high availability will be lost. Use caution in determining which side of the split virtual disk to use. A mirror may then be added to regain high availability, and a full recovery will be performed to synchronize the data.
- The Force online option is only accessible when required; otherwise, the option is hidden.
- Recovery can be forced for one virtual disk or all virtual disks on a DataCore Server. There is one important difference between forcing one virtual disk online and performing the same operation for all virtual disks on a server. The Force Online operation can only be performed for a server when virtual disks are offline, not when virtual disks are in double failure.
To force online one virtual disk:
- In the Virtual Disk Details page>Info tab, determine the correct DataCore Server from which to initiate the operation.
(Alternatively, virtual disk operations can be performed from the Virtual Disks List.)
- In the Host access field of the storage source to initiate recovery from, click the Force online link.
- Click OK in the confirmation box to continue.
- Verify the recovery is taking place and ensure the host has access to the virtual disk again.
To force online all virtual disks on a DataCore Server:
This operation will cause the recovery of all virtual disk storage sources on the selected DataCore Server.
- In the DataCore Servers Panel, right-click on the correct DataCore Server from which to initiate the operation and select Force virtual disks online.
- Click OK in the confirmation box to continue.