Best Practices for Deduplication (Post-Processing)

  • The maximum size of a deduplication pool is 64 TB.
  • DataCore recommends creating mirrored virtual disks using storage sources from two different deduplication pools so that both sides of the mirror are deduplicated.
  • Information in the DataCore Deduplication Console is refreshed when the console is opened, deduplication pools are created, or schedules are updated. However, it is not automatically refreshed in real time. Before performing any operation or reviewing the data in the console, click Refresh to manually update the information.
  • Once the deduplication pool is created, do not add disks to it (except temporarily when changing the pool size; see Changing the Deduplication Pool Size (Post-Processing)). Do not add pool mirrors.
  • Creating a deduplication pool creates objects that are visible in the SANsymphony Management Console and are identified as being "Internal Use" for the deduplication pool. Do not modify objects labeled "Internal Use"; otherwise, deduplication pools and deduplication tasks may fail.
  • Do not change the script files and tasks that are automatically generated for each deduplication pool. (See Automated Post-Processing Deduplication Tasks.)
  • Creating a deduplication pool will create an "internal use only" volume in Disk Management with the same name as the pool. Do not rename the volume, or change or remove the drive letter in Disk Management. (The first available drive letter will be assigned to the volume. A drive letter must be available for use in order to create a deduplication pool.)
  • Because the deduplication type is post-processing deduplication, making copies of data (such as adding a mirror to a single virtual disk or replacing a mirrored storage source) will initially consume the full, undeduplicated amount of space. The copied data is deduplicated later.
  • The performance class and write-aware auto-tiering settings in storage profiles for virtual disks created from a deduplication pool (which consists of one disk) will have no effect because there will only be one tier in a deduplication pool.
  • Actual deduplication savings are realized in the SAUs of the storage source pool and the volume created from it. Only when an entire SAU is deduplicated will it be realized as free space in the storage source pool. The allocated storage space in the storage source pool will vary based on the SAU size of that pool and the number of contiguous blocks that are deduplicated. Some of the deduplicated space may be in SAUs that remain allocated to the volume but which are available for reuse with that volume only.
  • Event messages from the Data Deduplication service are found in Computer Management > System Tools > Event Viewer > Applications and Services Logs > Microsoft > Windows > Deduplication.
  • After a DataCore Server is restarted, it takes a brief moment for the deduplication pool status to return to healthy (Running).
  • After replacing a server or adding/changing physical disks on a DataCore Server with deduplication pools, ensure that the drive letters originally assigned to deduplication volumes created for each deduplication pool remain the same. (To identify the drive letters, see the Disk Pool Details page for each deduplication pool in the SANsymphony Management Console. The drive letter is contained in the description.)
  • If a deduplication volume goes off-line in Disk Management, the deduplication pool created from it will be off-line. In this case, the volume must be remounted by running the task Internal Use - Mount Dedup VHD [#] for the correct volume. To identify the correct task to run for the volume, open Tasks in the SANsymphony Management Console. The task description contains the drive letter, deduplication pool name, and the DataCore Disk ID. (To check the status of volumes, see the DataCore Disk Details page for each DataCore Disk named "Internal Use For [deduplication pool name]" in the SANsymphony Management Console. The status is displayed under the icon in the top left corner or under Disk Information in the Info tab. The Info tab also displays the Index number used in Disk Management.)
  • Do not run disk defragmentation software on deduplication volumes used to create deduplication pools or volumes created from deduplication pools.

    In the Windows operating system, defragmentation is a maintenance mode task that occurs automatically during optimization. Drives are optimized automatically by default, so optimization must be disabled for volumes involved in deduplication. Settings for scheduled optimization must be changed by the administrator in the Windows Defragment and Optimize Drives utility, so that volumes used in deduplication pools and volumes created from deduplication pools are not selected for optimization. See Microsoft documentation for more information.
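
The SAU behavior noted in the list above (space returns to the storage source pool only when an entire SAU is deduplicated) can be illustrated with a simplified model. The Python sketch below is not DataCore's implementation. The block numbering, the blocks_per_sau value, and the function name are assumptions chosen purely for illustration.

```python
from collections import Counter
from typing import Iterable, List


def fully_freed_saus(freed_blocks: Iterable[int], blocks_per_sau: int) -> List[int]:
    """Return the indices of SAUs in which every block has been freed.

    Hypothetical model: block b belongs to SAU number b // blocks_per_sau.
    """
    if blocks_per_sau <= 0:
        raise ValueError("blocks_per_sau must be positive")
    # Count how many distinct freed blocks fall into each SAU.
    per_sau = Counter(block // blocks_per_sau for block in set(freed_blocks))
    # Only SAUs with all of their blocks freed can be returned to the pool.
    return sorted(sau for sau, count in per_sau.items() if count == blocks_per_sau)


# Illustrative layout: 4 blocks per SAU (blocks 0-3 are SAU 0, 4-7 are SAU 1, ...).
freed = {0, 1, 2, 3, 4, 6, 9}
print(fully_freed_saus(freed, blocks_per_sau=4))  # [0]
```

In this example seven blocks are freed by deduplication, but only SAU 0 is entirely free. The freed blocks in SAUs 1 and 2 stay allocated to the volume and are reusable only by that volume.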

Oversubscription and Inflation

A pool is oversubscribed when the total size of all virtual disks created from the pool is greater than the size of the pool. Oversubscription simplifies capacity planning and maximizes capacity utilization. This is an acceptable practice, and System Health safeguards such as available space threshold settings are provided for each pool so that administrators can increase the pool size before it runs out of space.
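
As a minimal sketch of this definition, the following Python snippet compares the total virtual disk size against the pool size. The function names and the sizes used are hypothetical and are not part of any DataCore tool.

```python
def is_oversubscribed(virtual_disk_sizes_tb, pool_size_tb):
    """True when the virtual disks together are larger than the pool."""
    return sum(virtual_disk_sizes_tb) > pool_size_tb


def oversubscription_ratio(virtual_disk_sizes_tb, pool_size_tb):
    """Total virtual disk size divided by pool size (values above 1.0 mean oversubscribed)."""
    if pool_size_tb <= 0:
        raise ValueError("pool size must be positive")
    return sum(virtual_disk_sizes_tb) / pool_size_tb


# Illustrative values: three virtual disks created from a 40 TB pool.
disks_tb = [10, 20, 30]
print(is_oversubscribed(disks_tb, 40))       # True
print(oversubscription_ratio(disks_tb, 40))  # 1.5
```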

On the other hand, administrators should be aware of potential issues that could result from drastically oversubscribing the storage source pool. Because deduplication reduces the capacity that data consumes, administrators may feel confident oversubscribing by the amount of estimated savings or more. Note that deduplication will subscribe, but not necessarily allocate, 20% more space than the size specified for the deduplication pool.
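
The 20% figure can be worked through with simple arithmetic. The sketch below assumes sizes expressed in whole gigabytes and uses a hypothetical helper name. It only illustrates the calculation and does not query the product.

```python
def subscribed_size_gb(dedup_pool_size_gb, overhead_pct=20):
    """Space subscribed for a deduplication pool: the specified size plus 20%.

    Integer arithmetic avoids floating-point rounding in the result.
    """
    return dedup_pool_size_gb + dedup_pool_size_gb * overhead_pct // 100


# Illustrative value: a 10 TB (10240 GB) deduplication pool.
print(subscribed_size_gb(10240))  # 12288
```

In this example, the extra 2048 GB is subscribed but not necessarily allocated, so it should be counted when judging how far the storage source pool is oversubscribed.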

Certain events can cause a full mirror recovery which will cause previously deduplicated data to "inflate" (or become undeduplicated) to full size.
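
To gauge the exposure, an administrator might estimate how much space inflation would need in the worst case. The following sketch assumes that the logical (full-size) data amount, the space currently stored after deduplication, and the free space in the storage source pool are all known. The function name and figures are hypothetical.

```python
def inflation_shortfall_gb(logical_gb, stored_gb, free_gb):
    """Space still missing if deduplicated data inflates back to full size.

    Returns 0 when the free space already covers the worst-case growth.
    """
    growth = logical_gb - stored_gb  # extra space needed after inflation
    return max(0, growth - free_gb)


# Illustrative values: 1000 GB of data stored in 400 GB, with 500 GB free.
print(inflation_shortfall_gb(1000, 400, 500))  # 100
```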

An auto-generated task is configured to run automatically when the available space on the storage source pool falls below the Attention threshold that is configured in the pool settings. When the task is triggered, the task will run a high priority deduplication on the affected deduplication pools. (See Automated Post-Processing Deduplication Tasks for more information.) High priority deduplication will result in decreased performance while running due to the significant workload on the system. It is also theoretically possible that the high priority deduplication may not keep up with inflation if system resources are insufficient.
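
The trigger condition can be sketched as a simple comparison. This example assumes that the Attention threshold is expressed as a percentage of available space in the pool. The function name and values are hypothetical, and the real task is generated and run by the product, not by a script like this one.

```python
def below_attention_threshold(free_gb, pool_size_gb, attention_threshold_pct):
    """True when available space, as a percentage of the pool, is under the threshold."""
    if pool_size_gb <= 0:
        raise ValueError("pool size must be positive")
    # Cross-multiplied form of (free / size * 100) < threshold, to stay in integers.
    return free_gb * 100 < attention_threshold_pct * pool_size_gb


# Illustrative values: 150 GB free in a 1000 GB pool with a 20% Attention threshold.
print(below_attention_threshold(150, 1000, 20))  # True
```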