Data storage terminology and concepts – Dell PowerVault DR2000v User Manual

Page 15


Table 3. External Drive Capacity and Available Physical Capacity

| DR Series System Drive Capacity | Available Physical Capacity (Decimal) | Available Physical Capacity (Binary) | Total Logical Capacity @ 15:1 Savings Ratios (Decimal) | Total Logical Capacity @ 15:1 Savings Ratios (Binary) |
|---|---|---|---|---|
| 1 TB | 9 TB | 8.18 TiB | 135 TB | 122.7 TiB |
| 2 TB | 18 TB | 16.37 TiB | 270 TB | 245.55 TiB |
| 3 TB (DR4100 and DR6000 only) | 27 TB | 24.56 TiB | 405 TB | 368.4 TiB |
| 4 TB (DR6000 only) | 35.37 TB | 32.17 TiB | 842.5 TB | 540 TiB |
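The decimal and binary columns in the table differ only in units: drive vendors quote capacity in decimal terabytes (10^12 bytes), while binary tebibytes are 2^40 bytes. The conversion and the 15:1 ratio can be checked with a few lines of Python (a worked sketch of the first table row only; the helper name is illustrative, not part of the DR Series product, and the table appears to truncate rather than round its TiB figures):

```python
# Reproduce the first row of Table 3: 1 TB drives give 9 TB physical.
TB = 10 ** 12   # decimal terabyte, in bytes
TiB = 2 ** 40   # binary tebibyte, in bytes

def tb_to_tib(tb: float) -> float:
    """Convert a decimal-TB figure to binary TiB."""
    return tb * TB / TiB

physical = 9                                # available physical capacity, TB
print(round(tb_to_tib(physical), 2))        # ~8.19 TiB (table: 8.18 TiB)
print(physical * 15)                        # 135 TB logical at 15:1 savings
print(round(tb_to_tib(physical * 15), 1))   # ~122.8 TiB (table: 122.7 TiB)
```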

Data Storage Terminology and Concepts

This topic presents several key data storage terms and concepts to help you better understand the role that the DR Series system plays in meeting your data storage needs.

Data Deduplication and Compression: The DR Series system design draws on a variety of data-reduction technologies, including advanced deduplication algorithms together with generic and custom compression solutions that are effective across many differing file types. The system is content-aware: it analyzes data to learn the structure of your files and data types, then uses that knowledge to improve your data-reduction ratios while reducing resource consumption on the host. The system uses block deduplication to address increasing data growth, an approach well suited to routine, repeated backups of structured data. Block-level deduplication works efficiently where there are multiple duplicate versions of the same file, because it examines the actual sequence of data (the 0s and 1s) that makes up the data.
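The block-level comparison can be illustrated in a few lines of Python. This is a simplified sketch, not the DR Series implementation: it uses fixed-size blocks and SHA-256 fingerprints, whereas production systems typically use variable-size chunking.

```python
import hashlib

def block_hashes(data: bytes, block_size: int = 8) -> list[str]:
    """Split data into fixed-size blocks and fingerprint each one."""
    return [hashlib.sha256(data[i:i + block_size]).hexdigest()
            for i in range(0, len(data), block_size)]

original = b"quarterly report: revenue up, costs flat"
duplicate = original                      # a repeated backup of the same file

# Every block fingerprint matches, so the repeated backup contributes
# no new data that needs to be stored.
shared = sum(a == b for a, b in zip(block_hashes(original),
                                    block_hashes(duplicate)))
print(shared, "of", len(block_hashes(original)), "blocks are identical")
# → 5 of 5 blocks are identical
```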
Whenever a document is repeatedly backed up, its 0s and 1s stay the same because the file is simply being duplicated. Block deduplication easily identifies the similarity between two such copies because their sequences of 0s and 1s remain exactly the same. Online data, by contrast, contains few exact duplicates; instead, its files may share many similarities without being identical. For example, a majority of the files that contribute to increased data storage requirements come pre-compressed by their native applications, such as:

- Images and video (such as the JPEG, MPEG, TIFF, GIF, and PNG formats)
- Compound documents (such as .zip files, email, HTML, web pages, and PDFs)
- Microsoft Office application documents (including PowerPoint, MS Word, Excel, and SharePoint)

NOTE: The DR Series system experiences a reduced savings rate when the data it ingests has already been compressed by the native data source. It is highly recommended that you disable any data compression used by the data source. For optimal savings, native data sources should send data to the DR Series system in a raw state for ingestion.

Block deduplication is not as effective on files that are already compressed, because compression changes the 0s and 1s of the original format. Data deduplication is a specialized form of data compression that eliminates redundant data; the technique improves storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent across a link. Using deduplication, unique chunks of data, or byte patterns, are identified and stored during analysis. As the analysis continues, other chunks are compared to the stored copies; when a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. This reduces the amount of data that must be stored or transferred. Network savings are achieved by replicating data that has already undergone deduplication.
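The store-and-reference mechanism can be sketched as a dictionary keyed by chunk fingerprint. This is a minimal illustration of the general technique, with hypothetical names, not the DR Series system's actual data structures:

```python
import hashlib

def deduplicate(chunks):
    """Store each unique chunk once; represent repeats by a fingerprint."""
    store = {}        # fingerprint -> the one stored copy of the chunk
    references = []   # the data stream, rebuilt as a list of fingerprints
    for chunk in chunks:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in store:       # first time this byte pattern is seen
            store[fp] = chunk
        references.append(fp)     # duplicates cost only a small reference
    return store, references

# Three backups of the same two chunks: six chunks in, two stored.
backups = [b"chunk-A", b"chunk-B"] * 3
store, refs = deduplicate(backups)
print(len(refs), "chunks referenced,", len(store), "actually stored")
# → 6 chunks referenced, 2 actually stored
```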
By contrast, standard file compression tools identify short repeated substrings inside individual files, whereas storage-based data deduplication inspects large volumes of data to identify large amounts of data such as

