Do You Really Need Data Deduplication?

Every IT decision maker either wants deduplication (dedupe) or needs it. At least that is what they are being told by the market and vendors who are trying to sell new functionality. The fact is, while deduplication can save backend storage, it is not a fit for everyone. Let's dive into the subject of dedupe and figure out if it is right for you.

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.


Data deduplication, also called intelligent compression or single-instance storage, is a method of reducing storage needs by eliminating redundant data. Only one unique instance of the data is actually retained on storage media. Redundant data is replaced with a pointer to the unique data copy.


For example, a typical network might contain 100 instances of the same one-megabyte background image. If the file is backed up or archived without deduplication, all 100 instances are saved, requiring 100MB of storage space. With data deduplication, only one instance is actually stored. Each subsequent instance is simply a reference to the one saved copy. In this example, a 100MB storage demand could be reduced to only 1MB.
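The background-image example can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular backup product works: the `SingleInstanceStore` class, its method names and the file paths are hypothetical, and it assumes whole-file deduplication keyed on a SHA-256 digest.

```python
import hashlib


class SingleInstanceStore:
    """Toy store that keeps one copy of each unique payload."""

    def __init__(self):
        self.blobs = {}     # digest -> the single stored copy of the data
        self.pointers = {}  # file name -> digest (the "pointer")

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Keep the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        self.pointers[name] = digest

    def get(self, name):
        # Follow the pointer back to the one saved copy.
        return self.blobs[self.pointers[name]]

    def logical_bytes(self):
        # Size of everything that was backed up, duplicates included.
        return sum(len(self.blobs[d]) for d in self.pointers.values())

    def stored_bytes(self):
        # Size of what actually sits on storage media.
        return sum(len(b) for b in self.blobs.values())


store = SingleInstanceStore()
image = bytes(1024 * 1024)  # stand-in for a 1MB background image
for i in range(100):
    store.put(f"desktop-{i}/background.png", image)

print(store.logical_bytes() // 2**20, "MB before dedupe")
print(store.stored_bytes() // 2**20, "MB after dedupe")
```

Real products typically deduplicate at the block or chunk level rather than per file, but the pointer-to-one-copy idea is the same.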

In other words, deduplication has the potential to save you money. If deduplication saves you 40% of backend storage, it reduces the disk capacity you have to buy for data protection by roughly the same proportion. A 40% savings could be huge when it comes to budgeting and planning.

Types of dedupe

There are three main types of deduplication, and while each has benefits and drawbacks, they also have their place in your environment. The first two types can both be used for a best-of-breed solution:

* Client-side dedupe is where your data is deduplicated before it is transferred to your data protection solution. The client processes the metadata about the files, bytes and bits before anything is sent over the network. The client takes on more of the load, which relieves stress on the network. However, that load could affect the applications running on the client. Client-side deduplication should only be used when network resources are limited and the client has memory and processor power to spare for this extra process.

* Server-side dedupe is where the clients send all of their data over the network and, after all data is transferred, it is processed and the duplicate data is removed. While this method does transfer more data across the network, it relieves the client of any extra workload. If your data protection solution is designed to take this load, then this is the right answer for most instances of deduplication. However, if it is not, and you do not have enough disk, memory and CPU power, this will bring your data protection solution to its knees. Not a good idea.

* Inline dedupe is where an additional device is added to the IT infrastructure to perform the deduplication while the data is being transferred to the data protection solution. This relieves the client of the overhead and the server of the deduplication processing load. While this seems to be the best of both worlds, it involves a significant investment in a new device that connects to your storage-area network. This solution not only costs money, but for most shops, it is overkill. Usually, the average data protection solution that provides data deduplication functionality can handle up to 5 terabytes (TB) nightly without the extra inline device.
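The network trade-off between the first two types can be made concrete with a rough sketch. Everything here is an assumption for illustration: the function names, the fixed 4KB chunks, and the idea that the client sends a SHA-256 digest for every chunk are hypothetical simplifications, not a description of any real backup protocol.

```python
import hashlib

DIGEST_SIZE = 32  # bytes in a SHA-256 digest


def server_side_backup(chunks, server_store):
    """Client sends every chunk; the server removes duplicates afterwards."""
    sent = 0
    for chunk in chunks:
        sent += len(chunk)  # the full data always crosses the network
        server_store.setdefault(hashlib.sha256(chunk).digest(), chunk)
    return sent


def client_side_backup(chunks, server_store):
    """Client hashes first and sends data only for chunks the server lacks."""
    sent = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).digest()  # CPU cost lands on the client
        sent += DIGEST_SIZE  # the small digest is always sent
        if digest not in server_store:
            sent += len(chunk)  # new content: send the data too
            server_store[digest] = chunk
    return sent


# Four 4KB chunks, three of which are identical.
chunks = [b"A" * 4096, b"B" * 4096, b"A" * 4096, b"A" * 4096]
print("server side:", server_side_backup(chunks, {}), "bytes sent")
print("client side:", client_side_backup(chunks, {}), "bytes sent")
```

Both approaches end up storing the same two unique chunks. The difference is where the hashing work happens and how many bytes travel over the wire.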

When don't you need it?

When should you avoid deduplication? Here are the three biggest considerations:

* Cost: Add up the cost of what your system will need in order to enable dedupe. More RAM, CPU power and possibly faster disk could spike the cost and make it not worth the expense.

* Tape: If you are migrating all of your data to tape on a daily basis, then the need for deduplication is moot. Deduplication is only viable on disk, so there is nothing to gain from deduping data that goes straight to tape.

* Data footprint: If you have a small data footprint, less than 10TB, the need for deduplication must be weighed against the cost. With disk costs decreasing, a single drawer in your storage array may be able to handle the load without deduplication.
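The cost and footprint considerations above boil down to simple arithmetic, sketched here. The function name and every number in the usage example are hypothetical placeholders, not real prices; plug in your own figures.

```python
def dedupe_net_savings(data_tb, disk_cost_per_tb, savings_ratio, dedupe_cost):
    """Return disk money saved by dedupe minus the cost of enabling it.

    A negative result means dedupe costs more than the disk it saves.
    """
    disk_saved = data_tb * savings_ratio * disk_cost_per_tb
    return disk_saved - dedupe_cost


# Hypothetical inputs: an 8TB footprint, a made-up disk price per TB,
# the 40% savings figure used earlier, and a made-up cost for the extra
# RAM, CPU and faster disk needed to run dedupe.
net = dedupe_net_savings(
    data_tb=8,
    disk_cost_per_tb=100,
    savings_ratio=0.40,
    dedupe_cost=2000,
)
print("worth it" if net > 0 else "not worth it", round(net))
```

With a small footprint, the disk saved is small too, so even a modest hardware upgrade can outweigh it.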

Data deduplication is a very cool technology, but is not a fit for everyone. Before you consider upgrading or enabling deduplication, talk to someone who is unbiased and can give you advice on a solution that not only makes technical sense, but also makes financial sense.

Jarrett Potts is director of strategic marketing for STORServer, a provider of data backup solutions for the mid-market. Before joining the STORServer team, Potts spent 15 years working in various capacities for IBM, including Tivoli Storage Manager marketing and technical sales. He has been the evangelist for the TSM family of products since 2000.


This story, "Do You Really Need Data Deduplication?" was originally published by Network World.
