What are some new data deduplication techniques?
Copy data management is just one technology that uses recent innovations in backup deduplication to combat data sprawl and manage snapshots.
Data deduplication is a storage capacity optimization technique that identifies repeated sets of data in a data stream and eliminates them, retaining a single copy on physical media. Metadata and pointers are used to track each logical data instance that maps to the physical copy. Data deduplication techniques were first established in the backup space, where multiple, repeated full backups of a server or virtual machine could be heavily deduplicated because they either contained the same unchanged data or were based on a single master image.
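The mechanism described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `DedupStore` class, its method names, the fixed-size blocks and the choice of SHA-256 as the fingerprint are all assumptions made for the example.

```python
import hashlib


class DedupStore:
    """Toy block-level deduplicating store (hypothetical, for illustration)."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.blocks = {}   # fingerprint -> block bytes, each stored only once
        self.recipes = {}  # backup name -> ordered fingerprints (the "pointers")

    def write(self, name, data):
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            # Keep the block only if this fingerprint has not been seen before.
            self.blocks.setdefault(fp, block)
            recipe.append(fp)
        self.recipes[name] = recipe

    def read(self, name):
        # Rebuild the logical copy by following the pointers in order.
        return b"".join(self.blocks[fp] for fp in self.recipes[name])

    def physical_bytes(self):
        return sum(len(block) for block in self.blocks.values())


# Two "full backups" that differ in a single block.
store = DedupStore(block_size=4)
store.write("monday", b"AAAABBBBCCCC")
store.write("tuesday", b"AAAABBBBDDDD")
print(store.physical_bytes())  # 16 bytes stored for 24 logical bytes
```

Real products typically use variable-size chunking and persistent indexes, but the relationship between fingerprints, single physical copies and per-backup pointers is the same.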
Data deduplication techniques have figured heavily in products that resolve sprawl issues, such as copy data management (CDM) platforms. These products offer the ability to use data for purposes other than data protection, such as creating test/development copies of production data. Customers save by reusing static data, which is now typically stored on spinning media in powered-up deduplication appliances. CDM vendors have built in technology that delivers backups effectively and also serves secondary requirements, including managing thousands of application snapshots.
Another innovation in data deduplication techniques is "pre-duplication," where the client deduplicates data before sending it across the network. Although this concept appeared almost 10 years ago at PureDisk, a company acquired by Symantec, vendors today are integrating it into their platforms. Hewlett Packard Enterprise did it with its 3PAR array, looking to improve data deduplication techniques using snapshots and to reduce the amount of data transiting the network. The deduplication process is also being distributed, making it possible to scale out backups without the bottleneck of managing the hash values in a single process.
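A rough sketch of both ideas, client-side deduplication and a distributed hash index, might look like the following. It does not reflect how PureDisk, 3PAR or any other product works internally: `ShardedIndex`, `client_backup`, the shard count and the routing rule are hypothetical names and choices, and the shards are plain in-memory dictionaries standing in for separate nodes.

```python
import hashlib


def fingerprint(block):
    return hashlib.sha256(block).hexdigest()


class ShardedIndex:
    """Fingerprint index split across shards (stand-ins for separate nodes)."""

    def __init__(self, shard_count=4):
        self.shards = [dict() for _ in range(shard_count)]

    def _shard_for(self, fp):
        # Route by fingerprint value so each shard owns a slice of the hash space.
        return self.shards[int(fp[:8], 16) % len(self.shards)]

    def missing(self, fps):
        # Tell the client which fingerprints the back end does not hold yet.
        return [fp for fp in fps if fp not in self._shard_for(fp)]

    def store(self, fp, block):
        self._shard_for(fp)[fp] = block


def client_backup(data, index, block_size=4096):
    """Deduplicate on the client; return the payload bytes actually sent."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    by_fp = {fingerprint(b): b for b in blocks}  # duplicates collapse locally
    sent = 0
    for fp in index.missing(list(by_fp)):
        index.store(fp, by_fp[fp])  # only unseen blocks cross the network
        sent += len(by_fp[fp])
    return sent


index = ShardedIndex()
print(client_backup(b"AAAABBBBCCCC", index, block_size=4))  # 12, nothing known yet
print(client_backup(b"AAAABBBBDDDD", index, block_size=4))  # 4, only the new block
```

Because each fingerprint maps to exactly one shard, lookups for different blocks can be handled by different processes, which is the scaling property the paragraph above describes.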
Backup vendors are now also looking at the data itself and building in intelligent deduplication based on the application content rather than on simple block-level identification. File-level dedupe can identify files that can be single-instanced, such as file attachments in backups of email systems. Again, these processes are making the backup client more intelligent, reducing the workload on the network and the back-end deduplication engine.
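As a simple illustration of single instancing, the sketch below stores each distinct email attachment once and keeps per-message references to it. The message layout (a dictionary with an `id` and a list of `(filename, content)` pairs) and the `single_instance` function are assumptions made for the example; a real backup client would parse the mail system's own format.

```python
import hashlib


def single_instance(messages):
    """Store each distinct attachment once; return (store, catalog)."""
    store = {}    # content hash -> attachment bytes, kept a single time
    catalog = []  # per-message references into the store
    for msg in messages:
        refs = []
        for filename, content in msg["attachments"]:
            digest = hashlib.sha256(content).hexdigest()
            store.setdefault(digest, content)
            # The filename is kept per message, so renamed copies still match.
            refs.append((filename, digest))
        catalog.append({"id": msg["id"], "attachments": refs})
    return store, catalog


report = b"quarterly report contents"
mailbox = [
    {"id": 1, "attachments": [("report.pdf", report)]},
    {"id": 2, "attachments": [("report-final.pdf", report), ("logo.png", b"png bytes")]},
]
store, catalog = single_instance(mailbox)
print(len(store))  # 2 distinct attachments stored for 3 attached files
```

Matching on content rather than on filename is what lets the same attachment, forwarded under different names, be stored once.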