Five data migration approaches for cloud archiving
Administrators can take several approaches to migrating data to a cloud archive, from manual migration to using cloud-integrated storage.
Moving a significant amount of data to a cloud archive service provider can be difficult. How tough the process is depends on several factors, including how much data IT administrators must move, which data migration approach they use and how much time they have.
The first data migration approach admins might consider is to move the data manually, but it's a time-consuming process. Consider what it would take to migrate hundreds or thousands of terabytes of data. That much data does not move well across most wide area networks (WANs). Even with a 1 Gbps pipe, the best-case scenario is a little more than two hours to transmit 1 TB of data; moving 1 PB would take nearly three months. Unfortunately, networks rarely deliver the best case; admins should expect roughly 30% to 40% of it. With that in mind, it would take more than five hours to transmit 1 TB of data, or more than seven months for 1 PB.
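The arithmetic is easy to sanity-check. Here is a minimal Python sketch of the estimate, assuming decimal units (1 TB = 10^12 bytes) and modeling real-world throughput as a flat efficiency factor:

```python
# Back-of-the-envelope WAN transfer-time estimate for the scenario above.
# Assumes decimal units (1 TB = 10^12 bytes) and models real throughput as
# a flat efficiency factor -- a simplification, since actual links vary
# with latency, packet loss and protocol overhead.

def transfer_hours(data_tb: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Hours needed to push data_tb terabytes through a link_gbps link."""
    bits = data_tb * 1e12 * 8                     # payload size in bits
    effective_bps = link_gbps * 1e9 * efficiency  # sustained throughput
    return bits / effective_bps / 3600

print(f"1 TB, best case: {transfer_hours(1, 1):.1f} hours")                     # ~2.2
print(f"1 PB, best case: {transfer_hours(1000, 1) / 24:.0f} days")              # ~93
print(f"1 TB at 40%:     {transfer_hours(1, 1, 0.4):.1f} hours")                # ~5.6
print(f"1 PB at 40%:     {transfer_hours(1000, 1, 0.4) / 24 / 30:.1f} months")  # ~7.7
```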
A data migration effort of that size requires planning, software to move the data, experienced personnel, execution, cleanup, error corrections, permission and access transfers, removal of the data from the original storage and validation. The entire process can take significantly longer if the software requires a lot of manual work from IT. That is why other data migration approaches exist for moving large amounts of data to a cloud archive.
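Of those steps, validation is a good candidate for automation. A minimal sketch of one common technique -- comparing SHA-256 checksums of the source and destination copies -- might look like the following; the directory layout is a placeholder, and real migration tools do this at far larger scale:

```python
# Minimal validation sketch: confirm that every file copied to the archive
# staging area matches its source, byte for byte, via SHA-256 checksums.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in 1 MB chunks so large files never load fully into memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate(source_root: Path, dest_root: Path) -> list[Path]:
    """Return the relative paths of any files that failed validation."""
    failures = []
    for src in source_root.rglob("*"):
        if src.is_file():
            rel = src.relative_to(source_root)
            dst = dest_root / rel
            if not dst.is_file() or sha256_of(src) != sha256_of(dst):
                failures.append(rel)
    return failures
```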
Best data migration approaches for cloud archive services
The second data migration approach admins can take is seeding: moving data from where it currently resides to a local storage system or series of nodes supplied by the cloud archive service provider. Because this movement happens over the local network rather than the WAN, it can take far less time, though it still requires some type of data migration software.
The key to rapid data migration is to use highly automated software. Once data has been moved to the seed storage system, admins pack it up and ship it back to the cloud archive service provider. The service provider then takes that system and migrates the data to its cloud archive, or connects that storage system or series of nodes to the archive. Archive service providers prefer this data migration approach.
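What that automation looks like varies by vendor, but the core of a seed load is a bulk local copy. As a rough illustration only -- not any provider's actual tool -- a parallelized copy to a mounted seed volume could be sketched like this, where /data/archive and /mnt/seed are hypothetical paths:

```python
# Illustrative seed-load sketch: fan file copies out across a thread pool
# so the seed appliance's local link stays busy. Both paths below are
# placeholders for a real source tree and seed storage mount.
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

SOURCE = Path("/data/archive")   # placeholder source tree
SEED = Path("/mnt/seed")         # placeholder seed appliance mount

def copy_one(src: Path) -> None:
    dst = SEED / src.relative_to(SOURCE)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)       # copy2 preserves timestamps and mode bits

if __name__ == "__main__":
    files = [p for p in SOURCE.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=8) as pool:
        pool.map(copy_one, files)
```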
A third option is the cloud storage gateway, also known as cloud-integrated storage (CIS). This method is a kind of hybrid of the first two data migration approaches. Admins must migrate the data from where it currently resides to the CIS, so the same issues of lengthy data migrations apply. The CIS then moves the data to IT's cloud archive of choice in the background. It deduplicates and compresses the data before transferring it to the cloud archive, which reduces the cost, as well as the amount of cloud archive storage the data consumes. The process also keeps a stub of the data in the local CIS that makes the data appear to be stored locally. The CIS acts like a cache to the cloud archive.
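The stubbing mechanism is vendor-specific, but the concept is straightforward: once the gateway ships an object to the cloud, it leaves behind a small local pointer. The toy sketch below illustrates the idea; the JSON stub format and content-addressed key are assumptions for the example, not any gateway's real format:

```python
# Toy illustration of CIS-style stubbing: after "uploading" a file, replace
# it locally with a small JSON stub recording where the full object lives.
# Real gateways do this transparently inside the file system.
import hashlib
import json
from pathlib import Path

def stub_out(path: Path, bucket: str) -> None:
    data = path.read_bytes()
    key = hashlib.sha256(data).hexdigest()   # content-addressed object key
    # ... dedupe, compress and upload `data` to the cloud archive here ...
    stub = {"bucket": bucket, "key": key, "size": len(data)}
    path.with_suffix(path.suffix + ".stub").write_text(json.dumps(stub))
    path.unlink()                            # local copy replaced by the stub
```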
Using a CIS still takes quite a long time because it faces the same WAN bandwidth limitations as the first method. But it takes less time than migrating the data manually because there is less data to transmit after deduplication and compression. The downside to this method is that users can only access the data through the CIS, so IT must deploy CIS systems at every location, and for any cloud application, that needs access to that archival data.
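The savings compound with the earlier transfer math. Assuming a combined 4:1 reduction from dedupe and compression -- an illustrative figure, since actual ratios vary widely by data type -- the job shrinks proportionally:

```python
# Effect of data reduction on the WAN numbers from the earlier sketch.
# The 4:1 combined dedupe/compression ratio is an assumption for
# illustration; actual ratios depend heavily on the data.
def reduced_transfer_hours(data_tb, link_gbps, efficiency, reduction_ratio):
    bits = (data_tb / reduction_ratio) * 1e12 * 8
    return bits / (link_gbps * 1e9 * efficiency) / 3600

# 1 PB at 40% of a 1 Gbps line, reduced 4:1 before transmission:
print(f"{reduced_transfer_hours(1000, 1, 0.4, 4) / 24 / 30:.1f} months")  # ~1.9
```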
A fourth option that is becoming more common is the mid-tier or enterprise-class storage system with a cloud interface on the back end. With this method of moving data to a cloud archive service provider, the cloud becomes a storage system tier. In many ways this is quite similar to CIS, but it has significantly better primary (not archive) data performance, capacity and storage functionality. Data that admins move to the cloud is deduplicated and compressed, but it appears local via a stub, just like cloud-integrated storage. But tiered storage systems are generally positioned as local primary storage that happen to move aged, snapshot and low-access data to a cloud archive while maintaining local access.
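The tiering decision itself is typically policy-driven. The sketch below shows a minimal age- and access-based selection rule; the thresholds are arbitrary examples, and real arrays apply such policies internally rather than exposing them as scripts:

```python
# Minimal tiering-policy sketch: flag files that are old or have not been
# read recently as candidates for the cloud tier. Thresholds are arbitrary
# examples. Note that atime is unreliable on noatime-mounted file systems.
import time
from pathlib import Path

AGE_DAYS = 180      # example: modified more than six months ago
IDLE_DAYS = 90      # example: not read in three months

def cloud_tier_candidates(root: Path) -> list[Path]:
    now = time.time()
    candidates = []
    for p in root.rglob("*"):
        if not p.is_file():
            continue
        st = p.stat()
        aged = (now - st.st_mtime) > AGE_DAYS * 86400
        idle = (now - st.st_atime) > IDLE_DAYS * 86400
        if aged or idle:
            candidates.append(p)
    return candidates
```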
The last of the five data migration approaches is server-to-server replication. Administrators install a piece of software on each machine -- physical or virtual -- at the source and on the destination side. Data is replicated from the source machine to its destination in the cloud. The destination machine then writes the data to IT's chosen archive. Once the data is written and validated, it is manually removed from the source machine. The advantage of this method is that it is fairly simple. The disadvantage is that it requires installing software on every single physical or virtual server. That's not a big deal if there are not a lot of machines to worry about, but it can be quite onerous when there are hundreds or thousands of them.
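Per server, the agent's job reduces to the three steps described above: replicate, validate, then release the source copy. A simplified single-file version in plain Python -- standing in for real replication software, which streams changes continuously -- might look like this:

```python
# Simplified per-server flow for replication-based migration: copy a file
# to the destination, verify the copy by checksum, and only then remove
# the source. Whole-file hashing is fine for a sketch; real tools stream.
import hashlib
import shutil
from pathlib import Path

def replicate_and_release(src: Path, dst: Path) -> None:
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    src_hash = hashlib.sha256(src.read_bytes()).hexdigest()
    dst_hash = hashlib.sha256(dst.read_bytes()).hexdigest()
    if src_hash != dst_hash:
        dst.unlink()                     # discard the bad copy and retry later
        raise IOError(f"validation failed for {src}")
    src.unlink()                         # remove source only after validation
```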