What to consider when storing data in the cloud
Storage administrators who want to keep data in the cloud need to consider compliance regulations, cost factors and performance requirements.
What you will learn: Before storing data in the cloud, storage administrators must decide whether doing so will enable greater organizational productivity, meet compliance regulations, reduce costs and continue to satisfy performance requirements.
How do you know which data to store in the cloud? It's a simple question with a not-so-simple answer. Storage professionals must weigh numerous factors, including application types, response time constraints, security standards, file sharing and geographic needs.
Let's consider the application that creates and accesses data. When the application is already in a public cloud, it should be safe to assume that the cloud service provider has been vetted. Most cloud service providers incorporate local direct-attached storage (DAS), SAN, network-attached storage (NAS) and/or object storage with their servers. This type of arrangement isn't much different from running applications in a private virtualized data center, although it does mean the service provider can deliver additional compute and storage resources on demand.
Note: If you're running a mission-critical application (such as a database or email) in the cloud, the data should stay with the app. If you're running the application locally, the data should likewise stay local. Primary applications require fast response times, and separating the data from the application adds enough latency to make those response times unacceptable.
The issue becomes far more complicated when the application and the data aren't colocated. Application response times rise with the actual circuit distance between the application and its data source, because every request and reply has to cross that distance (experienced as latency or delay). Distance latency is bounded by the speed of light in the circuit, so it won't be solved anytime in the near future. Therefore, it makes little sense for a heavy transactional application, such as an e-commerce or structured application, to be separated from its data.
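To put rough numbers on the distance problem, here is a minimal back-of-the-envelope sketch in Python. It estimates only propagation delay and ignores switching, queuing and protocol overhead, so real figures will be higher. The velocity factor of 0.67 for fiber, the function names and the count of 20 sequential I/Os are illustrative assumptions, not measurements.

```python
# Rough propagation-delay estimate. Every constant below is an assumption
# chosen for illustration; measure your own circuits before deciding anything.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_VELOCITY_FACTOR = 0.67       # assumed: light in glass travels at about 2/3 c


def round_trip_ms(circuit_km: float) -> float:
    """Minimum round-trip propagation delay over a fiber circuit, in milliseconds."""
    one_way_s = circuit_km / (SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000


def transaction_delay_ms(circuit_km: float, sequential_ios: int) -> float:
    """Delay added to one transaction that issues its I/Os one after another."""
    return round_trip_ms(circuit_km) * sequential_ios


if __name__ == "__main__":
    for km in (10, 100, 1000, 4000):
        print(f"{km:>5} km: {round_trip_ms(km):6.2f} ms per round trip, "
              f"{transaction_delay_ms(km, 20):7.1f} ms for 20 sequential I/Os")
```

Under these assumptions, each 1,000 km of circuit adds roughly 10 ms per round trip. That is negligible for a single file download but adds up quickly for a transaction that waits on many I/Os in sequence.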
One way to handle the distance issue is to use cloud-integrated storage (CIS), which puts a SAN or NAS storage system on the data center floor alongside the application. CIS looks, feels and acts just like any other primary storage system. Some products even have multiple tiers and/or caches that use flash solid-state drives. Cloud-integrated storage is available from Microsoft (StorSimple), Nasuni, Panzura, Riverbed and TwinStrata.
The main difference between CIS and traditional SAN/NAS storage is that CIS connects on the back end to a wide variety of cloud storage services, such as Amazon Simple Storage Service, Google Storage, HP Cloud, IBM SmartCloud, Microsoft Azure, Nirvanix and Rackspace. It uses cloud storage as an unlimited storage tier where less frequently used data and older snapshots are placed. Before data is placed in this tier, it's commonly compressed and deduplicated to keep monthly cloud storage costs low. CIS essentially provides local high-speed performance for structured applications while transparently keeping the majority of cold or passive data in cloud storage.
One aspect of CIS that users seem to love is its collaboration feature. Users in different parts of the world can access the same cloud content through cloud-integrated storage. The cloud becomes the central repository, with data typically encrypted in flight and at rest by the CIS systems; each CIS system with the proper credentials can access and work on the same content. For example, CIS has become a useful tool in the media and entertainment industries for tasks such as post-production video editing. Organizations that manage big data, such as those in the life sciences, use CIS for applications related to genome decoding. But the technology doesn't have to be limited to complex applications; in the insurance industry, for example, claims management processes are a good fit because of the collaboration aspect.
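The data reduction step that CIS products commonly apply before writing to the cloud tier can be sketched as follows. This is a simplified illustration, not any vendor's implementation: it assumes fixed 4 KB chunks, SHA-256 fingerprints and zlib compression, and it uses an in-memory dictionary as a stand-in for the cloud object store. The function names are hypothetical.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; products may use variable-size chunks


def reduce_for_cloud(data: bytes, store: dict) -> list:
    """Split data into chunks, keep one compressed copy of each unique chunk
    in `store`, and return the ordered fingerprints needed to rebuild the data."""
    recipe = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:  # deduplicate: only unseen chunks are stored
            store[fingerprint] = zlib.compress(chunk)
        recipe.append(fingerprint)
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data from its fingerprint list."""
    return b"".join(zlib.decompress(store[fp]) for fp in recipe)


def stored_bytes(store: dict) -> int:
    """Bytes that would actually be billed in the cloud tier."""
    return sum(len(blob) for blob in store.values())


if __name__ == "__main__":
    cloud_tier = {}  # stand-in for a cloud object store
    snapshot = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
    recipe = reduce_for_cloud(snapshot, cloud_tier)
    assert restore(recipe, cloud_tier) == snapshot
    print(f"logical bytes: {len(snapshot)}, stored bytes: {stored_bytes(cloud_tier)}")
```

Because repeated chunks are stored once and each stored chunk is compressed, the capacity billed each month can be far smaller than the logical size of the data, which is the cost effect described above.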
Archives, access and the Hadoop factor
Content such as backups and archives typically isn't accessed frequently. But when backup data has to be recovered, performance becomes an urgent issue. Cloud backup software and cloud backup service providers manage this problem by mounting physical and virtual server images directly in the cloud or locally on-site as a virtual machine.
They can also address recovery requirements by backing up directly to the cloud while keeping the latest backups locally on-site. Several backup software providers (Asigra, CommVault, EMC Avamar, EMC NetWorker and Symantec NetBackup, for example) provide these capabilities both for private use and to cloud service providers.
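The idea of sending every backup to the cloud while holding only the most recent ones on-site can be expressed as a simple placement rule. The sketch below is a hypothetical illustration, not how any of the named products work; the `Backup` class, the `plan_placement` function and the default of three local copies are all assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Backup:
    name: str
    taken: date


def plan_placement(backups, keep_local=3):
    """Every backup goes to the cloud; only the newest `keep_local`
    backups also stay on-site for fast recovery."""
    newest_first = sorted(backups, key=lambda b: b.taken, reverse=True)
    return {
        "cloud": [b.name for b in newest_first],
        "local": [b.name for b in newest_first[:keep_local]],
    }


if __name__ == "__main__":
    history = [Backup(f"nightly-{day:02d}", date(2014, 1, day)) for day in range(1, 8)]
    plan = plan_placement(history)
    print("local copies:", plan["local"])
    print("cloud copies:", len(plan["cloud"]))
```

Recoveries of recent data are then served at local speed, while older restore points remain available from the cloud at the cost of a slower retrieval.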
Archiving is a different story. When it comes to archive data, the question is whether the archive should be in a public cloud.
The answer depends on whether the archive will sit cold or needs ongoing access. If the archive is for retention compliance only (regulatory, corporate governance and e-discovery), separating the content from the application is fine. If the archive will be searched and analyzed relatively frequently, then it should be colocated with the application that does the data analysis. A good example is Hadoop. If the content weren't located with Hadoop, jobs would be cumbersome and run too slowly. Service providers have come to realize this, and some are starting to offer Hadoop services in the cloud so that archived data can be analyzed where it rests in their cloud. The results are then distributed to wherever the user resides. Once again, it comes down to the application that uses the content and its performance requirements.
Perhaps the application most closely associated with cloud storage is file sync and share (FSS). Most laypeople call this cloud storage, but it's really not. FSS is an application that runs on a physical or virtual server in the cloud. It can also run locally in your data center. That makes file sync and share, in the strictest sense, a cloud application and not cloud storage. But, as IT administrators know, users don't care how you define it; they just love its functionality. File sync and share is a convenient application for users and a great example of how IT organizations can leverage the cloud. FSS allows users to share files between devices such as desktops, laptops, tablets and smartphones, as well as among co-workers, partners and even competitors.
EMC, Hitachi Data Systems, NetApp and others now offer file sync and share that works with private clouds or standard data center storage. But the technology can and often does violate organizational security and compliance rules. Let's use the example of a health care provider. Under the Health Insurance Portability and Accountability Act (HIPAA) and Health Information Technology for Economic and Clinical Health (HITECH) Act rules, all patient information must be digitally and physically secured with an exact copy in an easily accessible location. Basic encryption isn't enough; it has to be certified by the National Institute of Standards and Technology, which today means Federal Information Processing Standard Publication 140-2 certification. Users of Box, Dropbox, Hightail and similar services could therefore be violating HIPAA rules, with fines that start at $100 per violation and can reach $1.5 million per year for repeated violations. Other vertical industries have equivalent compliance issues.
While not every organization is subject to the same stringent compliance rules, most can count on employees bringing their own devices to work. The bring-your-own-device movement means IT managers must choose a file sync-and-share service or product that best matches their organization's IT security rules while maintaining user convenience. Among the features to look for are single sign-on for file sync and share and all other IT services; remote wipe; ownership management; scalability in both users and capacity; and encryption. But remember, FSS technology isn't a substitute for a backup product or service. It holds copies only of the files a user has manually chosen to share, and it isn't designed to work as a backup. What file sync and share is ideal for is individual productivity between devices, and collaboration or workflow sharing.
About the author:
Marc Staimer is the founder and senior analyst at Dragon Slayer Consulting in Beaverton, Ore. The consulting practice of 15 years focuses on the areas of strategic planning, product development and market development. Marc can be reached at [email protected].