Where integrated hybrid cloud storage makes sense in the enterprise
Is your organization ready to do more with hybrid cloud storage? Find out how to integrate this technology and put all the hardware, software and services to work effectively.
Most enterprises have some form of hybrid cloud strategy. But many of these amount to tiptoeing into a deep pool via straightforward projects that are easy to implement and can show a fast ROI.
Applications such as backup and long-term archiving, in which aging tape libraries and rented off-site vaults are replaced with cloud cold storage services like Amazon S3 Glacier or Google Cloud Coldline, can provide quick wins. Such projects often replace storage systems that are due for a technology upgrade with an easily understood service that provides a distributed, high-availability infrastructure; usage-based pricing; and the built-in security features of the major cloud platforms.
The next stage in the evolution of enterprise hybrid cloud storage entails linking cloud services to existing applications to provide an extension of on-premises infrastructure. This integrated hybrid cloud storage approach requires a seamless interface between the private systems and public services with data continuously synchronized between the two. The goal is to make the cloud an extension of enterprise capacity and a staging area for applications to use more advanced cloud databases, data warehouses, analytics and machine learning services.
What follows is an overview of the various use cases, techniques, hardware, software and services for integrated hybrid cloud storage.
Application usage scenarios
Readers unfamiliar with the basics of hybrid and multi-cloud infrastructure should refer to previous TechTarget articles. Start with the primer in which Gaurav Yadav, founding engineer of Hedvig, a distributed storage platform provider, defines hybrid cloud as storage spread across at least one on-premises data center and at least one public cloud. He describes multi-cloud as storage spread across more than one public cloud, so enterprises can pick any public cloud for their storage requirements and move data across these clouds when needed.
One advantage of hybrid cloud storage is that it gives organizations access to sophisticated data services, like Hadoop clusters with Spark analytics, that might only be needed a few times a year. Other integrated hybrid storage use cases include:
- Augmenting on-premises capacity with cloud object and file services for nearline, infrequently accessed data while providing local copies for hot data.
- Conversely, creating a local copy or cache of cloud-based data for low-latency on-premises access.
- Feeding data from internal systems to a cloud database or a more advanced analysis system that combines an extract, transform and load pipeline, a data warehouse and an analytics engine, as in the Azure modern data warehouse example.
- Using cloud storage to synchronize and offload data from multiple branch office locations, as in scenarios built on Azure StorSimple.
- Feeding on-premises data to cloud-native applications and systems, such as web or e-commerce sites, content delivery networks, records management systems and developer test environments.
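The first use case in the list above, keeping hot data local while moving infrequently accessed data to cloud storage, comes down to a placement policy. The following is a minimal sketch of such a policy, not any vendor's implementation. The 30-day threshold, the `FileRecord` type and the function names are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed threshold: data untouched for 30 days is treated as nearline.
NEARLINE_AFTER = timedelta(days=30)


@dataclass
class FileRecord:
    path: str
    last_access: datetime


def choose_tier(record: FileRecord, now: datetime) -> str:
    """Return 'local' for hot data and 'cloud' for infrequently accessed data."""
    age = now - record.last_access
    return "cloud" if age >= NEARLINE_AFTER else "local"


def plan_placement(records, now: datetime) -> dict:
    """Group file paths by the tier the policy assigns them to."""
    plan = {"local": [], "cloud": []}
    for record in records:
        plan[choose_tier(record, now)].append(record.path)
    return plan
```

A production policy would likely weigh more than last-access time, such as file size, retrieval cost and application priority, but the structure of the decision stays the same.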
Of course, any infrastructure used for cloud-based applications or active file systems is equally capable of feeding backup and archive systems, so the following infrastructure options are a natural evolution of these baseline uses of cloud storage.
Integrated hybrid cloud storage infrastructure options
There are several ways to integrate on-premises and cloud storage, and they differ in their complexity, technical maturity and features. The simplest turns cloud storage into an auxiliary layer of an organization's storage hierarchy, while the most advanced and complicated effectively creates a distributed storage platform that spans multiple environments.
The following are four popular approaches to integrated hybrid cloud storage.
On-premises storage systems with built-in cloud integration. Many enterprise storage systems, such as Dell EMC Isilon, NetApp systems running Ontap and Cohesity, offer optional features that automatically replicate data to cloud services. Although these features are typically used for one-way backup and archiving, some support bidirectional synchronization that would, for example, allow cloud data modified by an application to synchronize back to on-premises systems.
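To make the idea of bidirectional synchronization concrete, here is a rough sketch of a last-writer-wins reconciliation pass. It assumes each side can report a map of file paths to modification timestamps; the `reconcile` function is hypothetical and is not drawn from any of the products named above. It deliberately ignores deletions, conflicting edits and clock skew, all of which a real system has to handle.

```python
def reconcile(on_prem: dict, cloud: dict):
    """Plan a last-writer-wins sync between two {path: modified_timestamp} maps.

    Returns (to_cloud, to_on_prem): sorted lists of paths to copy each way.
    """
    to_cloud, to_on_prem = [], []
    for path in sorted(set(on_prem) | set(cloud)):
        local_ts = on_prem.get(path)
        remote_ts = cloud.get(path)
        if remote_ts is None or (local_ts is not None and local_ts > remote_ts):
            # Missing in the cloud, or the on-premises copy is newer.
            to_cloud.append(path)
        elif local_ts is None or remote_ts > local_ts:
            # Missing on premises, or the cloud copy is newer.
            to_on_prem.append(path)
        # Equal timestamps: already in sync, nothing to copy.
    return to_cloud, to_on_prem
```

One-way backup is the special case in which only the first list is ever acted on.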
Cloud caching appliances with local file systems. These purpose-built hardware or software appliances are designed to locally mirror a subset of cloud-based data to improve application performance and usability by reducing latency and increasing throughput. Many of these, such as Microsoft Avere products, include more advanced features like a globally distributed network filesystem with a unified namespace to create a single organizational filesystem that can span multiple branch offices, data centers and cloud services.
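The core behavior of a caching appliance, serving hot data locally and fetching everything else from the cloud on demand, can be sketched as a read-through cache with least-recently-used eviction. This is an illustrative toy, not how Avere or any other product is built; the `ReadThroughCache` class and its `fetch_from_cloud` callable are assumptions introduced for the example.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Keep the most recently used objects local; fetch the rest on demand."""

    def __init__(self, fetch_from_cloud, capacity: int):
        self._fetch = fetch_from_cloud  # hypothetical callable: key -> bytes
        self._capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)  # slow path: go to the cloud store
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return data
```

The hit and miss counters hint at how such a cache is sized in practice: capacity is increased until the miss rate, and with it the number of high-latency cloud reads, falls to an acceptable level.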
Cloud storage gateways. These use network storage protocols, such as NFS and SMB for NAS and iSCSI for SAN and block volumes, to connect on-premises systems and cloud services. They can be implemented as software running in a host VM or as a hardware appliance that serves as a proxy between a data center LAN and a virtual private cloud. Gateways typically include data compression and other network optimization techniques found in WAN optimization appliances to improve performance and reduce the amount of data transferred. For example, AWS Storage Gateway has modes for files, volumes and tapes. File gateways store data as objects in S3, volume gateways present iSCSI block volumes that are backed by S3 and can be captured as Elastic Block Store snapshots, and tape gateways present a virtual tape library whose tapes can be archived to S3 Glacier or S3 Glacier Deep Archive.
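The compression step mentioned above can be illustrated with Python's standard `zlib` module. This is a generic sketch of compressing data before it crosses the WAN and is not a description of how AWS Storage Gateway or any other gateway works internally; the function names are hypothetical.

```python
import zlib


def pack_for_upload(payload: bytes, level: int = 6) -> bytes:
    """Compress a payload before it crosses the WAN."""
    return zlib.compress(payload, level)


def unpack_after_download(blob: bytes) -> bytes:
    """Reverse pack_for_upload on the receiving side."""
    return zlib.decompress(blob)


def bytes_saved(payload: bytes) -> int:
    """Report how many bytes compression removes for this payload."""
    return len(payload) - len(pack_for_upload(payload))
```

Savings depend heavily on the data. Repetitive content such as logs compresses well, while already-compressed media or encrypted data may not shrink at all.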
Software-defined storage (SDS) systems. SDS systems create a software overlay that decouples the logical storage configurations from the physical instantiation. By creating a software abstraction layer, SDS enables filesystems to transparently span on-premises and cloud infrastructure, including multiple locations like AWS Availability Zones. SDS also provides a centralized management control plane that includes a set of enterprise storage services, such as deduplication, compression and snapshots, and can automatically migrate, replicate and synchronize block volumes across on-premises and cloud environments. SDS products are available from both large integrated IT providers like NetApp and VMware and smaller companies specializing in SDS, such as Hedvig, Qumulo and Scality.
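Deduplication, one of the storage services listed above, is commonly explained as storing each unique chunk of data once and keeping a list of chunk identifiers for every file. The sketch below shows that idea with fixed-size chunks and SHA-256 digests. The `DedupStore` class is hypothetical, holds everything in memory and does not reflect any particular SDS product.

```python
import hashlib


class DedupStore:
    """Store each unique fixed-size chunk once, keyed by its SHA-256 digest."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks = {}  # digest -> chunk bytes, stored once

    def write(self, data: bytes) -> list:
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # skip chunks already stored
            recipe.append(digest)
        return recipe

    def read(self, recipe) -> bytes:
        """Rebuild the original data from a recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)
```

Because chunks are identified by content, the same approach can reduce replication traffic between sites: only chunks the other side does not already hold need to be sent.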
Of these cloud storage integration techniques, cloud gateways and cloud-aware storage systems are the most mature and the easiest to implement, while SDS remains a rapidly changing technology with products that require significant planning and expense for both implementation and operations. (Much of that expense typically goes to running the VMs required for the management and data control planes.)
Use and implementation guidance
Most organizations are just starting with a true hybrid storage architecture. According to a Gartner report, real-time, bidirectional data synchronization isn't widely deployed yet, and seamless, SDS-enabled hybrid filesystems are rarer still.
A logical first step for organizations already using the cloud for backup is to add a storage gateway and, where available, to use the cloud features built into their storage arrays. These will more tightly integrate on-premises filesystems with cloud infrastructure and enable individuals and applications to use familiar network protocols to access cloud storage services.
A gateway is sufficient for many hybrid use cases, such as supplying data to a cloud-based data warehouse or machine learning model, or aggregating remote office filesystems that hold user directories and remote application data into a central cloud repository.
Organizations pursuing an integrated hybrid cloud storage environment should start by assessing their business and application needs, along with the limitations or deficiencies of existing storage systems, to prioritize features and guide the design. When evaluating products, they should seek those that support standard protocols and multiple cloud vendors, at least the big three of AWS, Azure and Google Cloud, to maximize their IaaS options and avoid lock-in.
Organizations that have already committed to a particular vendor for cloud backup or other services should start with a vendor-supplied product, such as AWS Storage Gateway or Azure StorSimple, because these are typically the cheapest and easiest options for hybrid storage integration.