Weigh the benefits and drawbacks of a hybrid data warehouse
Among other benefits, a hybrid cloud data warehouse can offer enhanced flexibility and scalability, as well as on-demand access to emerging analytics services.
With hybrid cloud architectures, enterprises get the best of on-premises and cloud environments. IT admins can apply the hybrid model to data warehouses to reap these benefits, but implementation can pose a challenge.
A traditional data warehouse locks enterprises into an on-premises system, according to Andy Walter, CEO of AJW-Advisory, an IT advisory service. That lock-in makes it difficult, if not impossible, to scale IT infrastructure efficiently. With a hybrid data warehouse, admins can bring some data warehousing capabilities into the cloud while maintaining on-premises capabilities.
However, before an organization adopts a hybrid data warehouse, it should carefully evaluate the benefits and drawbacks of the technology. Then, certain architectural design tips can help with implementation.
What are the benefits of a hybrid data warehouse?
With a hybrid data warehouse, enterprises can benefit from the cloud's flexibility and high performance, and still see the data availability and compliance advantages of on-premises deployments, according to Alex Bekker, head of the data analytics department at ScienceSoft, an international IT consulting and software development company.
More specifically, the key benefits of a hybrid data warehouse include the following:
1. Ease of adoption
A hybrid model eases the adoption of cloud data infrastructure because the organization does not have to migrate all of its data to the cloud at once. Instead, it continues to lean on existing on-premises technology. A hybrid data warehouse can also simplify the integration of data silos created by different departments or applications.
2. Support for partner data
Businesses increasingly combine internal data sources with external data from partners to improve analytics, Walter said. A hybrid data warehouse is well suited to this approach and can reduce the data engineering required to explore new analytics models.
3. Data segregation based on need
Both on-premises and cloud data warehouses have fundamental advantages, said Anay Nawathe, principal consultant for technology research and advisory firm ISG. Traditional data warehouses, for example, can be more cost-efficient, while cloud data warehouses offer enhanced scalability. With a hybrid data warehouse, IT teams can segregate data sets based on use cases to take advantage of each model.
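As a rough illustration of segregating data sets by use case, the sketch below routes each data set to one environment based on two flags. The DataSet fields, the choose_tier function and the sample catalog are all hypothetical, and the policy itself is an assumption made for the example rather than a rule from any of the experts quoted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSet:
    name: str
    regulated: bool        # assumed flag: compliance rules favor on-premises storage
    bursty_workload: bool  # assumed flag: query demand spikes and needs elastic scaling


def choose_tier(ds: DataSet) -> str:
    """Return 'on_premises' or 'cloud' for a data set (illustrative policy only)."""
    if ds.regulated:
        return "on_premises"
    if ds.bursty_workload:
        return "cloud"
    # Steady, predictable workloads stay on existing infrastructure in this sketch.
    return "on_premises"


# Hypothetical catalog of data sets.
catalog = [
    DataSet("customer_pii", regulated=True, bursty_workload=False),
    DataSet("clickstream", regulated=False, bursty_workload=True),
    DataSet("monthly_finance", regulated=False, bursty_workload=False),
]

placement = {ds.name: choose_tier(ds) for ds in catalog}
print(placement)
```

A real policy would likely weigh more factors, such as data volume, latency requirements and egress costs, but keeping the decision in one small function makes it easy to review and change.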
4. Flexibility
A hybrid data warehouse makes it easier to tap into new data services on demand to fit evolving needs, said Ivan Kot, senior manager at Itransition, a Denver-based software development company. Emerging analytics services can easily scale up across both the cloud and on-premises environments.
Drawbacks of a hybrid data warehouse
Compared with an on-premises data warehouse, a hybrid data warehouse can require more effort to implement. Enterprises need a detailed plan to get a hybrid data warehouse project off the ground. This plan should address various challenges, such as the following:
1. Organizational issues
Organizational change is required to implement a new process or practice, and a hybrid data warehouse is no exception. "Internal resistance to change can be fierce, with the knock-on effect of postponing projects and bleeding time and money in a variety of ways," Walter said. Admins need to get buy-in from their teams right at the beginning.
2. Cost considerations
On-premises systems come with guardrails because some effort is required -- both technically and financially -- to grow the infrastructure. As a result, organizations are mindful of any additional resources or costs. In contrast, the cloud offers seemingly endless storage and compute capacity on demand, so resource deployments can easily get out of hand. While cloud providers offer tools to mitigate this problem, enterprises must be as careful with cloud usage and costs as they are with on-premises systems.
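One way to recreate an on-premises-style guardrail for cloud spending is a simple budget check that runs before new resources are provisioned. The following is a minimal sketch under stated assumptions: the check_budget function, its 80% warning threshold and the spend figures are hypothetical, and a real deployment would read actual spend from the cloud provider's billing tools.

```python
def check_budget(spend_to_date: float, monthly_budget: float, warn_ratio: float = 0.8) -> str:
    """Classify current cloud spend against a monthly budget.

    Returns "ok", "warn" or "block". The thresholds are assumptions for this sketch.
    """
    if monthly_budget <= 0:
        raise ValueError("monthly_budget must be positive")
    ratio = spend_to_date / monthly_budget
    if ratio >= 1.0:
        return "block"  # budget exhausted: stop provisioning new resources
    if ratio >= warn_ratio:
        return "warn"   # approaching the limit: notify the team
    return "ok"


# Hypothetical figures, in any single currency.
print(check_budget(500.0, 1000.0))
print(check_budget(850.0, 1000.0))
print(check_budget(1000.0, 1000.0))
```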
3. Unfamiliar tools
"There is a learning curve associated with cloud data warehouses, and adopting a hybrid data warehouse does not exempt your organization from that same learning curve," Nawathe said. Before adoption, carefully evaluate tool options and develop a learning program.
Design a hybrid data warehouse
Ideally, a hybrid data warehouse combines the governance and efficiency of on-premises infrastructure with the scalability of the cloud. The goal is to operate the two infrastructures as a unit rather than as separate pools of data. While it might be easier to set up and operate two independent data warehouses, it costs more over time to manage and use two separate systems.
According to Bekker, it is helpful to design a hybrid data architecture with three underlying goals:
- Reduce data storage costs and manage data growth, with no compromise to data warehouse performance.
- Provide comprehensive data security and compliance with regulatory standards.
- Facilitate uninterrupted data flow between the on-premises and cloud environments to ensure business continuity.
Adopt the right tooling
Teams need to look for the appropriate tools to automate data storage optimization and data processing optimization across on-premises and cloud systems. One approach is to adopt data fabric tools to harmonize data, analytics and workflows. These tools should span on-premises and cloud-native data warehouses. Data fabric tools include IBM Cloud Pak for Data, K2View Operational Data Fabric and NetApp Cloud Volumes ONTAP.
Another approach is to adopt a data warehouse automation tool. These tools automate ETL, or extract, transform and load, operations between raw data sources and one or more data warehouses. Data warehouse automation tools include Azure Data Factory, Informatica Data Validation, Oracle Autonomous Data Warehouse and Qlik Compose for Data Warehouses.
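To make the ETL pattern concrete, here is a minimal hand-written sketch of the three steps that such tools automate. It is not how any of the products named above work internally. The sample records, the orders table and the function names are hypothetical, and an in-memory SQLite database stands in for a data warehouse so the example runs anywhere.

```python
import sqlite3


def extract():
    """Return raw records as they might arrive from a source system (hypothetical data)."""
    return [
        {"order_id": "1", "amount": " 19.99 ", "region": "emea"},
        {"order_id": "2", "amount": "5.00", "region": "AMER"},
        {"order_id": "3", "amount": "", "region": "apac"},  # missing amount
    ]


def transform(rows):
    """Clean and type the raw records, skipping rows with no amount."""
    clean = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        clean.append((int(row["order_id"]), float(amount), row["region"].upper()))
    return clean


def load(conn, rows):
    """Write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite in memory is an assumption used here as a stand-in warehouse.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
loaded = conn.execute(
    "SELECT order_id, amount, region FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

In a hybrid setup, the same three-step shape applies whether the load target sits on premises or in the cloud. Only the connection passed to the load step would change.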
A third option is to adopt a data warehouse tool with both on-premises and cloud data warehouse capabilities that can work together. Examples include Cloudera Data Platform, IBM Db2, Oracle Autonomous Data Warehouse and Teradata Vantage.