How to build a successful cloud data architecture
As enterprises vacate the premises and migrate their operations skyward, a cloud data architecture can provide the long-term flexibility to improve workflows, costs and security.
Enterprises implementing a cloud data architecture can accelerate data insights and lower their IT costs, but the cloud's potential benefits can create new problems if the data architecture isn't well designed and managed properly.
The cloud, for example, provides the flexibility to spin up new data platforms and quickly scale systems to handle large processing jobs. Yet it can be costly if users are empowered to create inefficient data processes that consume more resources than necessary. Similarly, data security can be improved when data is processed and managed using cloud services and following recommended practices. But cloud data architectures can introduce security vulnerabilities that may be missed by an IT staff that's more familiar with running on-premises workloads.
Despite these issues, tangible benefits can be derived even from experimenting with cloud-based data architectures. Experimentation may allow data management and analytics teams to explore bursty use cases that aren't practical with existing systems. And if things don't work out, it's easy to terminate an unsuccessful experiment and move on to the next project in the cloud.
Why build a cloud-based data architecture?
The cloud can improve an enterprise's data services in terms of cost, security, tooling and data localization, said John Carey, managing director at global management consultancy AArete. Cloud services can be time-managed, which can benefit applications requiring sporadic services. Numerous security features are baked into cloud services that can help tighten data security, and cloud providers offer tools to help manage these services.
The cloud can accelerate data workflows, said Chris Bergh, CEO, founder and head chef at DataOps platform DataKitchen, adding that infrastructure cost savings are often only a tiny part of a well-managed data migration project. "A more exciting and impactful goal for cloud migration," he conjectured, "is to improve business agility."
A cloud data architecture can play a key role in the implementation of streaming data services. "Streaming technology is best implemented on the cloud since it requires fast and scalable architecture to meet the shifting needs of data streams," noted Suyash Karanwal, data analytics manager at IT consultancy Saggezza. As companies increasingly adopt streaming technologies, an elastic data architecture can dynamically and automatically adapt to business needs.
Tips on building a data architecture in the cloud
Data architects need to implement several best practices when planning, building and maximizing the benefits of a cloud data architecture.
- Start with a business case. Organizations can often rush into cloud agreements without considering the best data architecture to address a business problem, according to Craig Wright, senior partner of business advisory and transformation at technology consulting firm West Monroe. Start with the business use case, choose the cloud components that align with it and define each component's role in creating business value. This practice makes it easier to evaluate the costs and benefits of cloud migration.
- Experiment and test. There are hundreds, if not thousands, of technologies and data architecture patterns available. "Do not hesitate to try numerous options," advised Cooper Lutz, cloud and DevOps lead specialist at digital transformation consultancy Ahead. It can be relatively inexpensive to test out a new idea and shut it down if things don't work out. Once you've found an approach that works, run with it and allow it to evolve over time.
- Tame unstructured data. Moving to cloud data architectures opens opportunities to explore unstructured data use cases. "Cloud-based data architectures are ideal for working with unstructured data like social media feeds or semistructured data like XML documents and messages," said Mike Rulf, CTO for the Americas region at managed cloud provider Syntax. Cloud-based tools can unlock interesting relationships and help eliminate much of the work associated with defining data cubes and other structures that are required by traditional data warehouse platforms for data analysis.
- Focus on data workflows. The best strategy for increasing data monetization, DataKitchen's Bergh argued, is to focus on how a cloud data architecture can improve data workflows instead of data management tools and technologies. Adopt metrics that help assess how cloud services can minimize the cycle time of data analytics applications, he recommended.
- Balance data requirements with cost. In the cloud, costs can easily scale out of control. "Keeping all data at the highest performance tier is costly, so you need to determine what data is needed by what process and when to get the highest return on cloud resources," said Terri Sage, CTO at self-service data and analytics platform maker 1010data. A best practice is to keep a lid on costs by tracking and enforcing usage limits. "It is so easy to dump data into infinitely sized storage and forget it," Sage reasoned, "but that creates an infinitely large recurring storage cost."
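The cost-balancing tip above can be made concrete with a small calculation. The following is a minimal sketch, not a provider's pricing model: the tier names, the per-GB prices, the age thresholds and the sample data sets are all hypothetical values chosen for illustration. It assigns each data set to a storage tier based on how recently it was accessed and compares the monthly bill with keeping everything at the highest performance tier.

```python
from dataclasses import dataclass

# Hypothetical monthly prices per GB for three storage tiers (illustrative only).
TIER_PRICE_PER_GB = {"hot": 0.023, "cool": 0.010, "archive": 0.002}


@dataclass
class Dataset:
    name: str
    size_gb: float
    days_since_last_access: int


def choose_tier(ds, cool_after_days=30, archive_after_days=180):
    """Pick a tier from access recency; the thresholds are assumptions."""
    if ds.days_since_last_access >= archive_after_days:
        return "archive"
    if ds.days_since_last_access >= cool_after_days:
        return "cool"
    return "hot"


def monthly_cost(datasets, tier_for=choose_tier):
    """Sum the monthly storage cost, using tier_for to place each data set."""
    return sum(ds.size_gb * TIER_PRICE_PER_GB[tier_for(ds)] for ds in datasets)


# Made-up sample data sets.
datasets = [
    Dataset("daily_sales", 500, 1),
    Dataset("last_quarter_logs", 2000, 90),
    Dataset("2015_clickstream", 10000, 700),
]

all_hot = monthly_cost(datasets, tier_for=lambda ds: "hot")
tiered = monthly_cost(datasets)
print(f"Everything in the hot tier: ${all_hot:,.2f} per month")
print(f"Tiered by access recency:   ${tiered:,.2f} per month")
```

In practice, the thresholds would come from the question Sage raises: which process needs which data, and when.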
Cloud data architecture challenges and issues
Data architects need to navigate several challenges when deploying a cloud data architecture, including technical issues like data gravity, political issues such as existing investments, and process issues like security and incomplete data migration.
- Data gravity. "Data gravity," Lutz said, "is by far the most common challenge organizations face as they continue to solve for hybrid or multi-cloud environments." Data gravity issues can arise when applications span multiple environments and when ingesting, transforming and analyzing massive data sets from disparate sources. These challenges can be overcome by separating data storage, utilizing event-driven architectures, analyzing data at the edge and scaling public cloud compute for batch processes and large-scale analysis.
- On-premises investments. Data architects may need to contend with pushback from management and its desire to maximize corporate investments in on-premises processes. The infrastructure necessary to support a sizable on-premises data architecture requires significant capital investment, Rulf said. One way to mitigate this hurdle is to identify bursty use cases that are impractical for the existing on-premises architecture.
- Data security and privacy. "Data security is one of the foremost challenges," Sage warned, "and this can be addressed by controlling access to the data whether in transit or at rest, ingesting data from a reliable external source and validating data at the point of entry." A cloud data architecture can take advantage of hardened services to improve security, but that requires upfront work to automate data handoff processes. Consider data governance to address privacy and security issues by classifying data, creating controls and policies, and managing data lifecycles through catalogs and tracking.
- Regulatory and compliance obligations. To comply with data quality and protection policies and regulations, on-premises hosting of data may sometimes be favored over the cloud. To counter this resistance, Wright recommended, ensure that stakeholders are made aware of successful case studies and the liability protections available with cloud data architectures.
- Incomplete cloud migration. Companies can get cold feet in the middle of their migration to the cloud, Saggezza's Karanwal said, and end up with an unplanned combination of cloud and on-premises systems. Cloud computing works best when a company totally migrates its data and applications to the cloud, he contended.
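Two of the security practices mentioned above, ingesting data only from reliable sources and validating it at the point of entry, along with the governance step of classifying data, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the source names, field names, sensitivity classes and validation rules are all hypothetical, and a production pipeline would rely on a proper schema and policy engine.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical required fields and their sensitivity classes.
FIELD_CLASSIFICATION = {
    "email": "personal",
    "customer_id": "internal",
    "amount": "internal",
}

# Hypothetical allowlist of sources considered reliable.
TRUSTED_SOURCES = {"crm_export", "billing_api"}


def validate_record(record, source):
    """Return a list of validation errors; an empty list means the record passes."""
    errors = []
    if source not in TRUSTED_SOURCES:
        errors.append(f"untrusted source: {source}")
    for field in FIELD_CLASSIFICATION:
        if field not in record:
            errors.append(f"missing field: {field}")
    email = record.get("email")
    if email is not None and not EMAIL_RE.match(str(email)):
        errors.append("invalid email")
    amount = record.get("amount")
    if amount is not None:
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        if not is_number or amount < 0:
            errors.append("invalid amount")
    return errors


def classify(record):
    """Tag each field with its sensitivity class so controls can be applied later."""
    return {field: FIELD_CLASSIFICATION.get(field, "unclassified") for field in record}


good = {"email": "a@example.com", "customer_id": 1, "amount": 10.0}
print(validate_record(good, "crm_export"))
print(classify(good))
```

Records that fail validation can be quarantined before they reach cloud storage, and the classification tags give access controls and lifecycle policies something to key on.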
Deploying and managing a hybrid cloud architecture
If total commitment to a cloud data migration is impractical, then several factors and actions need to be considered when planning a hybrid cloud data architecture:
- Data governance. Governance and security are even bigger issues when data spans on-premises and cloud infrastructure, which creates a larger attack surface for hackers and increases the risk of violating privacy mandates. Ensure that all data sources, services and storage are secured with encrypted tunnels and protocols, Rulf advised. Sanitize data so it meets privacy controls, and establish safeguards to protect data from malware and coding that invites a data breach.
- Increased complexity and management risks. Model important relationships and interoperability constraints, Wright said. Document the data dimensions relative to volumes, the complexity of data structures, workload and demand profiles, query volumes, complexity and response times, and any data latency considerations. Data curation helps ensure and preserve data usage on the optimal platforms, maintains integrity through the ingestion and transformation processes, and keeps the necessary focus on data quality and accuracy.
- Cost management. Every new element added to a cloud or hybrid architecture complicates cost management. Keep a watchful eye on egress charges applied when moving data out of a cloud platform, Karanwal cautioned. While many cloud services allow data to be moved into the cloud, it can prove costly to transfer the data back.
- Unified infrastructure. Synchronized infrastructure can ensure real-time data replication from the data center to the cloud and back, provided it's done in a way that mitigates egress charges. A unified user authentication scheme can unite access across cloud and data center resources. A data fabric can provide the appropriate architecture and data services to orchestrate applications that span cloud and data center systems.
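To see why egress charges deserve the watchful eye recommended above, it can help to estimate them before committing to a replication pattern. The sketch below assumes a hypothetical tiered price list (a small free allowance, then per-GB rates that drop with volume); the tier boundaries and rates are placeholders, so substitute the figures from the provider's actual pricing page.

```python
# Hypothetical tiered egress prices: (cumulative GB ceiling of the tier, price per GB).
EGRESS_TIERS = [(100, 0.0), (10_000, 0.09), (float("inf"), 0.07)]


def egress_cost(gb_out, tiers=EGRESS_TIERS):
    """Estimate the monthly charge for moving gb_out gigabytes out of the cloud."""
    cost = 0.0
    lower = 0.0  # lower bound of the current tier
    for ceiling, price in tiers:
        if gb_out <= lower:
            break
        billable = min(gb_out, ceiling) - lower
        cost += billable * price
        lower = ceiling
    return cost


def replication_monthly_cost(daily_gb, days=30):
    """Cost of copying daily_gb back to the data center every day for a month."""
    return egress_cost(daily_gb * days)


for daily_gb in (1, 50, 500):
    print(f"{daily_gb} GB/day -> ${replication_monthly_cost(daily_gb):,.2f} per month")
```

Running the estimate for a few candidate replication volumes makes it easier to decide which data actually needs to flow back to the data center and which can stay in the cloud.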