An analyst's blueprint for choosing a cloud data platform
Though choosing the tools that best fit the needs of an organization is difficult, analyst Doug Henschen has come up with a series of steps that can serve as a guide.
With so many products on the market, choosing a cloud data platform can be overwhelming. Following a basic strategy can both simplify the selection process and help organizations construct the right platform for their needs.
The process of deriving value from data is complex. So are many of the tools designed to help organizations realize that value. Selecting the tools that best fit an organization's needs, therefore, is crucial.
But where to begin?
"It's a daunting task," Doug Henschen, an analyst at Constellation Research, said during a March 8 webinar hosted by data lake vendor ChaosSearch.
To make it less overwhelming, Henschen outlined essential steps organizations can take when choosing a cloud data platform or the capabilities to construct their own.
They begin with a self-evaluation of the organization, progress to a consideration of an overall strategy and which capabilities on the market fit that strategy, and end with a screening and testing process to ultimately choose the right set of capabilities.
Cloud data platforms
Cloud data platforms let customers store their data in lakes, warehouses and lakehouses, and connect analytics and data science tools to that data for tasks such as developing augmented intelligence features and machine learning models.
Amazon Web Services, Google, Microsoft and Oracle are tech giants that offer cloud data platforms, while Databricks and Snowflake are growing vendors whose platforms enable data storage, analytics and data science.
Since the onset of big data nearly two decades ago, all data management and BI platforms have aimed to simplify data exploration and analytics.
Until recently, however, they relied on chaining together separate tools, each handling a single task. Data needed to be extracted, loaded and transformed over and over again to get it from one tool to the next, and harnessing data for analysis was complex.
The most modern cloud data platforms address that complexity. They converge capabilities in a single environment -- enabling data management, analytics and data science tools to work within data lakes and warehouses and other data repositories -- to reduce friction.
In addition, they enable automation of repetitive tasks and employ AI capabilities to further foster ease of use.
"Market-leading companies and fast followers are embracing things like data lakes, log analytics, data fabrics, lakehouse architectures, neural networks and so on," Henschen said.
But choosing the cloud data platform that best suits the needs of a particular company and effectively enables it to derive value from its data is not so simple, he added.
According to Henschen, one CEO of a major data platform vendor told a group of analysts that as much as vendors claim to offer a full set of easy-to-use capabilities, customers still struggle to put their analytics tools to good use.
Likewise, Thomas Hazel, founder and CTO of ChaosSearch, noted that customers have often failed to derive value from their data platforms.
"There's been a lot of promises," he said. "The promise was to draw those insights. However, scaling business has been a real challenge. … To scale your business, you need to scale your operations. Bringing things together is a fundamental shift."
Selection process
Choosing or piecing together a cloud data platform that enables an organization to derive value from its data begins with self-evaluation, according to Henschen.
Organizations need to know who they are before they can discover what they need.
"This is where your company is coming from," Henschen said. "It's important to have a real understanding of this starting point before even contemplating a technology selection."
The self-evaluation starts with knowing the current budget for adopting or building a cloud data platform and what the budget will be going forward.
Next come securing buy-in at the executive level, developing a vision for becoming more data-driven, and comparing that vision against the actual skills of the organization's employees -- both the data team and the business users who might engage with data. The final step is an examination of the organization's ties to its current technology and how to move on from it.
Once that self-evaluation is complete, organizations can begin the selection process. But rather than go right to the capabilities themselves, they first need to look at their strategic considerations.
"Tech strategy considerations are about where you're going, irrespective of the platform choice," Henschen said.
The first technology consideration is cloud strategy, including which clouds the organization plans to use, how extensively it plans to use each cloud, and whether it will also deploy on premises.
Next comes the data storage strategy: whether there will be just one repository, such as a warehouse or lake, or multiple ones, and which workloads will run on which repositories. That leads into a consideration of the organization's BI/analytics strategy and whether its existing platform meets current needs or should be replaced. At the same time, the organization should look at whether its monitoring and management tools meet current needs.
The final technology consideration is the data science platform and how it might fit with various clouds, data lakes and analytics tools.
"All of these feed on each other into the ramifications for the analytical data platform selection," Henschen said. "The most important one is to focus on that cloud strategy -- what has been the progress into the cloud, what is the commitment to go deeper into the cloud, which clouds are being chosen and which ones are standards?"
Finally comes a look at various product attributes and testing.
In addition to examining the product attributes of monitoring and management tools, BI/analytics tools and data science tools, organizations should look at the deployment management, administrative and maintenance attributes of different platforms.
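One way to make that attribute comparison concrete is a simple weighted scoring sheet. The Python sketch below is only an illustration and was not presented by Henschen: the criteria names, the weights, the 1-to-5 scores and the two platform names are all hypothetical, and an organization would substitute values drawn from its own self-evaluation and testing.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights (assumptions for illustration only;
# they do not come from the webinar). The weights sum to 1.0.
WEIGHTS = {
    "bi_analytics": 0.3,
    "data_science": 0.3,
    "monitoring_management": 0.2,
    "administration": 0.2,
}


@dataclass
class Candidate:
    name: str
    scores: dict  # criterion name -> score on a 1-5 scale


def weighted_score(candidate, weights=WEIGHTS):
    """Return the weighted sum of a candidate's criterion scores."""
    return sum(weight * candidate.scores[criterion]
               for criterion, weight in weights.items())


def rank(candidates, weights=WEIGHTS):
    """Sort candidates from highest to lowest weighted score."""
    return sorted(candidates,
                  key=lambda c: weighted_score(c, weights),
                  reverse=True)


# Made-up short list; the scores are placeholders, not real evaluations.
shortlist = [
    Candidate("Platform A", {"bi_analytics": 4, "data_science": 3,
                             "monitoring_management": 5, "administration": 4}),
    Candidate("Platform B", {"bi_analytics": 5, "data_science": 4,
                             "monitoring_management": 3, "administration": 4}),
]

for candidate in rank(shortlist):
    print(f"{candidate.name}: {weighted_score(candidate):.2f}")
```

A scoring sheet like this does not replace hands-on testing. It only records, in one place, how much weight the organization gives each attribute and how each short-listed option fared.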
"You want convergence for the sake of cost, for the sake of simplicity," Henschen said. "But balancing that, you have to get the functionality and performance you need. My favorite analogy is of a bullet train -- fast and performant but also able to carry a lot of passengers."
Implementation
An organization's process for selecting or building a cloud data platform essentially comes down to knowing where it's coming from, understanding the attributes of various products and ultimately choosing from a short list of tools, according to Henschen.
Organizations can even seek out reference customers who may not be direct competitors but whose data and analytics needs are similar and learn from their choices and deployments.
"It's all about driving the business and not building another technology and hoping they will come," Henschen said. "Think big and long-term, not just at the big project in front of you but anticipate where else the organization might go. You don't want to choose a platform that might prove to be inadequate a few years down the road."