Cloudera Data Platform gives big data users multi-cloud path

Cloudera released a big data platform combining its technologies and ones from Hortonworks, initially in the AWS cloud but with multi-cloud support to come.

Hadoop and big data pioneer Cloudera Inc. formally launched the first iteration of its new Cloudera Data Platform, hoping the technology's promised support for multi-cloud environments will give it renewed appeal with users that increasingly have turned to AWS and other cloud market leaders for big data systems.

The multi-cloud future isn't entirely here yet for Cloudera, though. A set of three cloud-native services built on Cloudera Data Platform (CDP) for data warehousing, machine learning and analytics use cases is initially available on the AWS platform only. The cloud services are due to be released on Microsoft Azure later this year and Google Cloud Platform in 2020, Cloudera said.

Cloudera is the only independent vendor left in the big data platform market after its merger with former rival Hortonworks in January and HPE's acquisition of MapR Technologies in August. But its name notwithstanding, Cloudera still gets about 90% of its revenues from on-premises users, and it has been hit hard by the accelerating shift of big data deployments to the cloud. That pressure culminated in the departures of CEO Tom Reilly and co-founder and chief strategist Mike Olson in the wake of weak fiscal first-quarter results following the absorption of Hortonworks.

Cloudera Data Platform should put the company on firmer footing to compete against AWS, Microsoft and Google in the cloud, according to technology analysts who attended a two-day series of briefings by Cloudera executives in New York this week.

"They're moving in the right directions," said Doug Henschen, an analyst at Constellation Research. In addition to the new cloud services, he pointed to CDP's support for cloud object storage, common UX and administrative functions across on-premises and cloud implementations, and the ability to replicate data, data governance policies and data lineage records between systems wherever they're running.

"Cloudera is offering flexibility and choice not available from any single-cloud services offering," Henschen said.

CDP puts cloud storage ahead of Hadoop

The company has also reinvented how it lets users process and store data, said William McKnight, president of McKnight Consulting Group. Customers can still use the Hadoop Distributed File System (HDFS) as part of CDP, but Cloudera expects the majority of them to choose native cloud storage options instead of HDFS, resulting in the separation of compute and storage resources in big data systems.

With that change in focus, "Cloudera has put the HDFS-only approach to data management in the rearview mirror," McKnight said. In addition, the promised cloud portability will make Cloudera Data Platform a better alternative for users looking to run machine learning applications and other types of advanced analytics at an enterprise scale, he said.

The CDP launch was timed to coincide with the start of the 2019 Strata Data Conference in New York, an event jointly hosted by O'Reilly Media and Cloudera. The new platform is fully based on open source technologies and combines elements of the separate Cloudera and Hortonworks product offerings. Cloudera executives detailed some of the planned CDP capabilities after the merger was completed and again when the company reported its most recent financial results earlier this month.

Another key component is support for the Kubernetes container orchestration technology. Cloudera's commitment to Kubernetes and hybrid installations of cloud and on-premises systems will enable it to support a variety of cloud infrastructure options, said Lynne Baer, an analyst at technology research and consulting firm Amalgam Insights. CDP also gives Cloudera users a realistic path to grow their Hadoop-based data repositories and analytics systems, she said.

Cloudera aims to keep users in place

Based on what Baer heard at the analyst briefings, the top priority for Cloudera Data Platform is to stop existing users from defecting to the big cloud vendors. Noting that Cloudera claims to have 900-plus customers generating more than $100,000 in recurring revenue annually, she said the company's "real challenge is to increase wallet share over time rather than gain new clients."

A Cloudera employee demos Cloudera Data Platform at the Strata Data Conference in New York.

Other analysts had a similar takeaway. "It appears that Cloudera is focused on keeping its existing clients happy so they will remain on the platform in a very competitive hybrid cloud data management market," tweeted Judith Hurwitz, president and CEO of Hurwitz & Associates.

In another tweet, Tony Baer, principal at dbInsight (and no relation to Lynne Baer), wrote that the most logical users of Cloudera Data Platform will be "customers with sophisticated needs & resources" -- a situation that he likened to the one faced by data warehouse pioneer Teradata.

The CDP-based Cloudera Data Warehouse, Cloudera Machine Learning and Cloudera Data Hub services are priced per hour of usage in the AWS cloud; the hourly rate varies depending on which of the supported cloud instance configurations a customer deploys. An on-premises version of the new platform called CDP Data Center is available now in a preview release for select users. It's due for general release later this year, with annual subscriptions starting at $10,000 per node, Cloudera said.
