
Cloudera and Hortonworks combo to push CDP, machine learning

Two wunderkinds of Hadoop have formalized their merger. Cloudera and Hortonworks say they will place special focus on AI as they chart the stand-alone vendor's future.

A would-be data management juggernaut got its first public airing as the new Cloudera -- a combination of formerly separate Hadoop pioneers Cloudera and Hortonworks -- when the newly stand-alone vendor's leaders publicly mapped the road it intends to take.

"The combination has made sense for many years," said Tom Reilly, CEO of the combined companies, who held a similar role at the former Cloudera.

Others agreed these leaders in open-source-oriented big data tooling -- built along lines drawn by big web companies, such as Google and Yahoo -- are better together than apart and can offer users a unified big data platform.

Reilly spoke as part of a prerecorded webcast heralding the new company, which came after confirmation that shareholders of Cloudera and Hortonworks had approved a merger of the firms -- a deal first disclosed last October.

Cloudera faces distinct challenges as it moves data applications to the cloud and tries to bring users into the fast-growing new world of machine learning and AI.

An initial product roadmap depicted in the Jan. 10 webcast indicated that work to combine the Cloudera and Hortonworks platforms is well underway. However, technical details were thin, and executives did not disclose the precise timing of planned product deliveries.

Platform plans

A unified big data technology bundle will be the cornerstone of the company's product strategy, Reilly said during the webcast, which had the feel of an infomercial. Previously referred to by the code name Unity, the new Cloudera Data Platform (CDP) will be "100% open source" and cloud-native, according to Reilly.

Users will be able to run CDP in the AWS, Azure, Google, IBM and Oracle clouds, including in multi-cloud deployments, Reilly said. On-premises and hybrid cloud installations will also be supported.

CDP will blend features from version 3 of the Hortonworks Data Platform (HDP) and version 6 of CDH, the distribution of Hadoop and related technologies from the Cloudera side of the company.

The first version of CDP will provide a set of combined functionality for new users, and it will be followed by a second release that will support upgrades of existing HDP and CDH applications, said Arun Murthy, who led engineering at Hortonworks and is now Cloudera's chief product officer.

The new platform will include a single stack of security and data governance software, Murthy said in the webcast. But he didn't specify whether that will be based on one of the different technology combinations that Cloudera and Hortonworks offered or a combination of the two.

In other technology areas, CDP will retain "some overlap in functionality" from its predecessor platforms to ease migrations for users, Murthy said, again without further details.

Reiterating promises executives made last fall after the merger was announced, Reilly said Cloudera will continue to support HDP 3 and both versions 5 and 6 of CDH for at least another three years. The company will also still add a "steady stream" of new features to the existing platforms during that time, he said.

In addition, Reilly said HDP will be integrated with Cloudera Data Science Workbench (CDSW), a collaboration and workflow management platform for teams of data scientists. He didn't say whether Cloudera will also still offer IBM's rival Data Science Experience workbench software, which Hortonworks has resold since mid-2017.

Another planned integration will link CDH to Hortonworks DataFlow, a real-time data streaming and analytics platform that can pull in data from a variety of systems, IoT devices and other sources.

Containers and the Kubernetes open source container orchestration system will also play a significant role in Cloudera's development strategy, Reilly said. For example, users will be able to deploy CDP in Kubernetes-managed containers. Cloudera Machine Learning, a new machine learning platform that the original Cloudera released in preview mode last month, will similarly run on Kubernetes.

Diagram of Cloudera-Hortonworks plans
Cloudera and Hortonworks have offered a roadmap for the combined company's future.

Touting AI

Also speaking in the webcast, Hilary Mason, general manager of machine learning at Cloudera both before and after the merger, said technologies like Cloudera Machine Learning and CDSW are meant to help analytics teams scale up "from pockets of AI in companies to what looks more like an AI factory."

Given a general industry fascination with AI and machine learning, the fact that Cloudera's proposed product planning heavily emphasizes AI, machine learning and data science technology is not surprising.

What is also not surprising, according to William McKnight, president of McKnight Consulting Group in Plano, Texas, is a seeming emphasis at the new company on Cloudera's brand of data science tools.

"Cloudera was always stronger in AI, while Hortonworks was stronger in IoT," McKnight said.

McKnight attributed Hortonworks' lag here to its reliance on Apache Zeppelin tools that have seen only moderate industry uptake. Hortonworks had, in effect, rebooted those workbench efforts in 2017 when, in return for providing a basic Hadoop distribution, it pledged to resell IBM data science tools.

While a recent Hortonworks deal to work on new container technology with IBM was discussed, the earlier deal to resell IBM data science tools was not covered in this webcast.

Multi-cloud happens

Widely cited as a motive behind the Cloudera and Hortonworks merger was growing competition from Hadoop products hosted by Amazon on its AWS cloud. Reilly, in effect, sidestepped the issue of cloud provider competition, as he emphasized Cloudera's intention to be a multi-cloud data management provider.

Reilly said Cloudera's platform runs on Amazon, Microsoft, Google, IBM and Oracle clouds. The multi-cloud support he proffers is meant, in part, to protect users from cloud lock-in, he said.

"Cloudera leaders have touted multi-cloud and hybrid cloud as key to the direction of the company," said McKnight, who added that the advent of microservices, containers and Kubernetes is now allowing users, who are beginning to allocate resources to different clouds, to strike a better balance between cloud providers.

Sticking to our story

The basic story coming from the Cloudera and Hortonworks combination is positive, according to James Curtis, analyst at 451 Research, in an interview before the merger's closing.

"This merger was driven by the idea that the market is consolidating somewhat. It made sense. Both players were oftentimes competing in deals," Curtis said. "But they had a lot more in common than they had differences."

He said the main idea these two Hadoop upstarts shared was "combining open source data components and, where necessary, linking them to proprietary tools."

Curtis indicated duplicate development of new technology initiatives had become a barrier to both Cloudera's and Hortonworks' own corporate fortunes.

"Bringing product lines together will help the new company achieve efficiencies," Curtis said.

It's the end of Hadoop as we know it

Clearly, the Cloudera and Hortonworks merger is something of a watershed for Hadoop, the open source distributed processing framework that was the driving force in the development of big data systems, but is now more than a decade old and lagging in both influence and use, particularly in the cloud.

While Cloudera executives didn't mention Hadoop and related technologies like Spark, Kafka and Zeppelin in the webcast, they'll still be part of CDP -- and the legacy of Hadoop will likely follow Cloudera into the future.

"The two companies are still tied to Hadoop," McKnight cautioned. "As much as they'd like to consider it as plumbing, their fortunes are still tied to it."

Senior Executive Editor Craig Stedman contributed to this report.
