Containers key for Hortonworks alliance on big data hybrid
Hortonworks is joining Red Hat and IBM to develop a hybrid big data architecture that will run in containers both in the cloud and on premises.
NEW YORK -- Hortonworks forged a deal with IBM and Red Hat to produce the Open Hybrid Architecture Initiative. The goal of the Hortonworks alliance is to build a common architecture for big data workloads running both on the cloud and in on-premises data centers.
Central to the new Hortonworks alliance initiative is the use of Kubernetes-managed containers. Container-based approaches that took hold in the cloud increasingly appear to set the tone for big data architecture in organizations' future data centers.
Hortonworks' deal was discussed as part of the Strata Data Conference here, where computing heavyweight Dell EMC also disclosed an agreement with data container specialist BlueData Software to present users with a reference architecture that brings cloud-style containers on premises.
Big data infrastructure shifts
Both deals indicate changes are afoot in infrastructure for big data, with container-based approaches from the cloud starting to shape how organizations will handle data in the future.
The Hortonworks alliance hybrid initiative -- as well as reference architectures from Dell EMC and others -- reflects changes spurred by the multitude of analytics engines now available to handle data workloads and the growing move of big data applications to the cloud, Gartner analyst Arun Chandrasekaran said in an interview.
"Historically, big data was about coupling compute and storage together. That worked pretty well when MapReduce was the sole engine. Now, there are multiple processing engines working on the same data lake," Chandrasekaran said. "That means, in many cases, customers are thinking about decoupling compute and storage."
De-linking computing and storage
Essentially, modern cloud deployments separate those two elements, Chandrasekaran said. This approach is driving greater interest in containerizing big data workloads for portability, he noted.
The shifts in architecture toward container orchestration show people want to use their infrastructure more efficiently, Chandrasekaran said.
The Hortonworks alliance with Red Hat and IBM shows a basic change is underway for the Hadoop-style open source distributed data processing framework. Cloud and on-premises architectural schemes are blending.
"We are decoupling storage and compute again," said Arun Murthy, chief product officer and co-founder of Hortonworks, based in Santa Clara, Calif. "As a result, the architecture will be consistent whether processing is on premises or on cloud or on different clouds."
The elastic cloud
This style of architecture pays heed to elastic cloud methods that let users spin up big data clusters when needed for processing jobs and take them offline when they aren't.
"In public cloud, you don't keep the architecture up and running if you don't have to," Murthy said in an interview.
That contrasts with how Hadoop systems are typically deployed in the data center, where clusters are often configured and left running so they are ready for peak workloads.
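The difference between the two provisioning styles can be illustrated with a little arithmetic. The following is a minimal Python sketch, not drawn from Hortonworks or any vendor: the hourly demand figures and both function names are hypothetical, chosen only to show how a cluster sized for peak load and left running compares with one that is resized to match demand.

```python
# Hypothetical demand for one day: worker nodes needed in each of 24 hours.
# These numbers are illustrative assumptions, not measurements.
hourly_demand = [0, 0, 0, 0, 0, 0, 2, 4, 8, 8, 4, 2,
                 0, 0, 0, 0, 2, 16, 16, 4, 0, 0, 0, 0]


def always_on_node_hours(demand):
    """Node-hours used by a cluster sized for the peak and never shut down."""
    return max(demand) * len(demand)


def elastic_node_hours(demand):
    """Node-hours used by a cluster resized every hour to match demand."""
    return sum(demand)


fixed = always_on_node_hours(hourly_demand)
elastic = elastic_node_hours(hourly_demand)

print(f"always-on cluster: {fixed} node-hours")
print(f"elastic cluster:   {elastic} node-hours")
print(f"share of always-on capacity actually used: {elastic / fixed:.0%}")
```

Under these assumed numbers, the always-on cluster spends most of the day idle. Real savings depend on startup time, data locality and pricing, none of which this sketch models.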
For Lars Herrmann, general manager of the integrated solutions business unit at Red Hat, based in Raleigh, N.C., the Hortonworks alliance project is a step toward bringing in a large class of data applications to run natively on the OpenShift container platform. It's also about deploying applications more quickly.
"The idea of containerization of applications allows organizations to be more agile. It is part of the trend we see of people adopting DevOps methods," Herrmann said.
Supercharging on-premises applications
For its part, Dell EMC sees spinning up data applications more quickly on premises as an important part of the reference architecture it has created with help from BlueData.
"With the container approach, you can deploy different software on demand to different infrastructure," Kevin Gray, director of product marketing at Dell EMC, said in an interview at the Strata conference.
The notion of multi-cloud support for containers is popular, and Hadoop management and deployment software providers are moving to support various clouds. At Strata, BlueData made its EPIC software available on Google Cloud Platform and Microsoft Azure. EPIC cloud support was already available on the AWS cloud.
Big data evolves to singular architecture
Tangible benefits will accrue as big data architecture moves IT shops toward a single, consistent environment for data processing in the cloud and in the data center, said Mike Matchett, principal consultant and founder of IT industry advisory firm Small World Big Data.
"Platforms need to be built such that they can handle distributed work and deal with distributed data. They will be the same on premises as on the cloud. And, in most cases, they will be hybridized, so the data and the processing can flow back and forth," Matchett said in an interview at the conference.
Some special optimizations will still be required for performance reasons, Matchett added. And IT managers will have to make decisions based on different workloads as to where particular analytics processing will be done, he said.