Apache Hop data orchestration hits open source milestone

The open source technology has grown beyond its roots into a full data platform that moves data from one source to another for operations, business intelligence and analytics.

The open source Apache Hop data orchestration platform has achieved a big milestone, becoming a Top-Level Project at the Apache Software Foundation.

Hop, a recursive acronym for the Hop Orchestration Platform, first came to the Apache Incubator in September 2020.

The Apache Incubator is often the entry point for technologies into the Apache Software Foundation (ASF). After a project demonstrates community and technology growth over a period of time, it can be elevated to Top-Level Project status, which marks a milestone in project maturity.

Hop's roots go back much further than 2020. The project is based on the Kettle data orchestration project that former data integration and analytics vendor Pentaho made open source in 2012. In 2019, the Hop project started as a fork of Kettle.

Moving Kettle to Hop for data orchestration

Among the users of Kettle that migrated to Hop is Belgian car tire wholesaler Deli Tyres. Jan Lievens, managing director of Deli Tyres, said the company had been using Kettle for more than a decade and recently upgraded its entire system from Kettle to Apache Hop.

"Deli Tyres processes data from a variety of sources to feed the web shop's stock systems, receive and place orders, feed the data warehouse and more," Lievens said. "Hop is used as the main data processing engine in a combination of real-time streaming and batch processes."

Lievens and his team chose to move to Hop in part because its visual development environment enables faster development and easier maintenance. Lievens said that Hop also has a smaller resource footprint and handles metadata more efficiently.

"After the upgrade, Hop's smaller footprint and improved metadata management resulted in a system that runs smoother, more transparent and more reliable than was possible before," Lievens said.

Apache Hop data orchestration continuing to mature

The graduation of Apache Hop to Top-Level Project status at the ASF, made public Jan. 18, means a number of things to Bart Maertens, vice president, Apache Hop, and managing partner at Know.bi, a business intelligence consulting firm.

Maertens said that the new status means Hop has been able to build an active and engaged community.

"We expect the graduation as an Apache Top-Level Project to increase adoption of Hop and grow its community," Maertens said. "As a consequence, we expect more organizations to help out with Hop development and increase the user base, which is expected to lead to an increase in contributions and functionality."

While Hop got its start as a fork of the Kettle project led by Pentaho, Maertens emphasized that the project never had the intention to be compatible with Kettle, and it isn't. He explained that the technical design of Hop is different from Kettle in that Hop now has a kernel and plug-ins architecture, with the engine intended to be as robust and stable as possible, while plug-ins provide added functionality.
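The kernel and plug-ins idea can be illustrated with a minimal Python sketch. This is not Hop's actual API or code; the `Engine` class, the plug-in names and the sample row are all hypothetical, chosen only to show a small, stable core that knows nothing about individual transforms until they are registered.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical types for illustration only; not part of Apache Hop.
Row = Dict[str, object]
Transform = Callable[[Row], Row]


class Engine:
    """A deliberately small core: it registers plug-ins and runs rows through them."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Transform] = {}

    def register(self, name: str, transform: Transform) -> None:
        # Reject duplicate names so one plug-in cannot silently replace another.
        if name in self._plugins:
            raise ValueError(f"plug-in already registered: {name}")
        self._plugins[name] = transform

    def run(self, steps: Iterable[str], rows: Iterable[Row]) -> List[Row]:
        # Look up every step first; an unknown name raises KeyError before any row is processed.
        transforms = [self._plugins[name] for name in steps]
        output: List[Row] = []
        for row in rows:
            for transform in transforms:
                row = transform(row)
            output.append(row)
        return output


# Two example plug-ins (assumed names) that add functionality without touching the engine.
def uppercase_brand(row: Row) -> Row:
    return {**row, "brand": str(row["brand"]).upper()}


def add_in_stock(row: Row) -> Row:
    return {**row, "in_stock": int(row["qty"]) > 0}


engine = Engine()
engine.register("uppercase_brand", uppercase_brand)
engine.register("add_in_stock", add_in_stock)

result = engine.run(["uppercase_brand", "add_in_stock"], [{"brand": "acme", "qty": 3}])
print(result)  # [{'brand': 'ACME', 'qty': 3, 'in_stock': True}]
```

In a design like this, new functionality arrives as additional registered transforms, while the engine itself rarely needs to change.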

"In addition to the revamped architecture, Hop gained a lot of functionality to support data teams in the entire project lifecycle," Maertens said.

[Chart: The Apache Hop Orchestration Platform data architecture, which helps to enable data workflows and pipelines.]

The intersection of Hop data orchestration and DataOps

At the core of both Kettle and Hop are ETL (extract, transform and load) capabilities, though Hop can handle more than ETL.
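For readers new to the term, the three ETL stages can be sketched in a few lines of Python using only the standard library. This is a generic illustration under assumed names and data, not how Hop implements pipelines: the `RAW_CSV` sample, the `stock` table and the three functions are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from a file, API or database.
RAW_CSV = "sku,qty\nT-100, 4\nT-200,0\n"


def extract(text):
    """Extract: read raw rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim whitespace and convert quantities to integers."""
    return [(row["sku"].strip(), int(row["qty"])) for row in rows]


def load(conn, records):
    """Load: write the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS stock (sku TEXT PRIMARY KEY, qty INTEGER)")
    conn.executemany("INSERT INTO stock (sku, qty) VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database standing in for a warehouse
load(conn, transform(extract(RAW_CSV)))

loaded = conn.execute("SELECT sku, qty FROM stock ORDER BY sku").fetchall()
print(loaded)  # [('T-100', 4), ('T-200', 0)]
```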

"The Hop platform, implemented according to our best practices, can be used to build and run projects that meet the criteria specified by the 'DataOps Manifesto,'" Maertens said, referring to a published set of DataOps principles.

Maertens emphasized that how organizations use and run Hop depends on their perspective.

Hop also focuses on areas outside the purview of DataOps. Those areas include version control, unit and integration testing, and integration with CI/CD (continuous integration/continuous delivery) platforms, all of which are more closely associated with DevOps and GitOps principles than with what is commonly thought of as DataOps.

"More than anything else, Hop intends to be a data platform that not only supports data teams in the development phase, but also provides tools and guidance throughout the entire project lifecycle," Maertens said.
