An overview of the Pentaho Data Integration platform

The Pentaho Data Integration platform enables organizations to integrate, blend, convert and transform data from any data source across their entire enterprise.

The Pentaho Data Integration platform provides the extract, transform and load functionality necessary to integrate a wide variety of data sources, including relational databases, enterprise applications, files and big data.
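To make the extract, transform and load pattern concrete, the following is a minimal, generic sketch in Python. It does not use Pentaho Data Integration or any of its APIs; the inline CSV data, the `sales` table and all function names are hypothetical, chosen only to illustrate the three stages.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this could be a file, a database
# or an enterprise application export.
SOURCE_CSV = """id,name,amount
1, alice ,10.50
2,Bob,not_a_number
3,carol,7
"""


def extract(text):
    """Extract: read rows from a CSV source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy names, parse amounts, drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        clean.append((int(row["id"]), row["name"].strip().title(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a target table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order and return the loaded rows."""
    load(transform(extract(SOURCE_CSV)), conn)
    return conn.execute("SELECT id, name, amount FROM sales ORDER BY id").fetchall()


if __name__ == "__main__":
    print(run_etl(sqlite3.connect(":memory:")))
```

A graphical ETL tool expresses the same extract, transform and load stages as connected steps in a designer rather than as hand-written functions.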

The platform's ETL architecture supports the creation and maintenance of target databases such as data warehouses, data marts and data lakes. This product provides the data integration portion of the Pentaho Business Analytics platform, which also includes data preparation and governance capabilities. Pentaho Data Integration can be used alone or in conjunction with these tools.

The latest version of Pentaho Data Integration, 6.1, offers the following:

  • A graphical ETL designer that enables data integration teams to design, test and deploy integration processes, workflows, notifications and alerts.
  • Connectivity to a wide variety of relational databases, big data stores, files and enterprise applications as either sources or targets in integration tasks.
  • An extensive library of prebuilt data integration transformations that support complex process workflows.
  • Repository-based development tools that manage the design, creation, testing, deployment and operation of integration processes and supporting metadata.
  • The ability to visualize data during data preparation and to publish metadata models to Pentaho's analytics tools.

This version also lets users convert data transformations into data services, so that the results of queries against those services can be analyzed as virtual data tables. Added support for data lineage analysis lets users trace the end-to-end flow of data across Pentaho Data Integration transformations and jobs. The release also improves big data capabilities by supporting Cloudera Distribution for Hadoop and by letting users connect to the Hadoop cluster from Spoon.

A new Simple Network Management Protocol plug-in enables organizations to integrate with third-party tools to monitor data integration events. This version also includes an SAP HANA bulk loader plug-in that enables users to bulk load data into their SAP HANA databases.

Who benefits from using the Pentaho Data Integration platform?

SMBs as well as large enterprises use the product as a comprehensive and cohesive data integration and business analytics platform. In addition to direct sales, Pentaho has an embedded OEM network, which enables OEM vendors to extend their own products with data integration and analytics capabilities.

In addition to its commercial versions, Pentaho offers an open source version of its Data Integration product known as Kettle. Many enterprises start with the Kettle open source tool to explore integration capabilities or to handle limited integration workloads.

How is Pentaho Data Integration licensed and priced?

The commercial version of Pentaho Data Integration is available in three editions -- Basic, Professional and Enterprise -- and runs on Windows, Linux and Mac OS X operating systems. It's sold as an annual subscription that includes support services. Contact Pentaho for pricing information.

A 30-day, fully functional trial of Pentaho Data Integration is available as part of the Pentaho Business Analytics package or as a standalone product.

The Kettle open source version of Pentaho Data Integration is available along with other products for analytics, including Report Designer, Aggregation Designer, Schema Workbench and Metadata Editor. The Kettle Data Integration open source version provides a subset of data integration capabilities, whereas the commercial versions of Pentaho Data Integration offer expanded transformations, repository-based management and team-based development functionality.
