How to reap the benefits of data integration, step by step

A new book lays out a strong case for data integration and guides readers in how to carry out this essential process.

In a new book just out from Technics Publications, data experts Bill Inmon, Patty Haines and David Rapien tackle the essential (but too-often avoided) task of data integration. They delve into the benefits of data integration and explore specific categories of integration, creating an essential guide for all who wish to make the best use of their organization's data.

This selection, from an early chapter, tackles the critical question of why data integration is so essential for organizations of all types.

What are the essential processes that take place within a data warehouse? Possibly the most important is the integration of data. Integration provides the organization with a single, unified view of its data. Unfortunately, despite that benefit, nobody wants to do the work of integration. Everyone hates integration, including vendors, data analysts, data scientists, and consultants. So why do people hate data integration so much?

People hate integration because integration requires thought and work -- a lot of thought and a lot of work. There are no shortcuts.

A typical vendor tactic is to provide users with a platform and then put the responsibility for integration on those users. Often this process is called Extract, Load, and Transform (ELT). And what happens when ELT is set into motion? Because of the complexity involved, the integration step often conveniently does not get done.

E and L get done, but we forget about T.

The problem is that you must integrate the data to have a truly unified view of data across the enterprise. There are no shortcuts. There are no easy paths out.

So what do you end up with when you do not integrate data? You have a world of silos of information. One silo cannot communicate or cooperate with another silo. Data exists solely within its silo and you cannot use it anywhere else. You have no way of looking at information across the enterprise.
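
To make the missing T step concrete, here is a minimal sketch of what integrating two silos can look like. It is an illustration only, not a method taken from the book: the silo names, field names, encodings, and the shared customer model are all hypothetical assumptions. The point is that each silo describes the same customer differently, and someone has to write the mapping onto one agreed model.

```python
# Illustrative sketch only: the silos, field names, and encodings below are
# hypothetical and are not taken from the book.

# Two application silos describe the same customer in incompatible ways.
billing_rows = [{"cust_no": "17", "sex": "M", "balance_usd": 120.0}]
support_rows = [{"customerId": 17, "gender": 1, "open_tickets": 2}]

# Assumed encodings: billing stores letters, support stores numeric codes.
GENDER_FROM_LETTER = {"M": "male", "F": "female"}
GENDER_FROM_CODE = {1: "male", 0: "female"}


def transform_billing(row):
    """Map a billing record onto the shared (assumed) customer model."""
    return {
        "customer_id": int(row["cust_no"]),
        "gender": GENDER_FROM_LETTER[row["sex"]],
        "balance_usd": row["balance_usd"],
    }


def transform_support(row):
    """Map a support record onto the same shared customer model."""
    return {
        "customer_id": row["customerId"],
        "gender": GENDER_FROM_CODE[row["gender"]],
        "open_tickets": row["open_tickets"],
    }


def integrate(billing, support):
    """Merge the transformed records from both silos, keyed by customer_id."""
    transformed = [transform_billing(r) for r in billing]
    transformed += [transform_support(r) for r in support]
    unified = {}
    for record in transformed:
        unified.setdefault(record["customer_id"], {}).update(record)
    return unified


unified_view = integrate(billing_rows, support_rows)
print(unified_view[17])
```

After the transform, customer 17 has one record carrying both the billing balance and the support ticket count. Even in this toy case, the keys, types, and codes had to be reconciled by hand, which is the thought and work the authors describe.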

We must integrate all sorts of data: structured, transaction-based, and textual data. There is a lot of important data within the organization that we overlook today. And that is a shame because organizations are missing a great opportunity. Organizations need to look at all of their data, not just the data that is convenient to use.

There have been many unsuccessful trends in avoiding data integration:

  • Just build a data mart from applications using the dimensional model. Who needs all the work of building a data warehouse? Go directly from application to data mart and skip all data integration work using dimensional modeling.
  • Let's change the definition of a data warehouse. And in doing so, not do integration because it is hard and complex.
  • Let's do ELT rather than ETL. Only let's skip the T part of the equation.
  • Let's copy operational data into a separate platform and call it 'integrated data.' That is a lot simpler than getting in there and unifying the data.
  • Let's bring in big data. I heard that with big data we don't really have to integrate our data. The vendor told us that if we just put all of our data into big data, we would not need a data warehouse.
  • Let's do a data mesh. Who needs all the complications of integration?
  • Let's just put everything in a data lake. Then people can go to the data lake and just find what they want. That's all there is to it.

Every day, vendors create more excuses not to do integration. And every day, the problems with siloed systems grow worse.

If you want to create believable, corporate-wide data for your organization, you must integrate the data held in your silos.

We integrate data at three levels:

The techniques for data integration are very different for classical structured data versus textual data. Integration with structured data involves a data model, whereas integration with text involves a taxonomy along with other mappings, ontologies, and inline contextualization. The tools that support these two kinds of integration are just as different from each other.
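
As a rough sketch of the textual side, the snippet below applies a tiny taxonomy to raw text so that specific terms are tagged with a broader category. The taxonomy entries and the function name `classify_terms` are hypothetical, and real textual integration involves far more (ontologies, inline contextualization, and so on); this only hints at why a taxonomy plays the role for text that a data model plays for structured data.

```python
import re

# Hypothetical taxonomy for illustration: specific term -> broader category.
TAXONOMY = {
    "honda": "car",
    "porsche": "car",
    "oak": "tree",
    "elm": "tree",
}


def classify_terms(text):
    """Return (term, category) pairs for each taxonomy term found in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return [(word, TAXONOMY[word]) for word in words if word in TAXONOMY]


print(classify_terms("She parked the Honda under an oak."))
```

With the categories attached, a query about "cars" can find a document that only ever mentions a Honda, which plain keyword matching on the raw text would miss.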

There are (at least) two important aspects to integration. The first aspect is the mechanics of integration, and the second aspect is the project management of integration. This book covers both of these aspects.

Vendors and consultants fear integration. The first step to not fearing integration is to understand it. Once you understand integration, you can rationally start to plan how to do integration.

Want to learn more? Read the rest of Chapter 1 by downloading this PDF.
