
Effective integration key to creating trusted data

Pipelines and platforms capable of managing volume and combining information from disparate sources in real time are key to the agile decision-making needs of modern organizations.

Proper data integration is essential to delivering trusted data.

Organizations that empower employees to make data-driven decisions have been shown to grow faster than those that don't give employees the tools and knowledge to do so.

However, enterprises have struggled for decades to get more workers using data as part of their jobs.

One reason is that analytics tools are complex, making them hard to use. Another is that data literacy programs can be hard to develop and implement, and even once they are in place, they are time-consuming for workers to take part in.

A third reason is trust, or the lack thereof.

Even if employees receive the requisite data literacy training and have the technical know-how to use self-service analytics tools, a lack of trust in their organization's data hinders making data-driven decisions.

Organizations, therefore, need to make their data as trustworthy as possible.

Crucial to data quality, which is what leads to trusted data, is proper data integration, according to Dan Pitre, director of product management at data management and integration vendor Actian. He spoke on Feb. 1 during TDWI's Virtual Summit, a conference hosted by the research and advisory firm.

He noted that modern cloud data platforms provide significant data management benefits relative to their predecessors and the on-premises deployments that proliferated before the advent of the cloud. Among them are the flexibility to deploy anywhere, the ability to connect to myriad systems and applications, scalability and performance.

None of those benefits matter much, however, if data quality isn't prioritized and the data they manage can't be trusted.

"In the end, cloud data platforms are all about the data, and that data needs to be trusted," Pitre said. "These data platforms need to help you manage that data and feel confident about the quality of that data."

That confidence, meanwhile, comes from knowing that the data being used to feed the models, reports, dashboards and other data products that inform decisions has been properly integrated and treated throughout its lifecycle to guarantee its quality.

Figure: Major data integration methods.

Considerations

Myriad vendors offer data integration capabilities.

Independent vendors such as Informatica, Fivetran, Qlik and Tibco provide data integration platforms. In addition, tech giants such as AWS, Google, IBM, Microsoft and Oracle all offer data integration tools as part of their larger data management and analytics offerings.

But no matter what data integration platform or platforms an organization deploys, it must take into consideration certain criteria that will result in the data quality that leads to trusted data, according to Pitre.

First is that the data integration platform enables the organization to create a definitive data source for a given piece of information -- what is often termed a single source of truth.

That single source of truth lets data consumers know that the information they're using to inform a decision, one that could have material consequences, has been treated properly and is endorsed by their organization's data stewards.

"From a trust perspective, it really is about having a single source of truth for your data," Pitre said. "It needs to be fully, fully trusted."

Creating that single source of truth results from proper data integration. That means unifying data from multiple sources, cleaning the data, ensuring that the data within a data set or application is complete, and checking its accuracy.
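
As a rough illustration of those four steps, the sketch below uses plain Python. The source names (crm, billing), the field names and the validation rules are all hypothetical, chosen only for the example; Pitre did not describe a specific implementation, and a production pipeline would rely on a dedicated integration platform rather than hand-written code.

```python
import re

# Hypothetical records from two source systems (names and fields are assumptions).
crm = [
    {"id": "1", "email": " Ann@Example.com ", "country": "US"},
    {"id": "2", "email": "bob@example.com", "country": ""},
]
billing = [
    {"id": "2", "email": "bob@example.com", "country": "CA"},
    {"id": "3", "email": "not-an-email", "country": "UK"},
]

REQUIRED = ("id", "email", "country")
# Deliberately simple pattern; an assumption standing in for a real accuracy check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean(record):
    """Cleaning: trim whitespace and normalize email case."""
    out = {key: (value or "").strip() for key, value in record.items()}
    out["email"] = out.get("email", "").lower()
    return out


def unify(*sources):
    """Unifying: merge records by id; later sources fill fields that are still empty."""
    merged = {}
    for source in sources:
        for raw in source:
            rec = clean(raw)
            current = merged.setdefault(rec["id"], {})
            for key, value in rec.items():
                if value and not current.get(key):
                    current[key] = value
    return merged


def is_complete(record):
    """Completeness: every required field has a non-empty value."""
    return all(record.get(field) for field in REQUIRED)


def is_accurate(record):
    """Accuracy: here, only that the email looks well formed."""
    return bool(EMAIL_RE.match(record.get("email", "")))


unified = unify(crm, billing)
trusted = {key: rec for key, rec in unified.items() if is_complete(rec) and is_accurate(rec)}
rejected = sorted(set(unified) - set(trusted))
```

In this toy data set, the record with id 2 only becomes complete once the billing source supplies the missing country, while the record with id 3 is held back because its email fails the accuracy check.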

Flexibility is a second key aspect of an appropriate data integration platform, according to Pitre.

Modern enterprises store much of their data in the cloud and often use more than one vendor's cloud data management capabilities. In addition, most likely still store some data -- particularly their most sensitive data -- in on-premises databases.

Data integration platforms therefore need to connect to any type of deployment to be effective as well as be able to combine disparate data types, including structured, semi-structured and unstructured data.

In addition, they need to be able to connect to various data ingestion tools so data can be captured from all available sources.

Ease of use is a third tenet of modern data integration, according to Pitre. Platforms not only need to connect to all possible sources but also need to do so quickly and easily.

"We live in a hybrid world, and data can reside anywhere," Pitre said.

In addition, not only data engineers but anyone who works with data should be able to work with their organization's data ingestion tools, he continued.

Expertise often lies with business users, who understand data in a given domain better than a member of a centralized IT team does. As a result, entrusting those domain experts with data management responsibilities can lead to more effective data integration practices and greater trust in the data itself than if integration is carried out only by employees whose expertise is data engineering.

"There are business owners that have true knowledge of what the data is. There are people in IT that are truly technical experts. [And] we need everybody being able to get at that data and move it along to develop that single source of trusted truth," Pitre said.

Challenges

While a list of requirements is easy to compile, carrying out those requirements remains challenging.

Perhaps the most significant problem many organizations face is that there is often no connectivity between the various tools deployed across departments. Instead, data is not only isolated in different departments but sometimes also isolated in different systems within those departments.

For organizations that have been collecting data for perhaps decades, integrating all that data can be onerous. But just as it can be difficult to retroactively integrate an entire organization's historical data, integrating new data as it's collected is also complicated.

Worldwide, data volume is rising exponentially. In addition, as more data types are developed, data is becoming increasingly complex.

The same is true within enterprises. They're collecting more data than ever, and the data they collect is coming in from more sources than before.

That makes data integration difficult no matter how modern and sophisticated a data integration platform might be.

One strategy an organization can use to simplify data integration is to reduce the number of tools it uses to manage and analyze data, Pitre said.

He noted that according to a 2022 Forrester Consulting report, the average large enterprise deploys 367 different software tools and systems across all its various departments.

"So many organizations are drowning in tools," Pitre said. "They have a lot of information, different environments and a lot of different tools. The ability to take that data, integrate it and ensure that there's a single source of trusted truth is an extremely complex problem to solve."

To deal with so much isolated data and such complex data systems, an integration platform that is part of a larger data platform that also addresses such issues as data quality is helpful, he continued. In addition, a platform that supports all types of integration including batch and real-time data streaming while enforcing governance measures that define an organization's data standards is beneficial.
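
One way to picture batch and streaming integration sharing the same governance measures is the minimal Python sketch below. It is an assumption-laden illustration, not a description of Actian's or any other vendor's platform: the rule names, the record fields (customer_id, amount) and the functions integrate_batch and integrate_stream are all hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical governance rules; a real organization would define its own standards.
RULES: dict[str, Rule] = {
    "has_customer_id": lambda r: bool(r.get("customer_id")),
    "amount_non_negative": lambda r: isinstance(r.get("amount"), (int, float))
    and r["amount"] >= 0,
}


def violations(record: Record) -> list[str]:
    """Return the names of every rule the record breaks."""
    return [name for name, rule in RULES.items() if not rule(record)]


def integrate_batch(records: Iterable[Record]) -> tuple[list[Record], list[Record]]:
    """Batch path: process a full set of records, quarantining rule breakers."""
    accepted, quarantined = [], []
    for record in records:
        (quarantined if violations(record) else accepted).append(record)
    return accepted, quarantined


def integrate_stream(events: Iterable[Record]) -> Iterator[Record]:
    """Streaming path: pass each event through as it arrives if it meets the same rules."""
    for event in events:
        if not violations(event):
            yield event


# Example inputs (made up for illustration).
nightly = [{"customer_id": "c1", "amount": 10.0}, {"customer_id": "", "amount": 5}]
accepted, quarantined = integrate_batch(nightly)

incoming = iter([{"customer_id": "c2", "amount": -1}, {"customer_id": "c3", "amount": 3}])
live = list(integrate_stream(incoming))
```

Because both paths call the same violations function, a change to the organization's data standards applies to batch loads and streamed events alike, which is the point of keeping governance in one place.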

Such comprehensive data platforms can help reduce the complexity of data systems and the cost of data integration.

"That helps improve productivity and insights," Pitre said. "Having it all in one place provides many benefits."

In addition to isolated data, a second significant challenge is the hybrid nature of most enterprises' data systems. Integrating data stored in cloud, on-premises and hybrid databases; data warehouses; and data lakes is a complex undertaking, regardless of the capabilities of a given data integration platform.

Still another obstacle is that it is difficult to implement a system that integrates data in true real time to make it immediately actionable and inform decisions in the precise moment they need to be made.

"We're seeing a lot of customers with platforms that don't meet their needs, whether [problems] are around scalability, performance, accuracy of the data or the ability to properly connect to external sources," Pitre said. "The existing environments aren't meeting all the demands that are out there for data and analytics platforms."

Beyond technological challenges, a lack of skills is also holding back many organizations, he continued.

"Skills are always in short supply," Pitre said. "Being able to leverage skills and be able to meet deadlines is always a tough thing."

Finally, from an external perspective, privacy regulations and compliance requirements can be difficult to manage.

Strategic implementation

Ultimately, Pitre said that when developing a data integration infrastructure, organizations need to consider four primary factors:

  • The domains that require integration.
  • The personas that will use the tools.
  • The type of deployment required.
  • The connectivity of the implementation.

Domains refer to different integration patterns such as data synchronization, loading data into a warehouse or lake, the ability to handle disparate data types, and joining applications.

"All of these integration types need to be handled so you don't miss any data," Pitre said. "Then you will have a complete set of data."

Addressing different personas, meanwhile, refers to ease of use, so that people other than trained data experts can work with data integration platforms.

When business users are able to work with their organization's data, agility and growth result.

"Reducing complexity, creating an environment where business analysts can dig into the data, manipulate it, connect to sources and bring it into a warehouse is key," Pitre said.

Deployment is about flexibility -- being able to work in any cloud, hybrid or on-premises environment.

Finally, connectivity is key. Data integration pipelines need to connect to any potential data source to ingest data as well as connect to any potential model or application, including those that don't yet exist, to fuel real-time decisions.

"These … criteria are critically needed to enable organizations to have trusted data," Pitre said. "Trusted data leads to smarter insights and better business decisions, and in the end, a more successful organization."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.
