Informatica update targets preparing data for AI development

The data management vendor's latest platform update includes features aimed at enabling customers to effectively ready their data for informing models and applications.

Informatica on Wednesday unveiled its Fall 2024 Release featuring new tools aimed at enabling customers to prepare data for training AI models and applications, including improved data integration capabilities.

Data is the foundation of any AI model or application, providing the AI tool with the intelligence it needs to inform decisions and take actions. The data, however, must be prepared properly for the model or application to be effective.

With poorly prepared data, such as inaccurate or irrelevant information, the AI tool will deliver incorrect outputs. But with well-prepared, high-quality data, while accuracy is not guaranteed, the likelihood of AI models and applications delivering inaccurate outputs greatly decreases.

Ensuring that data is properly prepared before using it to train AI models and applications is therefore vital, according to Stephen Catanzano, an analyst at TechTarget's Enterprise Strategy Group.

"Data readiness for AI is critical since the outputs from AI are only as good as the data it is trained on," he said. "If you train AI on data that says the world is flat, that is what it will believe."

Proper data preparation for AI includes building trusted data sources with top data quality including accurate, reliable, contextual, governed, current and diverse data, Catanzano continued.

"This is the most crucial step in the process of building a generative AI solution using enterprise data along with the data platform infrastructure to support it," he said.

Kevin Petrie, an analyst at BARC U.S., likewise noted that proper data preparation is a crucial part of AI development. Without governance measures that result in trustworthy data, models and applications won't succeed.

"We've reached the stage in this latest AI innovation cycle in which early adopters realize that, to achieve meaningful production deployments, they need to get serious about data governance," Petrie said. "Powerful models will fail without trusted inputs by making incorrect inferences, generating false or toxic content, and so on."

Based in Redwood City, Calif., Informatica is a data management specialist whose platform, the Intelligent Data Management Cloud (IDMC), enables customers to integrate and prepare data for analysis.

In May, the vendor launched Claire GPT, a generative AI-powered assistant that lets customers use natural language rather than code to work with data, and a low-code/no-code environment for developing generative AI tools. A month earlier, the vendor was a rumored acquisition target of Salesforce before talks fell through amid dissatisfaction from investors.

New capabilities

Fueled by OpenAI's launch of ChatGPT in November 2022, which was a significant improvement in generative AI capabilities, enterprise interest in developing both traditional AI and generative AI tools has surged over the past two years.

Because large language models (LLMs) such as ChatGPT and Google Gemini enable true natural language processing, enterprises are aiming to combine their proprietary data with LLM capabilities to enable their employees to work with data using natural language rather than code. With coding skills no longer always needed to work with data, more employees within organizations can use analytics to inform decisions, making decision-making more efficient and accurate.

In addition, because LLMs can be trained with proprietary data to automate repetitive tasks that take up a significant amount of data experts' time, AI tools can make application developers, data engineers, data scientists and other trained experts more efficient.

However, if the proprietary data used to train AI tools isn't prepared properly, the models and applications trained on that data won't perform as intended.

Enterprises often possess massive amounts of data, much of it unstructured such as text, images and audio files that have been loaded into a data lake or some other storage repository and left untouched. Even some structured data such as financial and point-of-sale transaction records is often simply loaded into a data warehouse and left alone.

To get all that data ready to inform AI tools, it needs to be properly prepared, according to Gaurav Pathak, Informatica's vice president of product management AI and metadata.

"Many organizations hold terabytes or petabytes of data, both structured and unstructured. But too much of that data has not been properly managed and governed -- it's not what we call AI ready," he said. "Cleaning up messy data will help enterprises prepare data for AI."

Informatica's Fall Release is intended to enable enterprise customers to clean up their messy data. One key component of the update is improved integration capabilities for data stored in Databricks and Google BigQuery, according to Catanzano.

The Fall 2024 Release includes an integration between Informatica's no-code tools and Databricks' generative AI capabilities; an SQL-based data transformation feature that enables users to process extract, load and transform (ELT) pipelines in Databricks Delta Lake and Google BigQuery; and a task wizard that helps guide users as they ingest and replicate data for AI projects.

"Aligning with Databricks is a solid step since they are moving quickly with their [generative AI capabilities]," Catanzano said. "As an intelligent data management platform, Informatica needs to be well integrated wherever their customers are or want to be as a management layer."

Petrie likewise noted the significance of adding ELT pipelines to Delta Lake and BigQuery.

"The ELT enhancements make a lot of sense," he said. "Many data teams now favor ELT … because they can perform sophisticated transformations on data after ingesting it into platforms such as Databricks and Snowflake."
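
For readers unfamiliar with the pattern Petrie describes, the following is a minimal sketch of ELT in Python, not Informatica's feature or API. It assumes an in-memory SQLite database as a stand-in for a platform such as Databricks or BigQuery, and the table names, sample records and `run_elt` function are hypothetical. Raw data is loaded untouched first, and the cleanup happens afterward in SQL inside the target database.

```python
import sqlite3

# Hypothetical raw records standing in for data extracted from a source system.
RAW_ORDERS = [
    ("1001", " alice ", "19.99"),
    ("1002", "BOB", "5.00"),
    ("1003", "alice", "not-a-number"),  # bad row, filtered out after loading
]


def run_elt(rows):
    """Load raw rows as-is, then transform them with SQL in the target database."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the warehouse

    # Load: land the data untouched, with every column stored as text.
    conn.execute(
        "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)

    # Transform: clean and type the data with SQL after it has been loaded.
    # The GLOB pattern is a crude check that amount looks like a decimal number.
    conn.execute(
        """
        CREATE TABLE clean_orders AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               LOWER(TRIM(customer))     AS customer,
               CAST(amount AS REAL)      AS amount
        FROM raw_orders
        WHERE amount GLOB '[0-9]*.[0-9]*'
        """
    )

    # Query the cleaned table: total spend per customer.
    result = conn.execute(
        "SELECT customer, SUM(amount) FROM clean_orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for customer, total in run_elt(RAW_ORDERS):
        print(customer, total)
```

Because the raw table is kept intact, the transformation can be revised and rerun without extracting the data again, which is one reason teams may prefer ELT over transforming data before it is loaded.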

Informatica's Fall 2024 Release also includes the following:

  • Turbo-charged Application Integration Runtime, a feature scheduled for general availability in November that aims to improve application performance with autoscaling, high throughput and low latency integration capabilities and includes a serverless option.
  • Prebuilt integration templates for integrating data stored in AWS, Microsoft Azure, Google and Oracle, among others.
  • Connectors to various AI development environments such as Amazon Bedrock and Google Vertex AI as well as business and messaging applications including Coupa, Salesforce Streaming Events and Azure Service Bus.
  • New master data management features designed to improve workflow integration.
  • Improved data governance capabilities including metadata access controls in Informatica's Cloud Data Governance and Data Catalog.
  • Expanded regional availability of Claire GPT.

Petrie noted that Informatica's update contains a wide range of new and improved features. Perhaps most significant is that they complement one another, with a tool such as Turbo-charged Application Integration Runtime targeting the speed and efficiency of integrations enabled by connectors and prebuilt integration templates.

"Informatica's enhancements for application integration make a lot of sense," Petrie said. "To differentiate themselves, AI adopters must optimize the user experience with custom applications based on governed, well-integrated data. Informatica is helping companies do this faster and more efficiently on popular data platforms with its new templates and connectors."

While enabling customers to prepare their data for training AI models and applications is the intent of Informatica's latest update, the impetus for adding capabilities aimed at getting data ready for AI came from a combination of customer feedback and the vendor's own research, according to Pathak.

"Customer requirements … are always major drivers, along with our own research and development," he said. "Today, many business and tech leaders are looking to accelerate their GenAI projects and strategic initiatives. We're helping them with these latest innovations." 

Next steps

Although Informatica's Fall 2024 Release focuses on enabling customers to prepare data for training AI models and applications, it does not address the actual development of AI tools other than providing integrations, Catanzano noted.

Last May, Informatica introduced a low-code/no-code environment for developing AI models and applications. Included are drag-and-drop capabilities, customizable templates, prebuilt techniques for generative AI development and support for a variety of LLMs and vector databases.

The vendor's latest platform update includes integrations and connectors with development environments from other vendors such as Databricks and Google but does not include new and improved features for its own development environment.

As a result, Catanzano suggested that Informatica focus some of its future product development and marketing on its own tools for model and application developers.

"I think [Informatica should do] more to get customers to see that their platform is where they should be looking to build GenAI solutions," he said. "They focus on getting your data ready, but I haven't seen much about where to go next."

Eric Avidon is a senior news writer for TechTarget Editorial and a journalist with more than 25 years of experience. He covers analytics and data management.


