Generative AI shines spotlight on data governance and trust

Generative AI creates new opportunities for how organizations use data. Strong data governance is necessary to build trust in the data AI models use.

The advent of generative AI ushered in a new era of technical advancement, promising to transform industries and how we consume data. Data governance is crucial to securing the quality and integrity of the data fueling AI systems.

According to a June 2024 study from TechTarget's Enterprise Strategy Group, "Data Governance in the Age of AI," 70% of organizations said they prioritize data quality and integrity in their AI-driven initiatives. The heightened focus underscores the undeniable link between strong data governance and the success of AI projects.

Yet only 46% of organizations expressed moderate confidence in the accuracy of data presented to end users for decision-making. The figure suggests that organizations understand the importance of data quality but struggle to translate that awareness into concrete actions that ensure data trust. It's a barrier organizations need to overcome as they build internal and customer-facing generative AI tools. The technology that supports GenAI, including databases, governance tools, machine learning and analytics, can help build more trust in the enterprise data used in generative AI use cases.

Why has the role of data governance become so critical in the age of AI? The answer lies in the nature of AI systems.

Organizations building GenAI-powered applications should start by defining a use case, such as a GenAI-powered knowledge base where employees and customers can get company and product answers quickly. That use case rests on a data foundation -- the enterprise data -- which might be product catalogs, training documents and support data. This data is converted into embeddings and loaded into a vector-enabled database. Using techniques such as retrieval-augmented generation, the database is then paired with a large language model or foundation model, such as OpenAI's GPT or Google's Gemini, and a front-end chatbot. Users can ask questions and receive natural language answers based on the specific enterprise data foundation.

This example demonstrates the critical importance of quality, accuracy, compliance and control in the enterprise data used for the generative AI application. The quality and representativeness of that data directly affect the accuracy, fairness and reliability of the generative AI tool.
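The retrieval step in a knowledge base like the one described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the `embed` function is a toy word-count stand-in for a real embedding model, the in-memory `INDEX` stands in for a vector-enabled database, and the document IDs and text are invented sample data. No call to a language model is made; the sketch stops at building the prompt that would be sent to one.

```python
import math
from collections import Counter

# Invented sample data standing in for product catalogs,
# training documents and support data.
DOCUMENTS = {
    "catalog-001": "The Model X router supports Wi-Fi 6 and has four ports",
    "support-042": "To reset the router hold the power button for ten seconds",
    "training-007": "New employees complete security training in the first week",
}


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Assumption: a plain dict plays the role of the vector-enabled database.
INDEX = {doc_id: embed(text) for doc_id, text in DOCUMENTS.items()}


def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the IDs of the k documents most similar to the question."""
    query = embed(question)
    ranked = sorted(INDEX, key=lambda d: cosine(query, INDEX[d]), reverse=True)
    return ranked[:k]


def build_prompt(question: str) -> str:
    """Combine retrieved enterprise data with the user's question."""
    context = "\n".join(DOCUMENTS[d] for d in retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("how do I reset the router"))
```

Whatever text sits in the index is what the model is asked to answer from, so an outdated or incorrect document flows straight into the response. That is why governance of the source data matters before anything is indexed.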

Consider the implications of biased or inaccurate data: An AI system trained on biased data is likely to perpetuate those biases, leading to discriminatory outcomes. Similarly, an AI tool drawing on outdated information would give inaccurate answers about pricing, features and functionality. And if confidential information is not scrubbed from the data, it could be released. As organizations collect and process increasing amounts of data from diverse sources, the potential for errors, inconsistencies and privacy breaches grows with it. Without robust data governance in place, organizations risk exposing themselves to significant financial, reputational and legal liabilities.

To mitigate these risks and unlock the full potential of AI, organizations must prioritize data governance as a core element of their AI strategies. They should implement comprehensive frameworks that address data quality, security, privacy and accessibility. Key components of a strong data governance program include the following:

  • Data quality management. Ensures data accuracy, completeness, consistency and timeliness through data cleansing, validation and profiling.
  • Data security. Protects sensitive data from unauthorized access, use, disclosure, disruption, modification or destruction.
  • Data privacy. Ensures compliance with privacy regulations and safeguards individual rights through data minimization, anonymization and encryption.
  • Data accessibility. Makes data readily available to authorized users while maintaining appropriate controls to prevent misuse.
  • Data governance framework. Establishes clear roles, responsibilities and processes for data management, including data ownership, stewardship and accountability.
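Several of the components above can be enforced as automated checks before enterprise data reaches a generative AI tool. The following Python sketch is one hypothetical way to do that, not a prescribed method: the `Record` fields, the one-year review window and the email-only redaction rule are all assumptions chosen for illustration. A real program would cover far more kinds of sensitive data and would take its rules from the organization's own policies.

```python
import re
from dataclasses import dataclass
from datetime import date

# Assumption: email addresses are the only sensitive pattern handled here.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class Record:
    """Hypothetical shape of one document in the enterprise data foundation."""
    doc_id: str
    text: str
    owner: str           # accountable data owner (governance framework)
    last_reviewed: date  # used for the timeliness check (data quality)


def redact(text: str) -> str:
    """Data privacy: mask email addresses before the text is indexed."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


def quality_issues(record: Record, today: date, max_age_days: int = 365) -> list[str]:
    """Data quality: report completeness, ownership and timeliness problems."""
    issues = []
    if not record.text.strip():
        issues.append("empty text")
    if not record.owner.strip():
        issues.append("no data owner")
    if (today - record.last_reviewed).days > max_age_days:
        issues.append("stale")
    return issues


def prepare_for_indexing(records: list[Record], today: date):
    """Split records into redacted, accepted ones and rejected ones with reasons."""
    accepted: list[Record] = []
    rejected: dict[str, list[str]] = {}
    for record in records:
        issues = quality_issues(record, today)
        if issues:
            rejected[record.doc_id] = issues
        else:
            accepted.append(
                Record(record.doc_id, redact(record.text),
                       record.owner, record.last_reviewed)
            )
    return accepted, rejected
```

Rejected records come back with the reasons attached, so a data steward can fix the source rather than have the problem silently dropped.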

By investing in data governance, organizations can build trust in the generative AI tools they create, enhance decision-making and mitigate risks. Generative AI has the potential to transform how we consume information, but every organization has the responsibility to build trusted products, which starts with strong data governance.

Stephen Catanzano is a senior analyst at TechTarget's Enterprise Strategy Group, where he covers data management and analytics.

Enterprise Strategy Group is a division of TechTarget. Its analysts have business relationships with technology vendors.