AI data governance is a requirement, not a luxury

Data governance is the backbone of any data initiative. Properly managing the new data risks and regulations of AI technologies requires the addition of AI data governance.

Traditional data governance policies often fail to consider how today's AI systems use -- and generate -- data. As more enterprises integrate AI models and tools into their workflows, governance strategies must adapt to ensure compliance, mitigate risk and maintain data integrity.

Data governance, which initially arose to support big data and analytics, plays a key role in ensuring compliance with privacy and security requirements for customer applications. Many organizations' existing data governance processes and tools should already accomplish this task. But as AI, machine learning and generative AI tools that incorporate retrieval-augmented generation (RAG) find footholds in the enterprise, organizations must update their approach to data governance.

AI data governance is critical to the success of enterprise AI projects and essential for regulatory compliance. Many organizations aspire to develop and deploy AI applications that adhere to responsible AI principles; AI data governance is also necessary to achieve this goal. And AI data governance improves decision-making by enabling stakeholders to check training data quality and monitor model drift post-deployment.

How AI affects data governance programs

Ideally, data governance policies should apply across all types of systems, including the following:

  • Transaction systems.
  • Systems of record.
  • Analytics systems, including real-time analytics.
  • AI-driven technologies, including machine learning, deep learning and generative AI.

But in many enterprises, data governance and AI teams evolved separately, creating silos. The data governance team might not be well-versed in AI and machine learning project requirements, such as ensuring that model performance does not degrade over time and detecting unwanted biases.

AI governance must cover the data used in AI models and how that data is managed, deployed and monitored post-deployment. It includes unique considerations such as checking for biases in training data and adhering to responsible AI principles.

Ideally, AI governance should fall under the data governance umbrella and not be treated as a separate function. Where that is not possible due to teams being siloed, there should be open lines of communication between the teams responsible for AI projects and data governance. Because the data governance team usually has an end-to-end view of the organization's data assets, team members might need training on AI lifecycles to understand how AI affects data privacy and compliance.

Addressing AI considerations within existing data governance frameworks can be challenging. For example, consider RAG applications such as enterprise chatbots: How should a user's data access and entitlement rules affect the information that the chatbot retrieves to generate responses? What guardrails should be applied to the large language model (LLM)? How should new software components like vector databases fit into data governance?
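One way to answer the entitlement question is to filter retrieved documents against the user's access rights before they ever reach the LLM prompt. The sketch below illustrates the idea; the class and role names are hypothetical, not a real vector-store API.

```python
# Sketch: entitlement-aware retrieval for a RAG chatbot.
# Documents carry governance metadata (allowed roles) so that access
# rules from source systems carry over into generated responses.
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    allowed_roles: set = field(default_factory=set)  # governance metadata


def filter_by_entitlement(candidates, user_roles):
    """Drop retrieved documents the user is not entitled to see,
    BEFORE they are passed to the LLM as context."""
    return [d for d in candidates if d.allowed_roles & user_roles]


docs = [
    Document("Q3 revenue forecast", {"finance"}),
    Document("Public product FAQ", {"finance", "support"}),
]

# A support agent should only receive documents their role permits.
visible = filter_by_entitlement(docs, {"support"})
```

Filtering at retrieval time, rather than trusting the model to withhold information, keeps the enforcement point in governed infrastructure rather than in the LLM's output.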

Generative AI applications also require careful consideration of copyrights. Ensuring that the data used to train LLMs is legitimate requires documenting that information as part of data governance. Similarly, pre-deployment tests should verify that AI-generated outputs do not violate copyrights.

Data privacy laws also apply to the AI prompts users enter when using third-party, cloud-hosted LLMs. And newer security threats, such as prompt injection, can bypass data access and protection controls.
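Because prompts sent to third-party LLMs can contain personal data, one common mitigation is to scrub obvious PII before the prompt leaves the organization. The snippet below is a deliberately naive regex pass to illustrate the control point; production systems would use dedicated DLP tooling rather than hand-rolled patterns.

```python
# Sketch: redacting obvious PII from prompts before sending them to a
# third-party, cloud-hosted LLM. The patterns are illustrative only.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(prompt: str) -> str:
    """Replace email addresses and SSN-shaped strings with placeholders."""
    prompt = EMAIL.sub("[EMAIL]", prompt)
    return SSN.sub("[SSN]", prompt)


safe = redact("Contact jane.doe@example.com, SSN 123-45-6789")
```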

Tools of the trade

AI projects require new data infrastructure components. Consider a large organization where different teams want to build machine learning applications based on a common data infrastructure. An enterprise feature store can enforce data policies and access controls at the feature or column level. Organizations must also be able to support versioning of models, features and data sets used in AI.
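The policy-enforcement and versioning roles of a feature store can be sketched as follows. The `FeatureStore` class here is hypothetical, standing in for a real product's API, to show how feature-level access control and versioned reads fit together.

```python
# Sketch of feature-level access control plus versioning in an
# enterprise feature store. All names and data are illustrative.
class FeatureStore:
    def __init__(self):
        self._features = {}  # (name, version) -> values
        self._acl = {}       # feature name -> teams allowed to read it

    def register(self, name, version, values, allowed_teams):
        """Store a versioned feature and its access policy."""
        self._features[(name, version)] = values
        self._acl[name] = set(allowed_teams)

    def read(self, name, version, team):
        """Enforce the access policy at the feature (column) level."""
        if team not in self._acl.get(name, set()):
            raise PermissionError(f"{team} may not read feature {name}")
        return self._features[(name, version)]


store = FeatureStore()
store.register("customer_ltv", "v1", [120.0, 85.5], {"marketing"})
store.register("customer_ltv", "v2", [130.0, 90.1], {"marketing"})

values = store.read("customer_ltv", "v2", "marketing")
```

Keeping versions addressable by name lets teams reproduce exactly which feature values trained a given model, which is what auditability requires.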

Organizations might also need an AI governance tool -- an emerging and maturing product category. These tools serve as central repositories for AI metadata, often including AI registries that provide a unified view of all AI applications within an organization. Some AI governance tools offer collaboration and reporting features, AI application documentation capabilities, and policy integration with MLOps tools.

Generative AI applications require yet another set of infrastructure components. For example, RAG-based applications built on LLMs, such as semantic search and enterprise chatbots, need a vector database: a newer type of data store that organizes content as vectors. There are specialist vector database products, although some traditional database vendors also support vector storage.

Traditional data governance tools provide data quality management, metadata management and data cataloging capabilities. However, they don't yet fully support the functionality that AI and machine learning projects require, suggesting the need for a tool with AI-specific governance capabilities.

How to implement AI governance

Pragmatically, it might make sense to have two roadmaps for implementing AI data governance: one for the near term and one for the long term.

In the near term, organizations should implement the best structure and processes they can, using existing tools and human resources. Start by assessing the skills and resources currently available for data and AI governance, and use those resources to create a RACI chart defining roles and responsibilities. Organizations should design interconnected processes to ensure they meet new data governance requirements.
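A RACI chart is ultimately structured data, and keeping it as such makes it reviewable and versionable like any other policy artifact. The roles and tasks below are examples only, not a prescribed division of responsibility.

```python
# Illustrative RACI chart for AI data governance duties.
# R = Responsible, A = Accountable, C = Consulted, I = Informed.
RACI = {
    "Define data access policies": {
        "R": "Data governance lead", "A": "CDO",
        "C": "Legal", "I": "AI team",
    },
    "Monitor model drift": {
        "R": "ML engineers", "A": "AI team lead",
        "C": "Data governance lead", "I": "CDO",
    },
    "Audit training data for bias": {
        "R": "Data scientists", "A": "AI team lead",
        "C": "Legal", "I": "Data governance lead",
    },
}


def accountable_for(task):
    """Look up the single accountable owner for a governance task."""
    return RACI[task]["A"]
```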

As part of the long-term roadmap, organizations must decide whether they require a full-fledged AI governance product. Factors such as the extent of AI adoption, the number of AI use cases, and industry regulatory requirements affect tool choice.

Kashyap Kompella, founder of RPA2AI Research, is an AI industry analyst and advisor to leading companies across the U.S., Europe and the Asia-Pacific region. Kashyap is the co-author of three books, Practical Artificial Intelligence, Artificial Intelligence for Lawyers and AI Governance and Regulation.
