
Foundation models explained: Everything you need to know

Foundation models are large-scale, adaptable AI models reshaping enterprise AI. They hold promise, but face risks such as biases, security breaches and environmental impacts.

Foundation models will form the basis of generative AI's future in the enterprise.

Large language models (LLMs) fall into a category of AI called foundation models. Language models take language input and generate synthesized output, while foundation models as a class work with multiple data types. Many are multimodal, meaning they work in modes beyond language, such as images and audio.

This enables businesses to draw new connections across data types and expand the range of tasks AI can be used for. As a starting point, a company can use a foundation model to build custom generative AI applications tailored to its use case, with a tool such as LangChain.
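As a rough illustration, the following minimal sketch assumes the langchain-openai and langchain-core packages and an OpenAI API key; the model name, prompt and retail scenario are illustrative assumptions, not part of the article.

```python
# A minimal LangChain sketch: wrapping a hosted foundation model in a
# use-case-specific prompt. The packages, model name and prompt below are
# illustrative assumptions.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # hosted foundation model

prompt = ChatPromptTemplate.from_template(
    "You are a support assistant for an appliance retailer. "
    "Answer the customer's question in two sentences.\n\nQuestion: {question}"
)

chain = prompt | llm  # compose the prompt and the model into a reusable chain
reply = chain.invoke({"question": "How do I register my warranty?"})
print(reply.content)
```

The foundation model itself stays generic; the prompt template is what tailors it to the company's use case.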

The GPT-n (generative pre-trained transformer) class of LLMs has become a prime example of this. The release of powerful LLMs such as OpenAI's GPT-4 spurred discussions of artificial general intelligence -- AI that could handle virtually any intellectual task a human can. Since their release, numerous applications powered by GPT models have been created.

GPT-4 and other foundation models are trained on a broad corpus of unlabeled data and can be adapted to many tasks.

What is a foundation model?

Foundation models are a new paradigm in AI system development. AI was previously trained on task-specific data to perform a narrow range of functions.

A foundation model is a large-scale machine learning model trained on a broad data set that can be adapted and fine-tuned for a wide variety of applications and downstream tasks. Foundation models are known for their generality and adaptability.

GPT-4, Dall-E 2 and BERT -- which stands for Bidirectional Encoder Representations from Transformers -- are all foundation models. The term was coined by authors at the Stanford Center for Research on Foundation Models and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) in a 2021 paper called "On the Opportunities and Risks of Foundation Models."

The authors of the paper stated: "While many of the iconic foundation models at the time of writing are language models, the term language model is simply too narrow for our purpose: as we describe, the scope of foundation models goes well beyond language."

The name foundation model underscores the fundamental incompleteness of the models, according to the paper. They are the foundation for specific spinoff models that are trained to accomplish a narrower, more specialized set of tasks. The authors of the Stanford HAI paper stated: "We also chose the term 'foundation' to connote the significance of architectural stability, safety, and security: poorly-constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications."

How are foundation models used?

Foundation models serve as the base for more specific applications. A business can take a foundation model, train it on its own data, and fine-tune it to a specific task or a set of domain-specific tasks.

Several platforms, including Amazon SageMaker, IBM Watsonx, Google Cloud Vertex AI and Microsoft Azure AI, provide organizations with a service for building, training and deploying AI models.

For example, an organization could use one of these platforms to take a model from Hugging Face, fine-tune it on proprietary data and further adapt its behavior with prompt engineering. Hugging Face is an open source repository of many LLMs, like a GitHub for AI. It provides tools that enable users to build, train and deploy machine learning models.
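As a rough sketch of that workflow, the example below assumes the Hugging Face transformers and datasets libraries; the small DistilBERT checkpoint and the public IMDB review dataset stand in for a true foundation model and an organization's proprietary data.

```python
# A minimal fine-tuning sketch with Hugging Face transformers and datasets.
# The model checkpoint, dataset and label count are illustrative stand-ins.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "distilbert-base-uncased"  # pretrained model pulled from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# In practice this would be the organization's proprietary, domain-specific data.
dataset = load_dataset("imdb")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()  # adapts the general-purpose model to the narrower task
trainer.save_model("finetuned-model")
```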

How do foundation models work?

Foundation models use predictive algorithms to "learn" a pattern and generate the next item in that pattern. The architectures behind foundation models vary and include transformers, variational autoencoders and generative adversarial networks.

A foundation model, applied to text, learns common patterns in that text and predicts the next word based on existing patterns in the text and any additional input a user might provide. A foundation model applied to video learns underlying patterns in a database of videos and generates new videos that adhere to those patterns. Foundation models are generative AI programs; they learn from existing corpuses of content to produce new content.

There are three broad steps underlying foundation models' functionality:

  1. Pretraining. The foundation model learns patterns from a large data set.
  2. Fine-tuning. The model is fine-tuned for specific tasks with smaller, domain-specific data sets.
  3. Implementation. The model is ready to receive new data as input and generate predictions about that data based on patterns learned in pretraining and fine-tuning.
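For a concrete sense of the third step, here is a minimal sketch assuming the Hugging Face transformers library, with the small GPT-2 checkpoint standing in for a much larger foundation model: the pretrained model receives a prompt and predicts the words most likely to continue it.

```python
# A minimal sketch of step 3: running inference with an already pretrained model.
# GPT-2 stands in for a much larger foundation model; pretraining and any
# fine-tuning (steps 1 and 2) are assumed to have happened beforehand.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Foundation models are important because"
outputs = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=2)

# Each continuation is produced one token at a time, with the model predicting
# the next word from patterns it learned during pretraining.
for out in outputs:
    print(out["generated_text"])
```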

Foundation models are expensive to train and run. The compute hardware underlying foundation models usually consists of multiple parallel GPUs.

Importance of foundation models

Foundation models are important because of their adaptability. Instead of training specialized models from the ground up for a narrow set of tasks, engineers can use pretrained foundation models to develop new applications for their specific use case.

Despite the energy and compute costs of developing, training and maintaining foundation models, their ability to scale predictably and serve as the basis for downstream AI applications makes them a worthwhile investment for organizations with the necessary resources.

Characteristics of foundation models

The main traits of foundation models include the following:

  • Scale. Three ingredients enable scale for foundation models:
    1. Hardware improvements. GPUs, the chips on which foundation models are trained and run, have significantly increased in throughput and memory.
    2. Transformer model architecture. Transformers are the machine learning architecture that powers many language models, such as BERT and GPT-4. They are not the only architecture used in foundation models, but they are a common choice (see the sketch after this list).
    3. Data availability. There is a lot of data for these models to train on and learn from. Foundation models need large quantities of unstructured data to train.
  • Traditional training. Foundation models are built with established machine learning training methods, such as self-supervised or unsupervised learning on unlabeled data, supervised learning and reinforcement learning from human feedback.
  • Transfer learning. Models take knowledge learned from one task and apply it to another: they learn on surrogate tasks during pretraining and are then fine-tuned for a specific task. Pretraining is the form of transfer learning used in the GPT-n series of language models.
  • Emergence. Model behavior is implicitly induced rather than explicitly constructed; the model produces results that cannot be traced to any single mechanism within it.
  • Homogenization. A wide range of applications can be powered by a single generic learning algorithm, with the same underlying method reused across many domains. The Stanford HAI paper stated that almost all state-of-the-art natural language processing (NLP) models are adapted from one of only a few foundation models.
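To make the transformer characteristic concrete, the following is a toy sketch, assuming PyTorch is installed, of a small transformer encoder assembled from the library's standard modules; real foundation models stack many such layers and hold billions of parameters.

```python
# A toy transformer encoder built from PyTorch's standard modules. The vocabulary
# size, dimensions and layer count are purely illustrative.
import torch
import torch.nn as nn

vocab_size, d_model, n_heads, n_layers, seq_len = 10_000, 256, 8, 4, 128

embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
lm_head = nn.Linear(d_model, vocab_size)  # scores every vocabulary word per position

tokens = torch.randint(0, vocab_size, (1, seq_len))  # a batch of token IDs
hidden = encoder(embedding(tokens))                  # self-attention over the sequence
logits = lm_head(hidden)                             # next-token scores
print(logits.shape)                                  # torch.Size([1, 128, 10000])
```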

Examples of foundation model applications

Foundation models are fine-tuned to create apps. Below are a few examples of foundation models and the applications they underlie.

  • GPT-n series. GPT-3 and GPT-4 have become the basis for many applications in the short time they've been around, with ChatGPT being the most notable. A paper from researchers at OpenAI, OpenResearch and the University of Pennsylvania posited that GPTs -- the AI model -- exhibit qualities of general-purpose technologies. General-purpose technologies, such as the steam engine, printing press and GPTs, are characterized by widespread proliferation, continuous improvement and the generation of complementary innovations. These complementary technologies can work with, support or build on top of general-purpose technologies like GPTs. The paper's findings showed that with access to an LLM, about 15% of all worker tasks in the U.S. could be completed significantly faster at the same level of quality.
  • Florence. Another example of a foundation model is Microsoft's Florence, which is used to provide production-ready computer vision services in Azure AI Vision. The application uses the model to analyze images, read text and detect faces with prebuilt image tagging.
  • Swedish LLM. Sweden is attempting to build a foundational LLM for all major languages in the Nordic region: Danish, Swedish, Icelandic, Norwegian and Faroese. It would be used primarily by the public sector. The Swedish consortium running the project has gained access to the supercomputer Berzelius, along with hardware and software help from Nvidia. The model is still in development, but early versions are available on Hugging Face.
  • Claude. Anthropic's Claude series of foundation models -- which includes Haiku, Sonnet and Opus -- displays proficiency in coding and can be fine-tuned to a wide range of tasks. Anthropic developed Claude using constitutional AI, a training approach that aligns the model with a written set of principles and makes safety and reliability top priorities in its development.

Opportunities and challenges of foundation models

Many foundation models are multimodal, meaning they can work with multiple types of data, including language, audio and images.

Because of their general adaptability, foundation models could provide numerous opportunities and use cases in a variety of different industries, including the following:

  • Healthcare. In this industry, foundation models show promise for generative tasks, such as drug discovery. An IBM foundation model -- Controlled Generation of Molecules, better known as CogMol -- was able to generate a set of new COVID-19 antivirals using a common architecture called a variational autoencoder. IBM's MoLFormer-XL is another foundation model currently being used by Moderna to design messenger RNA medicines.
  • Law. Legal work involves many generative tasks that foundation models could help with. However, current models lack the reasoning ability to reliably generate truthful documents. If they could be developed to show provenance and guarantee factuality, they would be valuable in the legal field.
  • Education. Education is a complex domain that requires nuanced human interaction to understand students' goals and learning styles. Education generates many individual data streams that, even taken together, are too limited to train foundation models. Still, foundation models could be broadly applicable to generative tasks, such as problem generation.

Despite their broad potential, foundation models pose many challenges, including the following:

  • Bias. Because foundation models stem from only a few core technologies, inherent social or moral biases in those few models might spread through every AI application built on them.
  • System limitations. Computer systems are a key bottleneck for scaling model size and data quantity. Training foundation models might require a prohibitively large amount of memory. The training is expensive and computationally intensive.
  • Data availability. Foundation models need access to large amounts of training data to function. If that data is cut off or restricted, they don't have the fuel to function.
  • Security. Foundation models represent a single point of failure, which makes them a viable target for cyberattackers.
  • Environmental impact. It takes a large environmental toll to train and run large foundation models like GPT-4.
  • Emergence. The outcomes of foundation models can be difficult to trace back to a particular step in the creation process.

Other important AI research papers

"On the Opportunities and Risks of Foundation Models" is just one of the influential research papers about foundation models. AI research is being published at a significant clip. Here are some other foundational AI research papers to know about:

  • "Attention Is All You Need." This paper introduced the transformer architecture, which became a new standard in AI systems using NLP.
  • "BERT: Pretraining of Deep Bidirectional Transformers for Language Understanding." This paper introduced BERT, which became a widely used language model for pretraining.
  • "Language Models Are Few-Shot Learners." This paper introduced GPT-3, which laid the groundwork for ChatGPT. GPT-3 could perform a wide range of NLP tasks with little to no task-specific training.
  • "Dall-E: Creating Images From Text." This paper was the basis of Dall-E, an AI system that generates images from natural language input.

Ben Lutkevich is the site editor for Software Quality. Previously, he wrote definitions and features for WhatIs.com.
