
Foundation models explained: Everything you need to know

Foundation models are large-scale, adaptable AI models reshaping enterprise AI. They hold promise, but face risks such as biases, security breaches and environmental impacts.

Foundation models will form the basis of generative AI's future in the enterprise.

Large language models (LLMs) fall into a category of AI called foundation models. Language models take language input and generate synthesized output, while foundation models as a class work with multiple data types. Many are multimodal, meaning they work in modes beyond language, such as images and audio.

This enables businesses to draw new connections across data types and expand the range of tasks AI can be used for. As a starting point, a company can use a foundation model to build custom generative AI applications tailored to its use case, with a tool such as LangChain.
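As a rough illustration, the following minimal sketch assumes the langchain-openai and langchain-core packages and an OpenAI API key; the model name, prompt and retail scenario are illustrative assumptions, not part of the article.

```python
# A minimal LangChain sketch: wrapping a hosted foundation model in a
# use-case-specific prompt. The packages, model name and prompt below are
# illustrative assumptions.
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # hosted foundation model

prompt = ChatPromptTemplate.from_template(
    "You are a support assistant for an appliance retailer. "
    "Answer the customer's question in two sentences.\n\nQuestion: {question}"
)

chain = prompt | llm  # compose the prompt and the model into a reusable chain
reply = chain.invoke({"question": "How do I register my warranty?"})
print(reply.content)
```

The foundation model itself stays generic; the prompt template is what tailors it to the company's use case.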

The GPT-n (generative pre-trained transformer) class of LLMs has become a prime example of this. The release of powerful LLMs such as OpenAI's GPT-4 spurred discussions of artificial general intelligence -- AI that could handle virtually any intellectual task a human can. Since their release, numerous applications powered by GPT models have been created.

GPT-4 and other foundation models are trained on a broad corpus of unlabeled data and can be adapted to many tasks.

What is a foundation model?

Foundation models are a new paradigm in AI system development. AI was previously trained on task-specific data to perform a narrow range of functions.

A foundation model is a large-scale machine learning model trained on a broad data set that can be adapted and fine-tuned for a wide variety of applications and downstream tasks. Foundation models are known for their generality and adaptability.

GPT-4, Dall-E 2 and BERT -- which stands for Bidirectional Encoder Representations from Transformers -- are all foundation models. The term was coined by authors at the Stanford Center for Research on Foundation Models and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) in a 2021 paper called "On the Opportunities and Risks of Foundation Models."

The authors of the paper stated: "While many of the iconic foundation models at the time of writing are language models, the term language model is simply too narrow for our purpose: as we describe, the scope of foundation models goes well beyond language."

The name foundation model underscores the fundamental incompleteness of the models, according to the paper. They are the foundation for specific spinoff models that are trained to accomplish a narrower, more specialized set of tasks. The authors of the Stanford HAI paper stated: "We also chose the term 'foundation' to connote the significance of architectural stability, safety, and security: poorly-constructed foundations are a recipe for disaster and well-executed foundations are a reliable bedrock for future applications."

How are foundation models used?

Foundation models serve as the base for more specific applications. A business can take a foundation model, train it on its own data, and fine-tune it to a specific task or a set of domain-specific tasks.

Several platforms, including Amazon SageMaker, IBM Watsonx, Google Cloud Vertex AI and Microsoft Azure AI, provide organizations with a service for building, training and deploying AI models.

For example, an organization could use one of these platforms to take a model from Hugging Face, fine-tune it on proprietary data and further adapt its behavior with prompt engineering. Hugging Face is an open source repository of many LLMs, like a GitHub for AI. It provides tools that enable users to build, train and deploy machine learning models.
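As a rough sketch of that workflow, the example below assumes the Hugging Face transformers and datasets libraries; the small DistilBERT checkpoint and the public IMDB review dataset stand in for a true foundation model and an organization's proprietary data.

```python
# A minimal fine-tuning sketch with Hugging Face transformers and datasets.
# The model checkpoint, dataset and label count are illustrative stand-ins.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "distilbert-base-uncased"  # pretrained model pulled from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# In practice this would be the organization's proprietary, domain-specific data.
dataset = load_dataset("imdb")
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, padding="max_length"),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model", num_train_epochs=1),
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
    eval_dataset=tokenized["test"].select(range(500)),
)
trainer.train()  # adapts the general-purpose model to the narrower task
trainer.save_model("finetuned-model")
```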

How do foundation models work?

Foundation models use predictive algorithms to "learn" a pattern and generate the next item in that pattern. The architectures behind foundation models vary and include transformers, variational autoencoders and generative adversarial networks.

A foundation model, applied to text, learns common patterns in that text and predicts the next word based on existing patterns in the text and any additional input a user might provide. A foundation model applied to video learns underlying patterns in a database of videos and generates new videos that adhere to those patterns. Foundation models are generative AI programs; they learn from existing corpuses of content to produce new content.

There are three broad steps underlying foundation models' functionality:

  1. Pretraining. The foundation model learns patterns from a large data set.
  2. Fine-tuning. The model is fine-tuned for specific tasks with smaller, domain-specific data sets.
  3. Implementation. The model is ready to receive new data as input and generate predictions about that data based on patterns learned in pretraining and fine-tuning.
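For a concrete sense of the third step, here is a minimal sketch assuming the Hugging Face transformers library, with the small GPT-2 checkpoint standing in for a much larger foundation model: the pretrained model receives a prompt and predicts the words most likely to continue it.

```python
# A minimal sketch of step 3: running inference with an already pretrained model.
# GPT-2 stands in for a much larger foundation model; pretraining and any
# fine-tuning (steps 1 and 2) are assumed to have happened beforehand.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Foundation models are important because"
outputs = generator(prompt, max_new_tokens=30, do_sample=True, num_return_sequences=2)

# Each continuation is produced one token at a time, with the model predicting
# the next word from patterns it learned during pretraining.
for out in outputs:
    print(out["generated_text"])
```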

Foundation models are expensive to train and run. The compute hardware underlying foundation models usually consists of multiple parallel GPUs.

Importance of foundation models

Foundation models are important because of their adaptability. Instead of training specialized models from the ground up for a narrow set of tasks, engineers can use pretrained foundation models to develop new applications for their specific use case.

Despite the energy and compute costs of developing, training and maintaining foundation models, their ability to scale predictably and serve as the basis for downstream AI applications makes them a worthwhile investment for organizations with the necessary resources.

Characteristics of foundation models

The main traits of foundation models include the following:

  • Scale. Three ingredients enable scale for foundation models:
    1. Hardware improvements. GPUs, the chips on which foundation models are trained and run, have significantly increased in throughput and memory.
    2. Transformer model architecture. Transformers are the machine learning architecture that powers many language models, such as BERT and GPT-4. They are not the only architecture used in foundation models, but they are a common choice (see the sketch after this list).
    3. Data availability. There is a lot of data for these models to train on and learn from. Foundation models need large quantities of unstructured data to train.
  • Traditional training. Foundation models are built with established machine learning training methods, such as self-supervised or unsupervised learning on unlabeled data, supervised learning and reinforcement learning from human feedback.
  • Transfer learning. Models take knowledge learned from one task and apply it to another: they learn on surrogate tasks during pretraining and are then fine-tuned for a specific task. Pretraining is the form of transfer learning used in the GPT-n series of language models.
  • Emergence. Model behavior is implicitly induced rather than explicitly constructed; the model produces results that cannot be traced to any single mechanism within it.
  • Homogenization. A wide range of applications can be powered by a single generic learning algorithm, with the same underlying method reused across many domains. The Stanford HAI paper stated that almost all state-of-the-art natural language processing (NLP) models are adapted from one of only a few foundation models.
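To make the transformer characteristic concrete, the following is a toy sketch, assuming PyTorch is installed, of a small transformer encoder assembled from the library's standard modules; real foundation models stack many such layers and hold billions of parameters.

```python
# A toy transformer encoder built from PyTorch's standard modules. The vocabulary
# size, dimensions and layer count are purely illustrative.
import torch
import torch.nn as nn

vocab_size, d_model, n_heads, n_layers, seq_len = 10_000, 256, 8, 4, 128

embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
lm_head = nn.Linear(d_model, vocab_size)  # scores every vocabulary word per position

tokens = torch.randint(0, vocab_size, (1, seq_len))  # a batch of token IDs
hidden = encoder(embedding(tokens))                  # self-attention over the sequence
logits = lm_head(hidden)                             # next-token scores
print(logits.shape)                                  # torch.Size([1, 128, 10000])
```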

Examples of foundation model applications

Foundation models are fine-tuned to create apps. Below are a few examples of foundation models and the applications they underlie.

  • GPT-n series. GPT-3 and GPT-4 have become the basis for many applications in the short time they've been around, with ChatGPT being the most notable. A paper from researchers at OpenAI, OpenResearch and the University of Pennsylvania posited that GPTs -- the AI model -- exhibit qualities of general-purpose technologies. General-purpose technologies, such as the steam engine, printing press and GPTs, are characterized by widespread proliferation, continuous improvement and the generation of complementary innovations. These complementary technologies can work with, support or build on top of general-purpose technologies like GPTs. The paper's findings showed that with access to an LLM, about 15% of all worker tasks in the U.S. could be completed significantly faster at the same level of quality.
  • Florence. Another example of a foundation model is Microsoft's Florence, which is used to provide production-ready computer vision services in Azure AI Vision. The application uses the model to analyze images, read text and detect faces with prebuilt image tagging.
  • Swedish LLM. Sweden is attempting to build a foundational LLM for all major languages in the Nordic region: Danish, Swedish, Icelandic, Norwegian and Faroese. It would be used primarily by the public sector. The Swedish consortium running the project has gained access to the supercomputer Berzelius, along with hardware and software help from Nvidia. The model is still in development, but early versions are available on Hugging Face.
  • Claude. Anthropic's Claude series of foundation models -- which includes Haiku, Sonnet and Opus -- displays proficiency in coding and can be fine-tuned to a wide range of tasks. Anthropic developed Claude using constitutional AI, a training approach that aligns the model with a written set of principles and makes safety and reliability top priorities in its development.

Opportunities and challenges of foundation models

Many foundation models are multimodal, meaning they can work with multiple types of data, including language, audio and images.

Because of their general adaptability, foundation models could provide numerous opportunities and use cases in a variety of different industries, including the following:

  • Healthcare. In this industry, foundation models show promise for generative tasks, such as drug discovery. An IBM foundation model -- Controlled Generation of Molecules, better known as CogMol -- was able to generate a set of new COVID-19 antivirals using a common architecture called a variational autoencoder. IBM's MoLFormer-XL is another foundation model currently being used by Moderna to design messenger RNA medicines.
  • Law. Legal work involves many generative tasks that foundation models could help with. However, current models lack the reasoning ability to reliably generate truthful documents. If they could be developed to show provenance and guarantee factuality, they would be valuable in the legal field.
  • Education. Education is a complex domain that requires nuanced human interaction to understand students' goals and learning styles. Education generates many individual data streams that, even taken together, are too limited to train foundation models. Still, foundation models could be broadly applicable to generative tasks, such as problem generation.

Despite their broad potential, foundation models pose many challenges, including the following:

  • Bias. Because foundation models stem from only a few core technologies, inherent social or moral biases in those few models might spread through every AI application built on them.
  • System limitations. Computer systems are a key bottleneck for scaling model size and data quantity. Training foundation models might require a prohibitively large amount of memory. The training is expensive and computationally intensive.
  • Data availability. Foundation models need access to large amounts of training data to function. If that data is cut off or restricted, they don't have the fuel to function.
  • Security. Foundation models represent a single point of failure, which makes them a viable target for cyberattackers.
  • Environmental impact. It takes a large environmental toll to train and run large foundation models like GPT-4.
  • Emergence. The outcomes of foundation models can be difficult to trace back to a particular step in the creation process.

Other important AI research papers

"On the Opportunities and Risks of Foundation Models" is just one of the influential research papers about foundation models. AI research is being published at a significant clip. Here are some other foundational AI research papers to know about:

  • "Attention Is All You Need." This paper introduced the transformer architecture, which became a new standard in AI systems using NLP.
  • "BERT: Pretraining of Deep Bidirectional Transformers for Language Understanding." This paper introduced BERT, which became a widely used language model for pretraining.
  • "Language Models Are Few-Shot Learners." This paper introduced GPT-3, which laid the groundwork for ChatGPT. GPT-3 could perform a wide range of NLP tasks with little to no task-specific training.
  • "Dall-E: Creating Images From Text." This paper was the basis of Dall-E, an AI system that generates images from natural language input.

Ben Lutkevich is the site editor for Software Quality. Previously, he wrote definitions and features for WhatIs.com.
