
What is a large action model (LAM)?

A large action model (LAM) is an artificial intelligence (AI) system that understands queries and responds by taking action.

An LAM improves on a large language model (LLM), one of the foundational elements of modern generative AI. An LLM such as OpenAI's GPT-4o uses natural language processing (NLP) as a core capability to power ChatGPT. However, while it generates content, it cannot perform actions. The LAM concept moves past this limitation, giving the model the ability to act.

LAMs are designed to execute actions based on user input. They understand and process multiple types of data inputs, as well as human intentions, to perform various operations. LAMs represent a shift from purely language-based models to more interactive and action-oriented systems. They aim to transform AI from a passive tool into an active collaborator capable of executing complex digital tasks.

The term LAM gained prominence with the debut of the Rabbit R1 device at the CES 2024 trade show. Rabbit, an AI company, described its product as using a "large action model" to identify and reproduce human actions on various technology interfaces. The R1 is a trainable AI assistant capable of executing user requests, such as making reservations and ordering services.

How does an LAM work?

As a complex form of AI, an LAM operates through multiple steps:

Foundation layer

LAMs often start by integrating a powerful existing LLM. The LLM can be fine-tuned with various data sets for the specific use case of the LAM. The base foundation layer enables the LAM to understand natural language inputs and infer user intent.
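The foundation layer can be pictured as a thin wrapper around an LLM call that turns free-form text into a structured intent. The sketch below assumes a hypothetical `llm_complete` function standing in for a real (possibly fine-tuned) model endpoint; here it is stubbed with a canned reply so the flow is visible end to end.

```python
import json

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to a foundation LLM endpoint (hypothetical).
    Stubbed with a canned JSON reply for illustration."""
    return json.dumps({"intent": "book_reservation",
                       "entities": {"restaurant": "Luigi's", "party_size": 2}})

def infer_intent(user_input: str) -> dict:
    """Ask the foundation model to map free-form text to a structured intent."""
    prompt = ("Extract the user's intent and entities as JSON.\n"
              f"User: {user_input}\nJSON:")
    return json.loads(llm_complete(prompt))

result = infer_intent("Get me a table for two at Luigi's tonight")
# result["intent"] == "book_reservation"
```

In a real LAM, the prompt and the JSON schema would be tuned for the target use case; only the pattern of "natural language in, structured intent out" is the point here.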

Multimodal input processing

LAMs can process multiple types of input, including text, images and potentially user interactions. NLP techniques analyze text inputs, extracting key information and intent.
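A minimal sketch of multimodal routing, assuming nothing beyond the Python standard library: real LAMs dispatch each modality to a dedicated NLP or vision model, while this toy version only inspects the input's type and a file magic number.

```python
from dataclasses import dataclass

@dataclass
class ProcessedInput:
    modality: str
    payload: str

def process_input(data) -> ProcessedInput:
    """Route raw input to a modality-specific processor (illustrative only)."""
    if isinstance(data, str):
        return ProcessedInput("text", data.strip().lower())
    if isinstance(data, bytes) and data[:8] == b"\x89PNG\r\n\x1a\n":  # PNG magic number
        return ProcessedInput("image", "png screenshot")
    return ProcessedInput("binary", "unknown")

assert process_input("  Book a FLIGHT ").payload == "book a flight"
```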

Goal inference

The LAM analyzes the user's request in context, considering factors such as past behavior and current application state. It uses this analysis to infer the user's true goal, which can extend beyond the literal interpretation of the user's words.
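The gap between literal words and true goal can be shown with a toy example. The rules below stand in for a learned model; the context dictionary plays the role of past behavior.

```python
def infer_goal(request: str, context: dict) -> str:
    """Infer the user's underlying goal from their words plus context.
    'The usual' literally names nothing, but past behavior fills the gap.
    Hard-coded rules here stand in for a learned model."""
    if "the usual" in request.lower():
        return context.get("frequent_order", "ask_for_clarification")
    return request

history = {"frequent_order": "order_large_coffee"}
assert infer_goal("Get me the usual", history) == "order_large_coffee"
assert infer_goal("Get me the usual", {}) == "ask_for_clarification"
```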

User interface interpretation

LAMs incorporate computer vision capabilities to interpret visual information from application interfaces. They recognize user interface (UI) elements such as buttons, menus and text fields, and they understand the elements' functions within the application.
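Once a vision model has converted a screenshot into structured elements, locating a control becomes a lookup over roles and labels. The element format below is an assumption for illustration, loosely modeled on accessibility trees.

```python
def find_element(ui_tree, role: str, name: str):
    """Locate a UI element by role and partial label, as a LAM might
    after a vision model has produced structured elements with roles,
    labels and screen coordinates (format assumed for illustration)."""
    for el in ui_tree:
        if el["role"] == role and name.lower() in el["name"].lower():
            return el
    return None

screen = [
    {"role": "textbox", "name": "Search", "bounds": (10, 5, 200, 30)},
    {"role": "button", "name": "Submit order", "bounds": (220, 5, 300, 30)},
]
assert find_element(screen, "button", "submit")["bounds"] == (220, 5, 300, 30)
```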

Task decomposition and action planning

Once the goal is understood, the LAM breaks it into smaller, actionable subtasks. It then formulates a plan, prioritizing actions based on efficiency, user preferences and learned heuristics. This process can involve a knowledge base containing information about real-world applications and common task structures.
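Decomposition against a knowledge base can be sketched as a lookup into a "playbook" of common task structures, with a fallback when the goal is unknown. The playbook contents are invented for illustration.

```python
def decompose(goal: str, playbook: dict) -> list:
    """Break a high-level goal into ordered subtasks using a knowledge
    base of common task structures. Unknown goals fall back to asking
    the user rather than guessing."""
    return playbook.get(goal, ["ask_user_how_to_proceed"])

playbook = {
    "book_reservation": [
        "open_booking_app", "search_restaurant",
        "select_time", "confirm_details", "submit",
    ],
}
plan = decompose("book_reservation", playbook)
assert plan[0] == "open_booking_app" and plan[-1] == "submit"
```

A production planner would also reorder and prune these steps based on efficiency and user preferences, as described above.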

Decision-making and reasoning

LAMs employ sophisticated algorithms that use a combination of neural networks and symbolic AI techniques for decision-making. Large action models use neuro-symbolic AI, combining pattern recognition with logical reasoning to determine the best action.
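The neuro-symbolic combination can be reduced to its essence: a neural component scores candidate actions (represented here by a precomputed dictionary), while symbolic rules veto anything that violates a hard constraint. The scores and rules below are illustrative.

```python
def choose_action(candidates, neural_scores, rules):
    """Neuro-symbolic selection sketch: symbolic rules filter out
    constraint-violating actions, then the highest neural score among
    the survivors wins."""
    allowed = [a for a in candidates if all(rule(a) for rule in rules)]
    return max(allowed, key=lambda a: neural_scores[a]) if allowed else None

candidates = ["delete_account", "submit_form", "cancel"]
scores = {"delete_account": 0.9, "submit_form": 0.7, "cancel": 0.1}
rules = [lambda a: not a.startswith("delete")]  # never take destructive actions
assert choose_action(candidates, scores, rules) == "submit_form"
```

Note that the rule overrides the neural score: `delete_account` scores highest but is excluded, which is exactly the safety role logical reasoning plays alongside pattern recognition.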

Action execution

LAMs interact with external systems, using tools such as web automation frameworks for web interfaces. LAMs simulate user actions like clicking, typing or navigating between pages. Some LAMs also make application programming interface (API) calls or interact with other software systems directly.
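Execution against a web interface typically goes through an automation framework such as Playwright or Selenium. Their real APIs differ; the stand-in class below only mimics the shape of such calls so the replay of a planned action sequence can be shown self-contained.

```python
class FakePage:
    """Stand-in for a browser page object from a web automation
    framework (real APIs differ). Records simulated user events."""
    def __init__(self):
        self.log = []
    def click(self, selector):
        self.log.append(("click", selector))
    def fill(self, selector, text):
        self.log.append(("fill", selector, text))

def execute(page, plan):
    """Replay a planned action sequence as simulated user events."""
    for step in plan:
        getattr(page, step[0])(*step[1:])

page = FakePage()
execute(page, [("fill", "#search", "Luigi's"), ("click", "#submit")])
assert page.log == [("fill", "#search", "Luigi's"), ("click", "#submit")]
```

With an API-capable target, the same `execute` loop could dispatch HTTP calls instead of simulated clicks; the plan-to-action mapping is what matters.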

Continuous learning and human oversight

LAMs incorporate various machine learning techniques, including deep learning and reinforcement learning, to improve with each interaction. Many LAMs also include mechanisms for human oversight, allowing a person to intervene in complex scenarios.
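A common oversight pattern is a confidence gate: low-confidence actions are escalated to a human instead of executed. The threshold value below is illustrative, not a standard.

```python
def act_with_oversight(action: str, confidence: float,
                       threshold: float = 0.8) -> str:
    """Gate low-confidence actions behind human review (illustrative
    threshold). High-confidence actions proceed automatically."""
    if confidence < threshold:
        return f"escalate_to_human:{action}"
    return f"executed:{action}"

assert act_with_oversight("submit_payment", 0.55) == "escalate_to_human:submit_payment"
assert act_with_oversight("open_menu", 0.97) == "executed:open_menu"
```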

What can LAMs do?

A large action model's broad range of capabilities includes the following:

  • Automation of tasks. LAMs act as advanced AI agents, capable of performing complex tasks autonomously. They navigate user interfaces, fill out forms and interact with software as a human user would.
  • Integration with external systems. LAMs interact with various external systems and applications, enabling them to navigate websites, execute API calls and manage data across platforms.
  • Complex decision-making. LAMs reason and make decisions. They assess various action paths, evaluate potential outcomes and choose the most appropriate course of action.
  • Real-time interaction and adaptation. LAMs adapt to environments and learn from user interactions, enabling them to improve over time.
  • Enhanced digital interaction. By understanding and executing tasks based on human instructions, LAMs improve human-computer interaction.
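The capabilities above compose into a perceive-plan-act loop, the basic control structure of an action-taking agent. In this sketch the three callables stand in for the model components described earlier; a real LAM would also re-perceive the environment between steps.

```python
def lam_loop(request, perceive, plan, act, max_steps=10):
    """Minimal perceive-plan-act loop. The callables stand in for
    the LAM's intent, planning and execution components."""
    state = perceive(request)
    for step in plan(state)[:max_steps]:  # cap steps as a safety limit
        state = act(step, state)
    return state

done = lam_loop(
    "say hi",
    perceive=lambda r: {"goal": r, "out": []},
    plan=lambda s: ["type_h", "type_i"],
    act=lambda step, s: {**s, "out": s["out"] + [step]},
)
assert done["out"] == ["type_h", "type_i"]
```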

What are some common LAM use cases and applications?

Large action models have a wide range of potential applications, including the following:

  • AI assistants. LAMs power more advanced AI assistants that not only understand requests, but also take action to fulfill them.
  • Customer service. Large action models handle customer inquiries, schedule appointments and process returns.
  • Marketing and sales. LAMs analyze customer data, create personalized marketing campaigns, and recommend products or services.
  • Chatbots. LAMs power advanced chatbots that not only engage in conversation, but also perform actions based on user requests.
  • Process automation. LAMs streamline complex workflows by automating sequences of actions across different applications.
  • User interface and user experience testing. Because they understand and interact with user interfaces, large action models potentially assist in UI testing or accessibility evaluations.

LAM vs. LLM

There is some similarity and overlap between the concepts of LAM and LLM. The following chart provides an overview of how LAMs compare with their forerunners.

Core functionality
  • LLMs: Primarily focused on understanding, generating and manipulating natural language text. LLMs are adept at tasks such as text generation, translation and summarization.
  • LAMs: Extend beyond text to include action execution. LAMs are designed to understand instructions and perform complex tasks by interacting with various systems and interfaces.

Data modalities
  • LLMs: Specialize in processing textual data, using vast amounts of text to learn language patterns and semantics.
  • LAMs: Handle multiple data types, including text, images and possibly other sensory data, enabling them to process and act on a broader spectrum of information.

Action and interaction
  • LLMs: Generate text-based outputs and insights, but do not inherently interact with external environments or systems.
  • LAMs: Execute actions based on their understanding, such as navigating software interfaces, making API calls or controlling robotic systems.

Feedback and learning
  • LLMs: Typically do not incorporate feedback from actions, instead focusing on language tasks without direct environmental interaction.
  • LAMs: Use feedback from their actions to refine their performance, affording adaptive learning and continuous improvement in task execution.

Applications
  • LLMs: Common applications include chatbots, virtual assistants, content creation and language translation.
  • LAMs: Used in applications that require task execution, such as robotic process automation, virtual assistants, customer service automation and complex workflow management.

What are some examples of LAMs?

There are already multiple instances of LAMs in use today, such as those listed below.

Rabbit R1

  Features:
  • Integrates vision tasks and a web portal for connecting services and applications.
  • Includes a teach mode for user-guided task demonstration.

  Applications:
  • Automates various tasks.
  • Interacts with web services.
  • Performs actions based on user instructions.

CogAgent

  Features:
  • An open source action model based on the vision language model CogVLM.
  • Generates plans and determines next actions.
  • Provides precise coordinates for graphical user interface (GUI) operations.

  Applications:
  • Visual question-answering.
  • Optical character recognition on GUI screenshots.
  • Interaction with visual data.

Gorilla

  Features:
  • An open source LAM that enables language models to use thousands of tools through API calls.
  • Accurately identifies and executes appropriate API calls based on natural language queries.

  Applications:
  • Invokes more than 1,600 APIs with high accuracy.
  • Suitable for tasks requiring integration with various software tools and services.
This was last updated in September 2024
