Definition

What is an autoregressive model?

An autoregressive model is a category of machine learning models in which algorithms predict future data based on a series of their own past data. The core ideas date back to the 1920s, when the statistician Udny Yule used an autoregressive model to study sunspot activity. The term autoregressive (AR) builds on the notion of regression in machine learning, which captures the relationship between independent and dependent variables to predict an outcome.

Autoregressive techniques have become important tools for various types of predictive analytics over the years. More recently, researchers have found that they can be combined with other types of algorithms, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) and transformers, for various generative AI (GenAI) applications and use cases.

How do autoregressive models work?

Autoregressive models automate the process of capturing relationships across one or more variables in a sequence to identify patterns that can be used to predict future values. The variables can be simple numbers, called scalars, representing a single dimension, or they can be a combination of numbers representing multiple dimensions together, called vectors.

For example, an XY chart represents each point as a 2D vector; an object's location in physical space could be represented as a 3D vector. More complex vectors, sometimes consisting of hundreds of dimensions, are often used in generative AI to represent the semantic similarity between words or tokens.

AR models employ a variation of linear regression. Standard linear regression predicts the value of a dependent variable based on a combination of independent variables. In the autoregressive case, however, the model predicts a variable's next value using lagged values of that same variable as the predictors. For example, a model trained on English might learn that the word "the" is often followed by nouns and hence predict a word like "fox" after the sequence "the quick brown."
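The lagged-value idea can be shown with a minimal sketch of a first-order AR(1) prediction rule. The coefficient (phi) and intercept (c) below are illustrative values, not parameters fitted from real data:

```python
# Minimal sketch of a first-order autoregressive, AR(1), prediction rule.
# phi and c are illustrative values, not fitted from real data.
def ar1_predict(prev_value, phi=0.5, c=2.0):
    """Predict the next value from the single lagged value."""
    return c + phi * prev_value

# Starting from 10.0, roll the prediction forward three steps,
# feeding each output back in as the next lagged value.
x = 10.0
for _ in range(3):
    x = ar1_predict(x)
print(x)  # 4.75
```

Feeding each prediction back in as the next lagged value is exactly how AR models generate multi-step forecasts or, in GenAI, token-by-token output.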

The following are some key concepts when working with autoregressive models.

Lagged values. These values represent previous data points or observations in a sequence, such as yesterday's temperature, sales numbers or earlier words in a sentence being generated. For example, in GenAI text generation, an autoregressive model provides a framework for creating the next word in a sequence by using lagged values -- previous words, sentences and paragraphs -- to build on the context of the generated content.

Coefficients. These numerical values capture the influence of previous values in making a fine-tuned prediction during the AR training process.

Order of the model. This refers to the number of previous readings -- i.e., lagged values -- used in the model, represented as AR(1) for one previous value, AR(2) for two previous values and so on.

Autocorrelation. This function measures how strongly a value in a series correlates with its past values at a given lag. It is used to determine whether an AR model is suitable for the data and can also help identify an appropriate model order.
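These concepts can be tied together with a short sketch that estimates the autocorrelation of a series at several lags, the usual first step in deciding whether an AR model fits and which order to try. The data here is a synthetic series with strong lag-1 dependence, purely for illustration:

```python
import numpy as np

# Sketch: estimate autocorrelation at several lags to judge whether
# an AR model is suitable and which order to try.
def autocorrelation(series, lag):
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic series: each value leans heavily on the previous one.
rng = np.random.default_rng(0)
y = [0.0]
for _ in range(500):
    y.append(0.9 * y[-1] + rng.normal())

for lag in (1, 2, 3):
    print(lag, round(autocorrelation(y, lag), 2))
```

A high value at lag 1 that decays gradually at higher lags suggests the series is a good candidate for a low-order AR model.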

Examples of autoregressive models

Autoregressive models can take advantage of various ways of correlating previous data points in a sequence or organizing the representations of these data points for different use cases. Here are some examples.

Simple autoregressive model. The fastest and simplest approach is to predict the next value, vector or token in a sequence based only on the single previous value.

Multi-order autoregressive models. These models predict the next value using multiple previous values, called orders. The approach can be more accurate but requires more processing time. For example, a model for generating text could improve by considering the previous words in the sentence, or all the words in the last paragraph as well.
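A multi-order model can be fit with ordinary least squares, one common estimation approach for AR(p) models. The sketch below builds a design matrix in which each row holds the p lagged values preceding one target observation; the training series is noise-free and built from known coefficients, so the fit should recover them:

```python
import numpy as np

# Sketch: fit a multi-order AR(p) model with ordinary least squares.
def fit_ar(series, p):
    y = np.asarray(series, dtype=float)
    # Each row holds the p lagged values (lag 1 first) before one target.
    X = np.array([y[t - p : t][::-1] for t in range(p, len(y))])
    coeffs, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coeffs

# Noise-free series with known coefficients: y[t] = 0.5*y[t-1] + 0.3*y[t-2]
y = [1.0, 2.0]
for _ in range(20):
    y.append(0.5 * y[-1] + 0.3 * y[-2])

print(np.round(fit_ar(y, 2), 3))  # recovers [0.5, 0.3]
```

In practice, statistical packages wrap this estimation step, along with order selection and diagnostics, behind a single fitting call.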

Autoregressive integrated moving average. ARIMA models combine information about the dependence on past values (the AR component) with the relationship to past prediction errors (the MA component). The integrated component differences the data to remove trends, transforming a non-stationary series into a form more suitable for autoregressive techniques.
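The "integrated" step is simple to illustrate: first-order differencing converts a trending, non-stationary series into its step-to-step changes, which are stable enough for AR and MA modeling, and a cumulative sum inverts the transform afterward:

```python
import numpy as np

# Sketch of the "integrated" step in ARIMA: differencing removes a trend.
trend = 5.0 + 2.0 * np.arange(10)        # steadily rising, non-stationary
diffs = np.diff(trend)                   # constant changes: stationary
# Cumulative summing from the first value inverts the differencing.
restored = np.concatenate(([trend[0]], trend[0] + np.cumsum(diffs)))
print(diffs[:3], bool(np.array_equal(restored, trend)))
```
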

Vector autoregression. VAR models help make predictions based on multiple sequences or data series. They were pioneered for use cases in economic analysis to help identify trends based on different economic and financial time-series data. They can also be used in multimodal AI applications to generate content informed by the various modalities of data, such as text, speech and video.

Multiscale embedding fusion. These models make predictions by representing content at different scales, such as words, sentences and paragraphs, each captured as its own vector. In generative AI, this lets the algorithm draw on more of the previous context while using fewer lagged values. For example, the model could predict the next word based on the last three words in the sentence, earlier sentences in the paragraph and previous paragraphs, rather than on dozens of individual previous words, which would require a much higher model order.

Benefits and challenges of autoregressive models

Autoregressive models are a good option for predictive analytics when the future is highly correlated with the past for certain data streams. They are also a good fit for GenAI use cases when a large language model has done a good job of matching its vector embedding scheme to the question at hand. However, AR models also present many challenges, which can sometimes be mitigated with complementary techniques.

Benefits

Simplicity and interpretability. Autoregressive models are relatively straightforward to understand and interpret.

Effective for stationary time series data. AR models are well suited to stationary time series data, which means that statistical properties such as the mean and variance don't change much over time.

Good forecasting performance. These types of models can provide accurate forecasts when past values strongly correlate with the future.

Wide availability. Most statistical and machine learning packages make it easy to implement basic and more advanced AR models for various use cases.

Challenges

Stationary assumptions. AR models are not directly suited for use cases when the statistical properties of a data stream are constantly changing. Techniques such as ARIMA can bridge this gap for some, but not all, data.

Long-term dependencies. The time and processing power required to consider each previous data point or token can increase significantly. Techniques such as multiscale embedding fusion can help address this.

Sensitivity to model order. Variations in the model order -- the number of previous values used in a prediction -- can significantly affect performance. Underfitting can occur when too few previous data points are considered; overfitting can occur when too many are included, since the AR model then also captures noise.

Difficulty capturing complex interactions. A simple autoregressive model can sometimes include multiple variables in making predictions but might miss intricate dependencies between them. Techniques such as VAR can help, but these can add other complexities.

Autoregressive vs. non-autoregressive models

Autoregressive models are sometimes compared with non-autoregressive (NAR) models, which can generate a response in parallel for faster but less accurate results. For example, NAR models can improve the speed of language translation or real-time captioning but can introduce more errors or hallucinations during the process.

AR models employ a sequential process in which each new data point is predicted, or each token is generated based on the preceding ones. This step-by-step process improves their ability to take advantage of the context more granularly.
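The sequential dependence can be made concrete with a toy generation loop. The lookup table below is a hypothetical stand-in for a trained model's next-token predictions, not a real language model:

```python
# Toy illustration of autoregressive generation: each new token is
# predicted from the one before it, so generation is strictly sequential.
# NEXT_WORD is a hypothetical stand-in for a trained model's predictions.
NEXT_WORD = {"the": "quick", "quick": "brown", "brown": "fox", "fox": "jumps"}

def generate(start, steps):
    tokens = [start]
    for _ in range(steps):
        # Each step must wait for the previous step's output -- a
        # non-autoregressive model would instead fill every position at once.
        tokens.append(NEXT_WORD[tokens[-1]])
    return tokens

print(" ".join(generate("the", 3)))  # the quick brown fox
```

Because each iteration consumes the previous iteration's output, the loop cannot be parallelized, which is exactly the bottleneck NAR models trade accuracy to avoid.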

In contrast, NAR models generate an entire sequence in parallel. This lets them use multiple processors or compute cores to speed up response time. This is analogous to seeing the whole picture at once rather than piecing it together one piece at a time. NAR models are often trained on high-performing AR models using knowledge distillation and source-to-target alignment techniques.

Both AR and NAR models take advantage of different kinds of vector embeddings to predict and thus generate new tokens that represent text, audio, video and protein structures in new content or data.

How are autoregressive models used in generative AI?

AR models are widely used under the covers of various GenAI algorithms for generating text, audio, video and other types of time series data. In these cases, rather than predicting simple values as in statistics, they predict the appropriate vector combinations for generating a token translated into a word, pixel or sound.

They can also take advantage of multimodal vector embeddings that capture relationships across different modalities of data, such as text and speech or images and captions:

  • Text generation. AR techniques are essential components in all large language models used for answering questions and summarizing content. In these use cases, they are implemented as part of the decoding aspects of a generative pretrained transformer to predict the most appropriate word in a response based on the vector embeddings previously used to encode training data and question text into tokens. 
  • Music generation. A transformer trained on audio can capture information about the notes, rhythm, volume and pitch into vector embeddings representing style, tone, emotion or genre and correlate these with text descriptions. On the decoding end, the model autoregressively generates each note based on the previous ones, guided by examples of music and descriptions in the query.
  • Image infilling. Models such as PixelCNN and PixelRNN use CNNs and RNNs to capture different aspects of an image, such as colors, edges and objects, into vector encodings. An autoregressive model can apply these encodings to predict or fill in missing pixels in a picture. A similar process can be applied to increase the resolution of an existing image by extending the dimensions of a low-resolution picture to a higher one and then using autoregression to fill in the gaps between pixels in the large image.
  • Synthetic world data. Nvidia uses autoregression algorithms as part of its Cosmos World Foundation model platform. It analyzes a few existing video frames and uses the autoregressive model to generate future ones based on preceding ones. The model can also map existing videos to different conditions, such as driving at night or in snow.
This was last updated in February 2025
