
What is a Kolmogorov-Arnold Network?

A Kolmogorov-Arnold Network (KAN) is a new neural network architecture that promises to dramatically improve the performance and explainability of physics, mathematics and analytics models. In a business context, KANs show promise for providing insight into new product designs, for mathematical models that optimize supply chains, and for logistics and scheduling models that improve fleet routing.

The big innovation with KANs is that they show how the neural networks used in many machine learning techniques, and in all of today's generative AI approaches, could be constructed in fundamentally different ways. The multilayer perceptron (MLP) models underpinning current AI techniques are easy to train in parallel and to adapt to new use cases. However, they're not very explainable, because the probabilistic logic they discover arises from the weights of thousands or millions of connections between artificial neurons.

In contrast, KANs suggest a practical alternative for incorporating symbolic logic into each neuron in a neural network to improve explainability and transparency. However, they're harder to train because each neuron has to be updated sequentially, as opposed to the parallel updates done using conventional approaches.

KANs are interesting because researchers are already discovering ways to apply them to physics, logistics and scheduling problems. But they could also inspire other researchers to reevaluate the common wisdom that MLPs are the optimal approach for a given problem or use case.

Development of the Kolmogorov-Arnold representation theorem

German mathematician David Hilbert first raised the fundamental idea behind KANs as one of the 23 unsolved mathematical problems he presented at the International Congress of Mathematicians in Paris in 1900. His 13th problem asked whether every continuous function of three variables could be expressed as a composition of continuous functions of two variables.

In 1957, Andrey Kolmogorov and his student Vladimir Arnold proved that any continuous function of several variables can be built from continuous functions of a single variable combined only through addition, a result now known as the Kolmogorov-Arnold representation theorem. The KAN approach relies on this theorem and is named after them.
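In modern form, the theorem states that any continuous multivariate function on a bounded domain can be written as a finite composition of continuous single-variable functions and addition:

```latex
f(x_1, \dots, x_n) = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right)
```

Here, each inner function φ_q,p and outer function Φ_q is a continuous function of one variable, so addition is the only genuinely multivariate operation required.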

Over the years, AI researchers pondered whether the Kolmogorov-Arnold representation theorem might work as part of neural networks. In 1989, MIT computational neuroscientists Federico Girosi and Tomaso A. Poggio wrote a paper suggesting the theorem was irrelevant to neural networks, discouraging further research on the approach. One limitation of Girosi and Poggio's analysis was that it only considered two-layer networks, and it was not practical at the time to explore how to extend the approach to deeper ones.

In 2024, MIT graduate student Ziming Liu revived the idea of using the KAN approach with support from his advisor, physicist Max Tegmark. Liu and Tegmark discovered that properly structured KANs provide dramatic benefits for some kinds of deep learning. They collaborated with several subject matter experts to publish their groundbreaking KANs paper, which was cited more than 660 times in the 10 months after it was published.

When he started to think about the approach using modern computing infrastructure, Liu also ran into the difficulty of extending the Kolmogorov-Arnold representation theorem beyond two layers. One challenge was that each neuron has inner and outer functions, which mathematicians have traditionally treated differently. He realized it was possible to unify them through a KAN layer.

"After we have defined the KAN layer, it becomes natural to get deeper networks simply by stacking more KAN layers," he said.

Liu said the Girosi-Poggio paper might have prevented more exploration in this direction. However, Liu approached the problem as a physicist rather than a mathematician. "Maybe because of physicists' optimism and practicality, we were less worried about the negative claim (based on theoretical mathematics, which may have little to do with practice) -- we just build it and see it work in practice," he said.

Kolmogorov-Arnold Networks (KANs) vs. multilayer perceptrons (MLPs)

Kolmogorov-Arnold Networks are a promising alternative to the MLPs commonly used in artificial neural networks (ANNs) and deep learning. An important distinction is that KANs start from fundamentally different building blocks for representing neurons and their connections.

KANs are hard to train and have not yet demonstrated similar performance gains in applications like large language models (LLMs) or image recognition. They also don't work as well with noisy data. Another big challenge is that KANs can currently only be trained on CPUs running training processes sequentially, as opposed to traditional deep learning approaches that can take advantage of many GPUs running processes in parallel.

MLPs consist of multiple layers of neurons, with each neuron connected to all of the neurons in the adjacent layers. Training them consists of adjusting the weights on those connections, which characterize the connectivity strength between neurons across layers, while the activation function applied at each neuron stays fixed.

MLPs are relatively easy to train in parallel because these weights can be adjusted independently of each other. They are also better than KANs at adapting to noisy data, which reduces the need for data preprocessing to filter out variations. In addition, all the major deep learning libraries include extensive support for MLPs, which simplifies the process of adapting them to various use cases.
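For contrast with the KAN layer sketched earlier, a minimal MLP for the same toy regression task keeps its activation functions fixed at the nodes and learns only the connection weights. This is an illustrative PyTorch sketch, not a prescribed implementation:

```python
import torch
import torch.nn as nn

# Standard MLP recipe: learnable weights on the connections,
# fixed (non-learnable) activation functions at the nodes.
mlp = nn.Sequential(
    nn.Linear(2, 5),   # learnable weights and biases on the edges
    nn.ReLU(),         # fixed activation applied at the hidden neurons
    nn.Linear(5, 1),
)

x = torch.rand(16, 2) * 2 - 1
y = torch.sin(x[:, :1]) + x[:, 1:] ** 2
optimizer = torch.optim.Adam(mlp.parameters(), lr=1e-2)

for _ in range(200):                       # all weights update in parallel
    optimizer.zero_grad()
    loss = ((mlp(x) - y) ** 2).mean()
    loss.backward()
    optimizer.step()
```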

At a high level, the correlations between multiple variables -- or "smarts" -- in a KAN are stored in the learnable mathematical functions attached to its connections and summed at its neurons. In an MLP, they are stored in the weights characterizing the connectivity between neurons.

KANs and MLPs are high-level abstractions that characterize architectural approaches using these different building blocks. This distinction is somewhat akin to the types of buildings you might design using various building materials, with each option introducing different benefits and design constraints. With MLPs, a neural architecture search can help find ways to interconnect neurons across multiple layers. With KANs, developers need to fine-tune the mathematical approach in each neuron first, and then fine-tune the interconnections between them, so similarly automated approaches are more limited today.

The following is a breakdown of essential differences between KANs and MLPs:

  • Theorem. KANs use the Kolmogorov-Arnold representation theorem, while MLPs use the universal approximation theorem.
  • Representation. KANs encode information using learnable activation functions on the connections, whose outputs are summed at the neurons. MLPs encode information using fixed activation functions on the nodes and adaptable weights on the connections.
  • Training. In KANs, each connection learns a univariate function, and the neurons sum the outputs of those functions. In MLPs, each connection learns a weight, and the neurons sum the weighted inputs before applying a fixed activation function (see the comparison after this list).
  • Activation functions. These functions characterize how a neural network calculates the output of a neuron based on inputs and weights. KANs have activation functions on edges, while MLPs have activation functions on nodes.
  • Learnability. KANs have learnable activation functions, while MLPs have fixed activation functions.
  • Goal. KANs are focused on improving science, while LLMs and vision language models (VLMs) focus on language and vision. "I'm not saying eventually KANs cannot do vision/language, or LLMs/VLMs cannot do science, but they have different inductive biases built in that make them good at different things," Liu said.
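Written out, the two architectures compose their layers differently. In the shorthand used in the original KAN paper, an MLP alternates linear weight matrices W with a fixed nonlinearity σ, while a KAN composes layers Φ whose entries are themselves learnable one-variable functions:

```latex
\mathrm{MLP}(x) = (W_L \circ \sigma \circ W_{L-1} \circ \cdots \circ \sigma \circ W_1)(x)
\qquad
\mathrm{KAN}(x) = (\Phi_L \circ \Phi_{L-1} \circ \cdots \circ \Phi_1)(x)
```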

Building better neural networks

Kolmogorov-Arnold Networks suggest that new ways of representing intelligence across neurons and their connections might dramatically improve performance for various deep learning tasks. Preliminary research suggests that, for some kinds of problems, KANs can be about 100 times more accurate while using 100 times fewer parameters than existing approaches. This matters because MLPs have been the status quo building block and have gone largely unrevisited for more than 50 years.

All modern ANN approaches are inspired by animal brains. However, these approaches, including KANs and MLPs, are crude approximations of what occurs in living organisms. Existing ANN approaches are designed to train efficiently on GPUs and CPUs, but they don't necessarily work the way biological neurons do.

Here are a few examples of the differences between ANNs and the animal neurons found in nature:

  • Researchers at Humboldt University of Berlin found that a single dendrite, akin to a connection in an ANN, can carry out the same level of computation as a two-layer ANN.
  • Researchers at the Hebrew University of Jerusalem suggested that a single biological neuron requires an ANN with five to eight layers across multiple neurons at each level to replicate.
  • The orchestrated objective reduction theory from mathematical physicist Sir Roger Penrose and anesthesiologist Stuart Hameroff suggests that consciousness originates at the quantum level inside neurons, implying that quantum computing-inspired neurons might support further opportunities for innovation.

Improving interpretability

Liu said their original motivation for KANs was to make neural networks more interpretable for science-related tasks. "We want to make KANs sufficiently interpretable such that KANs behave like traditional software which are controllable and interpretable," he said. Current LLMs and other types of neural networks can learn from data but are not understandable by humans.

One important aspect of KANs is that they can be intuitively understood. A visualized KAN diagram enables developers and researchers to see how information is processed. Unlike an LLM, however, a KAN is not something humans communicate with through natural language.

KANs trained on a large data set could distill elegant solutions to complex problems. MLPs and LLMs might learn to recognize patterns, but they cannot readily translate those patterns into the symbolic representations that are essential for neuro-symbolic AI.
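One way to picture that distillation step: once a KAN is trained, each of its learned one-dimensional edge functions can be compared against a library of candidate symbolic forms. The snippet below is a hypothetical NumPy illustration of that matching idea; the pykan library ships its own symbolic-fitting utilities, and this is not that code.

```python
import numpy as np

# Stand-in for one learned edge function sampled on a grid; in practice
# the samples would come from a trained KAN's spline coefficients.
xs = np.linspace(0.1, 2.0, 100)
learned = 1.0 / xs + 0.05 * np.random.randn(100)   # noisy samples of 1/x

# Candidate symbolic forms to test against the learned curve.
candidates = {
    "x": xs,
    "x^2": xs ** 2,
    "sin(x)": np.sin(xs),
    "exp(x)": np.exp(xs),
    "1/x": 1.0 / xs,
}

# Fit learned ~= a * candidate + b by least squares and score each fit.
scores = {}
for name, g in candidates.items():
    A = np.column_stack([g, np.ones_like(g)])
    coeffs, *_ = np.linalg.lstsq(A, learned, rcond=None)
    scores[name] = np.mean((A @ coeffs - learned) ** 2)

best = min(scores, key=scores.get)
print(f"Best symbolic match: {best} (MSE {scores[best]:.4f})")
```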

Liu said that KANs were conceived to improve physics modeling, making it easier to incorporate physical knowledge into the KANs and extract insight from them, as they are both neural networks and symbolic equations simultaneously. "No other AI models have been synergized with physics this closely," he said.

Future of KANs

Research into Kolmogorov-Arnold Networks is still in its early stages, but it shows tremendous promise for solving complex scientific and mathematical problems. In the short term, the focus will be on finding different ways to apply KANs to problems in physics, chemistry, biology, business analytics and optimization.

Liu recommends that developers and data scientists interested in exploring KANs review their latest paper on the subject, which is a user guide for identifying relevant features, revealing modular structures and discovering symbolic formulas. A paper on KAN-informed neural networks suggests how KANs can be applied to solve physics problems more efficiently.

One big shortcoming of KANs is that they can currently only be trained on CPUs, where the training process is difficult to parallelize. This is reminiscent of the early days of neural radiance fields, which dramatically improved the ability to capture 3D representations of the world but were slow. Future work might allow KANs to be trained on new hardware or adapt them to run on GPUs to speed up training.
