
GPUs vs. TPUs vs. NPUs: Comparing AI hardware options

Traditional CPUs struggle with complex ML and AI tasks, leading to today's specialized processors -- GPUs, TPUs and NPUs, each tailored to handle specific functions efficiently.

One significant development in modern computing is the diversification of processing and processor chips used in machine learning and AI.

Traditional computing centered on processors as the essential foundation of computer architecture. General processors were expected to do it all, but designers quickly found that adding millions more transistors for every need wasn't good for efficiency, price or power consumption. Instead, designers have embraced the notion of purpose-built processors designed to handle a limited number of specific compute tasks with extremely high performance and power efficiency. An early example of this specialization is the adoption of RISC processors, such as the Arm architecture.

However, RISC is just a simplification of the general-purpose central processing unit (CPU). The demands of modern computing have spawned new breeds of processing units that operate in conjunction with general-purpose CPUs by offloading specific tasks the CPU cannot handle efficiently on its own.

The graphics processing unit (GPU) was the first real example of this coprocessing paradigm. It was introduced to perform much of the math and data manipulation needed for tasks such as computer graphics and professional digital visualization, including computer-aided design. Although GPUs traditionally saw limited use in enterprise computing, the rise of machine learning (ML) and AI workloads has made them popular for handling the exceptionally high volumes of complex math and data processing these tasks require.

There are other emerging examples of processing units intended specifically to support ML and AI applications in the enterprise. One important example is the tensor processing unit (TPU), which specializes in math tasks, such as matrix multiplication. An even more powerful example is the neural processing unit (NPU), which mimics the neural networks of a human brain to accelerate AI tasks.

Here is a closer look at the roles of GPUs, TPUs and NPUs in current ML and AI environments.

Understanding the role of CPUs and GPUs in AI

CPUs form the backbone of every computer, and GPUs are increasingly common in enterprise computing. Both play a vital role in ML and AI.

CPU

A modern CPU can perform hundreds of different general operations. Still, those many operations typically fall into four broad and traditional categories: data movement, simple math, fundamental logic, and hardware control and housekeeping.

For example, a CPU is well suited to running a word processor; opening and saving documents to storage; interacting with human interface devices, such as a keyboard and mouse; and producing output to a display device.

In more complex computing environments with additional processing units, the CPU acts as an organizer. It can launch and run software, such as a computer game. When the CPU encounters a task it can't easily handle, however, it can pass that task to supporting processors and then integrate their output into the software's normal operations.

CPUs can participate in ML and AI tasks. The CPU is the primary organizer, coordinator and controller of system hardware, making it well suited to running ML and AI software. A CPU can also perform many of the complex mathematical operations involved in ML and AI. The problem is that a CPU is designed to operate efficiently with only a limited number of instruction pipelines. It is not built for highly parallel task execution, and its performance suffers dramatically when asked to perform high volumes of complex tasks simultaneously.

General-purpose CPUs launch and run ML and AI software, but their direct work on ML and AI tasks is typically limited to several use cases, such as the following:

  • Relatively simple ML and AI tasks involving small data sets and simple training schema, such as few-shot or one-shot training (see the brief sketch after this list).
  • Tasks with high memory requirements that benefit from the large system memory a CPU can address, such as some inference and training workloads.
  • ML algorithms that do not support parallelization, such as real-time inference algorithms.
  • Tasks that require sequential data, such as recurrent neural networks.
  • Models that involve large data samples, such as 3D data.
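As a rough illustration of the first item, the sketch below trains a classifier on a tiny synthetic data set entirely on the CPU. The data and the choice of scikit-learn's LogisticRegression are assumptions made for brevity, not a prescription.

```python
# Minimal sketch: a small-data classification task that runs comfortably on a
# general-purpose CPU. The data set is tiny and synthetic (an assumption for
# illustration); no GPU, TPU or NPU is involved.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# A few dozen samples with two features -- "few-shot" scale training data.
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X, y)                      # trains in milliseconds on one CPU core

print(model.predict([[0.5, 0.5]]))   # expected output: [1]
```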

GPU

The GPU is a specialized device intended to handle a narrow set of tasks at an enormous scale. While a CPU has a few large, general-purpose cores, a GPU has hundreds or thousands of small, specialized cores designed for specific mathematical or logical tasks and the necessary control systems to manage these tasks. Thus, a GPU provides an extremely high level of parallelism in its mathematical operations, such as matrix and vector computations.
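As a minimal sketch of that parallelism, the PyTorch snippet below dispatches one large matrix multiplication to a GPU when one is available; the matrix sizes are arbitrary assumptions.

```python
# Minimal sketch of offloading a large matrix multiplication to a GPU with
# PyTorch. The single-line operation fans out across thousands of GPU cores.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

c = a @ b          # executed as one massively parallel kernel on the GPU
print(c.shape)     # torch.Size([4096, 4096])
```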

Most ML and AI systems require enormous data sets for training. The mathematical operations that must be performed on that data during training and production can easily overwhelm a general-purpose CPU. Consequently, GPUs have emerged as a key component of many ML and AI hardware platforms where parallel processing is required to accelerate ML using larger data volumes in less time. GPUs are used in various types of ML and AI applications, including the following:

  • The development, training and operation of neural networks.
  • Any AI or deep learning tasks involving enormous volumes of parallel data processing, including big data analytics or pharmaceutical and biological research.
  • Most everyday ML and AI applications that both train and infer, such as image processing.

It's important to note that the massive parallelism that makes GPUs powerful can also be a drawback. ML and AI software written for GPUs must be carefully structured to minimize complexity. GPUs demand well-structured, direct tasks; complex programming, such as branching logic or sequential operations, can reduce a GPU's efficiency and cost-effectiveness.
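The hedged sketch below illustrates the kind of restructuring this implies: the same per-element branch written first as a sequential Python loop and then as a single data-parallel operation that maps cleanly onto GPU hardware. The functions and sizes are illustrative assumptions, not a benchmark.

```python
# Sketch of restructuring branchy, element-by-element logic into a form GPUs
# handle well. Both functions compute the same result; the second expresses
# the branch as a data-parallel torch.where instead of a Python-level loop.
import torch

def branchy(x):
    out = torch.empty_like(x)
    for i in range(x.numel()):           # sequential, one element at a time
        out[i] = x[i] * 2 if x[i] > 0 else x[i] * -1
    return out

def vectorized(x):
    return torch.where(x > 0, x * 2, x * -1)   # one parallel kernel launch

x = torch.randn(1_000_000)
assert torch.allclose(branchy(x[:1000]), vectorized(x[:1000]))
```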

What are TPUs and how do they work?

TPUs are another type of hardware device or chip intended to handle the vast volumes of parallel mathematical tasks typically involved in ML and AI workloads. TPUs are a type of application-specific integrated circuit (ASIC), meaning the chips are designed and manufactured to perform a narrow scope of intended tasks. Different types of ASICs are used in various devices, such as home alarm clocks, coffee makers and dedicated controllers in enterprise-class storage subsystems.

TPUs date to 2016, when Google revealed the chips it had developed internally to accelerate its TensorFlow open source ML software framework. Today, TPUs also support other frameworks, including PyTorch and JAX. TPUs are designed with a large quantity of basic math cores called matrix multiply units, which are interconnected and each capable of performing common math tasks at low-to-modest precision but with exceptionally high speed and power efficiency. This makes TPUs cost-effective and well suited for ML and AI tasks, such as the following:

  • Training, tuning and inference involving large and complex ML models.
  • Training large, complex deep learning models requiring matrix calculations, such as large language models.
  • Projects involving deep learning and neural networks that attempt to approximate the behaviors of the human brain, such as synthetic speech, recommendation engines, computer vision, healthcare diagnostics and diagnoses, and genetic and pharmaceutical research.

TPUs are similar to GPUs but offer even greater specialization and parallelism. As with GPUs, simply having TPUs isn't enough; effective use depends on the underlying software, such as TensorFlow, which provides the necessary instructions and code architecture.

TPUs are primarily available through Google Cloud as Cloud TPUs, a service developers can use when building ML and AI platforms on Google Cloud.
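For a concrete starting point, here is a minimal sketch of dispatching a matrix multiplication to a Cloud TPU with JAX. It assumes a Google Cloud TPU VM with JAX's TPU backend installed; TensorFlow and PyTorch/XLA offer their own routes to the same hardware.

```python
# Minimal sketch of running a matrix multiplication on a Cloud TPU with JAX.
# Assumes this runs on a Google Cloud TPU VM; elsewhere, jax.devices("tpu")
# raises an error because no TPU backend is present.
import jax
import jax.numpy as jnp

tpu_devices = jax.devices("tpu")          # enumerate the attached TPU cores
print(tpu_devices)

# Place the operands on the first TPU core; the matmul then runs there.
a = jax.device_put(jnp.ones((2048, 2048)), tpu_devices[0])
b = jax.device_put(jnp.ones((2048, 2048)), tpu_devices[0])

c = jnp.matmul(a, b)
print(c.shape)                            # (2048, 2048)
```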

What are NPUs and how do they work?

NPUs are another highly specialized type of ASIC, designed to accelerate specific ML and AI tasks that rely on inference rather than rote training. In simple terms, inference is a conclusion based on evidence and reasoning. Brains learn -- and effective decision-making develops -- through a phenomenon known as synaptic weighting: when information passes repeatedly between neurons in the brain, the pathways between those neurons become stronger and faster, and they use less energy.

For example, you're driving a car and heading for home. You recognize roads, signs and buildings along the way and make effective decisions about your location and corresponding directions to complete the journey. You don't need to check or follow your GPS because you already "know" where you are. That's the role and benefit of inference.

AI can learn similarly by prioritizing frequently encountered data, enabling quicker results with less energy consumption. That is, an AI system can identify several potential decisions or outcomes and weigh them against prior learning (experience) and real-time input (vision, hearing and location data) to make the best decision. NPUs focus on supporting this activity and are designed to accelerate the computations involved in neural networks.

Where GPUs and TPUs are typically deployed and used centrally, NPUs are often implemented as specialized cores in larger general processors or supporting devices in edge or mobile equipment. For example, the latest smartphones might employ an NPU to accelerate facial recognition features or other biometric input for device and data security or run natural language processing for speech recognition or real-time language translation tasks.
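The sketch below shows what such on-device inference typically looks like in code, using TensorFlow Lite. The model file name and the vendor NPU delegate library are placeholders; actual delegate libraries and names vary by chip maker.

```python
# Sketch of on-device inference with TensorFlow Lite, the kind of workload a
# phone's NPU can accelerate. "model.tflite" and the delegate library name are
# placeholders for illustration only.
import numpy as np
import tensorflow as tf

# Optionally hand the work to a hardware delegate (a vendor-supplied NPU
# library) by passing experimental_delegates=[...] to the Interpreter:
# delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy input of the right shape and read back the result.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
print(interpreter.get_tensor(out["index"]))
```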

NPUs vs. TPUs vs. GPUs: Ideal AI use cases for each

ML and AI require large amounts of data for training and real-time processing, even with advanced methods, such as one-shot training. This data undergoes complex math and logic operations to convert "human" data into actionable elements for an ML or AI system.

For example, consider a computer vision system designed to recognize cats. The model must perform complex math on a cat image to convert its pixels into relevant data points, enabling it to identify the animal and its type. It takes thousands of different images to train the model. Processing every image takes math, and each mathematical operation takes time and energy. The quicker and more efficiently a model completes training and processing, the more energy-, time- and cost-efficient it becomes for the business.
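A minimal sketch of the first step in that pipeline appears below: turning an image's pixels into a normalized numeric array that the model's math can operate on. The file name and target size are assumptions for illustration.

```python
# Minimal sketch of converting image pixels into model-ready data points.
import numpy as np
from PIL import Image

img = Image.open("cat.jpg").convert("RGB").resize((224, 224))

# 224 x 224 x 3 integers (0-255) become floats in [0, 1] -- roughly 150,000
# data points that the model's matrix math operates on for every image.
pixels = np.asarray(img, dtype=np.float32) / 255.0
batch = pixels[np.newaxis, ...]        # add a batch dimension: (1, 224, 224, 3)
print(batch.shape, batch.dtype)
```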

The key question is: Which processing unit best suits the mathematical tasks of a given project? The answer is that there is rarely a single ideal choice; each processing unit has characteristic strengths. Here are some highlights for each, along with a brief device-selection sketch after the list:

  • A general-purpose CPU can handle the math needed for ML and AI, but its architecture limits performance in these tasks, leading to outcomes that may require significant time and resources. CPUs typically only handle data processing when the project does not lend itself well to the scale provided by other processing units. For example, CPUs can handle one-shot or few-shot supplemental training.
  • GPUs can perform specific math tasks, such as matrix multiplication, far more efficiently than CPUs, quickly handling most ML and AI workloads. They are often chosen for their availability, cost-effectiveness and easy integration into modern servers. GPU-based computing and software development are typically the simplest and most suitable options for businesses developing ML and AI in-house.
  • TPUs are, fundamentally, an even more specialized relative of the GPU and can also perform math tasks, such as matrix multiplication, needed by ML and AI. TPUs are typically used by businesses building ML and AI systems on Google Cloud, where TPU hardware and TensorFlow software are available as Google Cloud services. TPUs are generally faster but less precise than GPUs, which is usually acceptable for most ML and AI math tasks. However, cloud users aren't limited to TPUs and can choose GPU-based hardware and services from Google and other public cloud providers.
  • NPUs use a different architecture designed for better performance in processing frequently encountered data, similar to human thought and information association. They are commonly found in edge and mobile endpoint devices, such as smart devices, where ML and AI applications benefit from significantly faster processing. Consequently, NPUs are typically not intended for data center or cloud environments.
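The device-selection sketch below shows one common way this choice surfaces in practice with PyTorch (a recent build is assumed): use a GPU if one is present and fall back to the CPU otherwise. TPUs and NPUs are reached through their own software stacks, such as PyTorch/XLA or vendor delegates, and are not covered by this simple check.

```python
# Hedged sketch of picking the best available accelerator at runtime with
# PyTorch: prefer a GPU, otherwise fall back to the general-purpose CPU.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")      # NVIDIA GPU
elif torch.backends.mps.is_available():
    device = torch.device("mps")       # Apple-silicon GPU
else:
    device = torch.device("cpu")       # general-purpose CPU fallback

model_input = torch.randn(8, 3, 224, 224, device=device)
print(f"Running on: {device}")
```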

Stephen J. Bigelow, senior technology editor at TechTarget, has more than 20 years of technical writing experience in the PC and technology industry.
