
AI model training drives up demand for high-bandwidth memory

Developed over a decade ago, high-bandwidth memory is experiencing a sharp uptick in popularity due to the demand for high-end GPUs from chipmakers like Nvidia.

Graphics processing units have practically become a household term as they've grown in popularity due to AI. But a lesser-known technology, one that works in conjunction with GPUs, is also having a moment: high-bandwidth memory.

HBM is a high-density memory product designed to overcome memory bottlenecks and achieve the maximum rate at which data is transferred from storage to processor. In demand by AI chipmakers such as Nvidia, its higher bandwidth and placement directly next to the GPU's processor chip challenge the performance of more traditional memory technologies such as server RAM, which sits between storage and the processing unit. HBM also uses less power than other memory technologies, which could benefit AI model training and GPU environments, given their reputation for consuming energy.

But HBM's rise in popularity will likely falter as the market shifts from AI model training to AI inferencing, where traditional technology might be more cost-effective. In a 2023 forecast analysis, Gartner said the use of accelerator chips that integrate HBM for AI model training will decline from 65% in 2022 to a projected 30% in 2027.

How HBM is different

HBM is similar to other memory technologies, including graphics double data rate (GDDR) memory, that deliver high bandwidth for graphics-intensive applications. But it differs from them in some notable ways.

While HBM and GDDR both use DRAM chips, HBM is positioned differently on the GPU. GDDR DRAM is typically positioned on the printed circuit board in a GPU card design, whereas HBM sits next to the processor itself. This close location to the processor gives HBM its main advantage -- speed, according to Jim Handy, general director and semiconductor analyst at Objective Analysis.

"The big culprit is that long lines -- the high number of interconnections -- add all kinds of capacitance to the signal line, and so slow [the signals] down a lot," Handy said.

Aside from its position relative to the processor, HBM consists of DRAM chips that are stacked for density instead of placed side by side on the GPU card like GDDR. This stacking architecture is a more arduous undertaking, according to Shrish Pant, an analyst at Gartner and the author of the firm's HBM forecast report. For starters, HBM is made for the highest-performance use cases and uses the latest memory technology nodes, which are in lower supply due to demand.

Another difference is the size of the DRAM chip, or die. "You would use the same die, but what will happen is you will need a much bigger die to produce a similar gigabyte of HBM chip," Pant said.

The larger die is needed due to the process of through-silicon via (TSV), Pant said. Holes are drilled through the chips to provide room for thin electrical wires that connect the DRAM chips to one another and then to a logic chip at the bottom of the stack, which manages the data transfer function.

TSV is not a common connectivity method for other chips, which tend to use wire bonding, Handy said. It adds manufacturing cost and requires HBM dies to be more than twice as large to accommodate the process.

"The wafer costs more and it produces less than half to half as many chips," he said.

Jeff Janukowicz, an analyst at IDC, echoed the point, saying that HBM costs more and takes longer to make than server DRAM. And since it needs more wafers allocated to it, HBM's yield loss is greater.

"HBM will have several packages, so if I have yield loss at the DRAM level, then a little bit of yield loss at the packaging level, rather than throwing away one chip, I could be throwing away four," he said.

Diagram: SK Hynix uses TSV electrical connections to tie the stacked DRAM chips in its HBM together on a processor.

AI's need for speed

During earnings calls in June and July, all three of the major suppliers of HBM -- SK Hynix, Samsung and Micron -- highlighted demand for HBM and noted that they either have expanded or will expand production. Micron said during its third-quarter 2024 earnings that its HBM is sold out through calendar year 2025. In July, as part of its second-quarter 2024 earnings, SK Hynix reported to investors that "HBM sales went up by more than 80% compared to the previous quarter and more than 250% compared to the same period last year."

Shortly before SK Hynix's earnings, analyst firm TrendForce released a prediction that the demand for high-density products and the higher price of HBM will help to push the memory industry into record revenues in 2025. This includes other memory products such as DRAM.

"Compared to general DRAM, HBM not only boosts bit demand but also raises the industry's average price. HBM is expected to contribute 5% of DRAM bit shipments and 20% of revenue in 2024," the TrendForce report said.

The demand for HBM is driven by the demand for high-end GPUs, especially those from chipmaker Nvidia, and by the desire for speed in AI model training -- particularly among hyperscalers looking to turn AI into a moneymaker, according to Handy.

"They'll say, 'Twice as many inferences for half again as much money' -- that's good math," Handy said.

The luck of the HBM bandwagon

Initially, HBM was designed to solve memory bottleneck issues in high-performance computing. It was developed as a collaboration, with SK Hynix bringing the first HBM chip to market in 2013. It was adopted as an industry standard that same year by the Joint Electron Device Engineering Council, a group that sets standards for microelectronics.

Today, SK Hynix, one of the largest memory-makers in the world, is the top producer of the current generation of HBM, HBM3, which it sells to Nvidia, Pant said. Nvidia has the largest GPU market share by revenue, according to a 2023 generative AI market report from IoT Analytics, a German analysis firm.

"[SK Hynix] is the top producer for the latest generation of HBM, and Nvidia buys the latest generation of HBM and is the first customer for the latest generation," Pant said.

SK Hynix beat out Samsung, the world's largest memory-maker, and Micron, the third-largest memory-maker, by pursuing HBM more aggressively for HPC, according to Handy, who tracks the market closely. To do so, SK Hynix invested heavily upfront, not knowing how important HBM would become to AI until years later.

"Nobody, not even Nvidia, thought that the AI market for GPUs was going to explode the way that it has," Handy said.

Samsung and Micron initially went in a different direction and in 2011 codeveloped hybrid memory cube technology for supercomputers, which also stacked memory chips and used TSV. Micron supported the technology for a few years before dropping it to focus on technologies such as HBM.

When HBM was first introduced in 2013, it was a niche product, Janukowicz said. SK Hynix's decision to focus on it became a "right place, right time" kind of story. SK Hynix's HBM has also been qualified by GPU-makers, meaning it meets their requirements without issue.

"SK Hynix ships the vast majority of the volume today. But both Micron as well as Samsung do have products that are being qualified," he said.

HBM's future

Janukowicz believes HBM's rise in popularity will come to an end as the AI market shifts from model training to inferencing.

"We're in a phase where there is a lot of [AI] model development, so that's clearly driving a pretty big uplift in terms of HBM in the near term," he said.

Pant agreed, saying that inferencing will rely on other forms of memory and could potentially create an influx of more custom-designed inference chips.

"Inference workloads are comparatively lesser-intensive workloads, but you need a higher number of servers or number of chips to run those workloads," Pant said.

Future sales of HBM are hard to predict and will likely fluctuate, as with any other technology, based on demand and competition, Handy said.

"If [hyperscalers] decide to keep on buying the highest-end GPUs, then they're going to have HBM all the time," he said. "But if they at some point say, 'Hey, we're buying things that are far more capable than we really need,' they'll back off."

As far as the technology is concerned, experts believe it will follow a similar pattern to other forms of memory, like DRAM, where vendors increasingly strive for higher bandwidth and greater density. This can be seen with HBM3E now in production and HBM4 set for a 2026 release, which is expected to push bandwidth from 1 TBps up to 1.4 TBps.

There are also physical hurdles to scaling up HBM, including Nvidia's GPU design, according to Handy. Nvidia's GPUs are four-sided chips, where two sides can be dedicated to HBM while the other two are dedicated to I/O and power. Increasing the amount of HBM per unit might require a GPU redesign, but that's a ways off.

"It's not the power that's the limiting factor for HBM, like server memory, but it's just how much beachfront you have," Handy said.

Adam Armstrong is a TechTarget Editorial news writer covering file and block storage hardware and private clouds. He previously worked at StorageReview.
