IBM's latest Z mainframe offers lessons in building AI systems
Studying the engineering behind IBM's mainframe architecture could help enterprises build higher reliability into the GPU clusters used to run AI applications.
The latest advances in the IBM Z mainframe provide an architecture to run AI models that enterprises could find helpful as they cobble together off-the-shelf components for their systems.
Next year, IBM plans to start offering the Z mainframe with the second version of the company's Telum processor and a new AI accelerator named Spyre. The chips will work together to power the AI applications of most Fortune 500 companies, including banks, insurers, retailers, carriers, and airlines.
AI applications on the mainframe must keep up with software processing 100,000 transactions per second. IBM has designed the Telum II with eight AI accelerators, each capable of 24 trillion operations per second. That is four times the speed of the first Telum.
To boost Telum performance, IBM offloads some tasks from the chip's CPU to a data processing unit. The DPU processes network and storage traffic so the CPU can handle mostly transactions and database queries.
The Spyre has 32 AI accelerators, so companies can use it to run AI models larger than those on the Telum II, said Christian Jacobi, CTO for Z system architecture and design. The Spyre works concurrently with the Telum II to power the total AI system.
For example, a credit card company could use the AI accelerator in the Telum II chip to run a typical 100,000- to 1 million-parameter machine learning model that screens for fraudulent transactions. Questionable transactions would then pass through a 100 million-parameter model that scores the likelihood of fraud. A transaction with a high score would be rejected, while one with a low score would be processed.
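The two-stage screen described above can be sketched in a few lines of Python. Everything here is illustrative: the function names, features, thresholds, and toy scoring logic are stand-ins invented for this sketch, not IBM's actual models or software; the point is only the control flow, in which a small fast model sees every transaction and a larger model sees only the questionable ones.

```python
def small_model_score(txn):
    """Stand-in for the ~100K- to 1M-parameter model run on every transaction."""
    # Toy heuristic: flag transactions far above the account's typical amount.
    return min(txn["amount"] / (txn["avg_amount"] * 10), 1.0)

def large_model_score(txn):
    """Stand-in for the ~100M-parameter model run only on flagged transactions."""
    score = small_model_score(txn)
    if txn["country"] != txn["home_country"]:
        score = min(score + 0.3, 1.0)  # toy cross-border risk bump
    return score

def screen(txn, flag_at=0.5, reject_at=0.8):
    first = small_model_score(txn)
    if first < flag_at:
        return "approve"             # fast path: most traffic stops here
    second = large_model_score(txn)  # questionable: escalate to the big model
    return "reject" if second >= reject_at else "approve"

txn = {"amount": 9000.0, "avg_amount": 120.0,
       "country": "BR", "home_country": "US"}
print(screen(txn))  # → reject
```

The design matters for throughput: at 100,000 transactions per second, the expensive model can only ever see the small fraction of traffic the first-stage model flags.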
Another example is a healthcare provider using the mainframe for biopsy image analysis to avoid sending sensitive medical information to other systems, Jacobi said. The models can determine the type of cancer and its severity.
"They are categorization models, not generative [AI] models," Jacobi said. "There's a great use here of combining traditional models with the classification type of large language models."
Lessons for enterprises
Enterprises much smaller than the Fortune 500 are unlikely to migrate to the mainframe, which requires expertise in COBOL, a programming language that few IT systems use today. Nevertheless, non-mainframe customers should take note of the system's architecture and the engineering that went into it, Gartner analyst Chirag Dekate said.
The "underappreciated architecture" has the highest level of resilience and reliability, Dekate said. How the mainframe achieves that reliability could help enterprises that use GPU clusters to run AI applications, since those systems often suffer from component and driver failures.
"As enterprise architects, designers and technology leaders look at building AI solutions, I think they might want to pause and double-click on how mainframe architectures are engineered," Dekate said. "Simply putting out GPU farms is not necessarily going to get us there. We're going to have to figure out how to improve the resiliency of our current ecosystems, and there's a thing or two that we can learn from how mainframe ecosystems are architected."
IBM has more than 100 customers in various stages of deploying AI on the mainframe, from proof of concept to testing, Jacobi said. Some are in production.
Mainframe customers' use of AI differs sharply from that of the average enterprise. They deploy models in their core business operations, not as an add-on to a business process.
"There's a software stack that comes [with the mainframe] that makes it easy with three lines of COBOL to actually tap into an AI model out of a COBOL application," Jacobi said. "You don't get that with off-the-shelf accelerators."
Antone Gonsalves is an editor at large for TechTarget Editorial, reporting on industry trends critical to enterprise tech buyers. He has worked in tech journalism for 25 years and is based in San Francisco. Have a news tip? Please drop him an email.