OpenShift AI boosts LLMOps chops with Neural Magic deal

The acquisition of a top contributor to an open source library already linked to OpenShift AI comes as LLMOps fundamentally alters the platform engineering scene.

SALT LAKE CITY -- Red Hat will acquire a top contributor to a key LLMOps utility that OpenShift AI uses to support self-hosted large language models on standard hardware.

The deal, for an undisclosed sum, was announced this week during KubeCon + CloudNativeCon North America as users of the company's internal developer platform gathered at a co-located OpenShift Commons event. Neural Magic, based in Somerville, Mass., specializes in advanced techniques for optimizing the LLMs that underpin generative AI applications so they perform well in a variety of IT infrastructure environments. The company was founded in 2018 by an MIT professor and researcher with the goal of decoupling generative AI applications from expensive and often hard-to-find GPU hardware.

Neural Magic's focus on widening infrastructure support for LLMs is in keeping with both Red Hat's hybrid cloud strategy for its developer platforms and its commitment earlier this year to support open source AI models, according to company officials.

"We see the future of AI being accelerated through open [source]," Red Hat CTO Chris Wright said during a press conference Tuesday. "Our goal is to build this scalable, trainable AI infrastructure that allows our customers to deploy their workloads, train their workloads and deploy inferencing anywhere that makes sense for their business."

Neural Magic employs two of the top 10 contributors to the vLLM project, described on its GitHub page as "a high-throughput and memory-efficient inference and serving engine for LLMs." The vLLM library has shipped as part of Red Hat's RHEL AI and OpenShift AI products since midyear.

Within OpenShift AI, vLLM functions similarly to a traditional web application runtime server but is optimized to run an LLM, said Derek Carr, senior distinguished engineer at Red Hat, in an interview with TechTarget Editorial at OpenShift Commons.

"In a traditional Java app, you have a JAR [Java archive] or WAR [web application archive] file and you give it to something like [Apache] Tomcat or JBoss to run it," Carr said. "Instead of giving it a JAR, you give it an LLM."

The acquisition means Red Hat will bring in engineers with expertise in LLM training, serving and inferencing as enterprises grapple with return on investment and data privacy concerns around generative AI. These issues have some companies exploring the idea of hosting generative AI workloads themselves rather than paying a public cloud provider to take in sensitive model training data, according to industry analysts.

"Having smaller models closer to the user and being able to manage model sprawl are tough challenges that this acquisition has helped [for] Red Hat," said Rob Strechay, an analyst at TheCube Research. "OpenShift AI is doing extremely well in enterprise organizations … still trying to get to ROI. This addition will take models into the corners of an enterprise's deployments, such as on the manufacturing floor and telco [telecommunications] colocation [facilities]."

Michael Barrett, vice president and general manager of Red Hat hybrid platforms (left), and Derek Carr, senior distinguished engineer, discuss Kubernetes support for generative AI during a keynote at OpenShift Commons.

Developer platforms pivot into LLMOps

OpenShift AI users who presented at Commons expressed interest in vLLM and other LLMOps features. But it's still early even for companies as experienced in AI and machine learning as Mastercard.

On Tuesday, representatives of the payment card network talked about the recently launched version 2.0 of the AI workbench platform they maintain for machine learning operations services, which is now based on OpenShift AI.

Version 2.0 offers a self-service "playground" that automates deployments of Apache Spark behind the scenes. LLMOps is still on the roadmap, said Ravishankar Rao, principal software engineer at Mastercard, in an interview with TechTarget Editorial following the presentation.

"Soon we'll have LLMOps as a service based on Nvidia NIMs [inference microservices], and we want to bring in use cases to run against company-specific data," Rao said. "We're working with OpenShift AI to evaluate vLLM."

High-performance computing (HPC) engineers from New York University said their platform is still undergoing LLMOps "growing pains" partly because of overlap with internally developed Kubernetes and cloud platforms that must be migrated into OpenShift AI.

"We're still in an early pilot phase for a few isolated things with OpenShift AI," said Carl Evans, senior HPC specialist at NYU, during a Q&A session at Commons. "But there's stuff we want to bring in house [from public cloud] … to protect student data."

OpenShift's LLMOps roadmap must fend off rivals

OpenShift's LLMOps features are also still under development. For example, when users request an instance of an LLM and it spins up in vLLM, other open source utilities within OpenShift orchestrate how that model uses underlying CPU and GPU hardware resources in Kubernetes clusters. Among these utilities are Kueue, a job queuing controller, and dynamic resource allocation (DRA) for Kubernetes. DRA, introduced in 2022, was put in the spotlight at last year's KubeCon because of GPU supply and cost concerns in the community.

With DRA, OpenShift AI can define resource allocations for users with specific device descriptions -- "this specific Nvidia A100 GPU" -- rather than the previous approach of drawing from a general pool of CPUs or GPUs. Kueue offers fine-grained controls that handle contention and prioritize allocations among multiple workloads as they consume those resources.
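
Kueue itself is configured declaratively through Kubernetes custom resources. As a purely illustrative sketch -- not OpenShift AI's configuration, and assuming the upstream Kueue CRDs and a ResourceFlavor named "default-flavor" are already installed -- a cluster-level quota for GPU-backed workloads could be created with the Kubernetes Python client like this:

    # Illustrative Kueue quota sketch using the official Kubernetes Python client.
    # Assumes upstream Kueue is installed; all names here are hypothetical.
    from kubernetes import client, config

    config.load_kube_config()
    api = client.CustomObjectsApi()

    cluster_queue = {
        "apiVersion": "kueue.x-k8s.io/v1beta1",
        "kind": "ClusterQueue",
        "metadata": {"name": "llm-serving-queue"},
        "spec": {
            "namespaceSelector": {},  # admit workloads from any namespace
            "resourceGroups": [{
                "coveredResources": ["cpu", "memory", "nvidia.com/gpu"],
                "flavors": [{
                    "name": "default-flavor",
                    "resources": [
                        {"name": "cpu", "nominalQuota": "64"},
                        {"name": "memory", "nominalQuota": "256Gi"},
                        {"name": "nvidia.com/gpu", "nominalQuota": "8"},
                    ],
                }],
            }],
        },
    }

    # ClusterQueue is cluster-scoped, so it is created as a cluster-level
    # custom object rather than a namespaced one.
    api.create_cluster_custom_object(
        group="kueue.x-k8s.io",
        version="v1beta1",
        plural="clusterqueues",
        body=cluster_queue,
    )

Workloads then submit through namespaced LocalQueues that point at a quota like this one, and Kueue admits or holds them depending on what the shared GPU pool can accommodate.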

DRA and Kueue evolved separately upstream. Over the next year, Red Hat plans to improve their integration under OpenShift AI, according to Carr during a keynote presentation at the Commons event.

"DRA, right now, is still not yet generally available in Kubernetes, so you're seeing very hot-off-the-press stuff integrated in the product," Carr said. "But that's a major focus for the year ahead: to make sure that the two communities work well together."

Other new features shipped this week with OpenShift AI 2.15 complement vLLM, such as a model registry in technical preview that Red Hat donated to the Kubeflow community as a subproject. Version 2.15 also supports a vLLM runtime for KServe, whose package of Knative, Istio and Kubernetes underpins OpenShift AI's model servers.


The OpenShift AI Model Registry functions similarly to a container registry as a centralized place to store and manage various predictive and generative AI models, but it isn't yet integrated with Red Hat's Quay and other Open Container Initiative (OCI) registries. That support is slated for next year, according to company officials in a press pre-briefing last week. However, as of this week, OpenShift AI added support for KServe's Modelcars feature, which streamlines model fetching using OCI container images.

Still, the market for generative AI models continues to move at a dizzying pace.

"Models are not sexy anymore," Strechay said. "Agents and management of agents is where money will be."

Neural Magic lays the groundwork for agentic AI support at Red Hat. But competitors such as Nutanix are catching up in LLMOps, Strechay said.

"Nutanix also launched its equivalent to OpenShift AI today," he said. "It used to be 'GPT in a box,' now rebranded Nutanix Enterprise AI, which is deployed on Kubernetes, pieces of the D2iQ acquisition and in partnership with Nvidia."

Red Hat also doesn't build its own data lakehouse for AI data management -- an area Strechay said the company or its parent company, IBM, might look to shore up with further acquisitions.

"There is a lot of open source [Red Hat is] packing in, but it's not the same as using [Dell partner's] Starburst for governance on top of the open source Trino [query engine] project," Strechay said. "We know some interesting [data management] companies are running out of [funding]. … But most organizations are filling out their opinionated AI stack."

TechTarget news writer Esther Ajao contributed to this report.

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
