
A look at open source AI models

Open source AI models have advantages over generative AI services offered by major cloud providers. But enterprises have to weigh the benefits against the costs.

An alternative to the massive generative AI models developed by major cloud providers AWS, Google and Microsoft is the do-it-yourself approach: open source models that enterprises can tailor to their needs.

Startups such as Cerebras Systems, Databricks and MosaicML hope to convince enterprises that they can get more bang for their buck with open source AI models that enterprises control and train with their own data to provide information to customers or help employees with specific tasks.

"If you believe you can provide an advantage with your data, you want to use an open source model and train it with your specific data," said Andrew Feldman, CEO of Cerebras, which builds computer systems and models for AI applications.

Drugmakers, financial services firms, academic researchers and government agencies have used AI for years. Generative AI changed the game by enabling text-based, humanlike responses to natural language queries.

The advancement promises to make AI accessible to everyone. Software developers can use it to generate code, salespeople can fine-tune email pitches, and marketing teams can craft better product descriptions. Also, employees can get answers to questions and summaries of documents and meetings.

Generative AI's potential for driving efficiency in business processes is why enterprises are intensely interested in the early-stage technology.

Cost of generative AI models

The cost of an in-house model will depend on the number of parameters used to train it and on how quickly the system must respond to users submitting queries simultaneously.

On-premises models at the low end will cost a few hundred thousand dollars, so it's often cheaper for organizations that use them only occasionally to run them in the cloud, said Dylan Patel, an analyst at SemiAnalysis. However, companies that use models regularly could cut costs with on-premises systems while gaining more flexibility in deployment and customization.

IT services provider World Wide Technology (WWT) tested the performance of six large language models, including OpenAI's GPT-2 and BigScience Workshop's Bloom.

WWT trained the generative AI models on 100 million to 3 billion parameters, which the company found could fit into the memory of typical on-premises hardware, excluding the RAM used by the GPU. By comparison, OpenAI used 175 billion parameters to train GPT-3.
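A rough rule of thumb shows why models in that range fit on typical hardware while GPT-3-class models do not: weight memory scales linearly with parameter count. The sketch below is a back-of-the-envelope estimate, not WWT's methodology, and it counts model weights only; training also requires memory for gradients and optimizer state, typically several times more.

```python
def model_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold model weights.

    Assumes 16-bit (fp16/bf16) weights, i.e. 2 bytes per parameter.
    """
    return num_params * bytes_per_param / 1e9

# Weights-only footprint for the sizes WWT tested, versus GPT-3's
# 175 billion parameters.
for name, n in [("100M", 100_000_000),
                ("3B", 3_000_000_000),
                ("GPT-3 (175B)", 175_000_000_000)]:
    print(f"{name}: ~{model_memory_gb(n):.1f} GB")
```

At 16-bit precision, a 3-billion-parameter model needs roughly 6 GB for its weights, while GPT-3's 175 billion parameters would need about 350 GB, far beyond any single commodity accelerator.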

WWT concluded that enterprises training models with 100 million to 6 billion parameters would also have to fine-tune them for specific tasks, such as text summarization, question answering and text classification, to get accurate responses and meet user expectations.

"The experiments and the results provide a rough guide of the kind of capabilities one can expect from open source models of small sizes," said Aditya Prabhakaron, WWT's data science lead.

MosaicML's open source MPT-30B LLM, which is available for commercial use, would take 11 days to train for about $700,000, said Joel Minnick, marketing vice president at cloud data platform provider Databricks, which plans to acquire MosaicML for $1.3 billion this month.

"That price point and time put the technology in the hands of any organization," Minnick said.

A recent breakthrough in open source tools for LLMs could reduce the cost of model training. Low-rank adaptation (LoRA) reduces the computer power and memory needed to train LLMs. Developed by Microsoft researchers, LoRA runs on hardware with as little as 11 GB of GPU RAM.
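The core idea behind LoRA's savings is that instead of updating a full weight matrix during fine-tuning, it trains two much smaller low-rank factors. The sketch below illustrates the arithmetic with an illustrative 4,096 x 4,096 projection matrix and rank 8; these numbers are examples, not figures from the article or from Microsoft's paper.

```python
def trainable_params(d: int, k: int, r: int) -> tuple[int, int]:
    """Compare trainable parameter counts for one weight matrix.

    Full fine-tuning updates the entire d x k matrix (d*k parameters).
    LoRA freezes that matrix and trains two low-rank factors instead:
    A (d x r) and B (r x k), for d*r + r*k parameters.
    """
    full = d * k
    lora = d * r + r * k
    return full, lora

# Illustrative example: one 4096 x 4096 projection, LoRA rank r = 8.
full, lora = trainable_params(4096, 4096, 8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
```

In this example, LoRA trains about 65,000 parameters instead of roughly 16.8 million, under 0.4% of the original, which is why fine-tuning can fit on a GPU with only 11 GB of RAM.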

LoRA could also reduce the cost of creating a specialized model, Patel said. A mainstream enterprise licensing a model could use LoRA to tailor it to its own data set or pay a professional services company to do the work.

Startup Personal.ai runs on MosaicML's cloud

Startups are major users of specialized models. Personal.ai, for example, builds tiny generative AI models for each subscriber to its service. The company trains each model on an individual's data drawn from messaging apps, Google Drive documents and web applications.

"We specialize in building on top of the domain of one person to answer very specific, hyperpersonal requests," Personal.ai CTO Sharon Zhang said. "Because we're very tiny, we can train [models] all the time."

Multiple models share an Nvidia A100 GPU running on MosaicML's cloud. The MosaicML platform is on Oracle Cloud.

Personal.ai started on AWS but found that the time from training to a correct response, 10 to 15 hours, was too long to meet user demand for near-immediate data updates. MosaicML's hardware accelerator cut that time to five to 10 minutes, Zhang said.

Also, AWS charged $5 per hour for training versus MosaicML's $2.50 per hour, Zhang said. The lower price and the shorter training cycle meant a 95% cost reduction per model.

Personal.ai is testing MosaicML's foundation models for a possible transition in the future. However, the switch will depend on the accuracy of the model's responses, Zhang said.

The future of custom models

Alison Smith, who leads Booz Allen Hamilton's generative AI team, said she expects enterprises to use small, customized models eventually. The firm's clients have expressed interest in using open source models of 6 billion to 10 billion parameters, which a defense contractor, for example, could use to summarize government documents.

"We're seeing clients that want to explore and evaluate that end of the spectrum," Smith said.

Meanwhile, many enterprises will likely use less sensitive data to experiment with generative AI services through a SaaS vendor first.

"The amount of investments that you have to do to connect to a [SaaS] API endpoint is much lower than standing up a whole team and infrastructure," Smith said.

Databricks' Minnick agreed, while arguing that "the real value creation, and the differentiation that customers are going to drive in this space, is going to be from those models that they tune and host themselves."

Nevertheless, he acknowledged the difficulty of predicting the market's direction today.

"It is without a doubt the early days in figuring out where this market is going to go," Minnick said.

Antone Gonsalves is networking news director for TechTarget Editorial. He has deep and wide experience in tech journalism. Since the mid-1990s, he has worked for UBM's InformationWeek, TechWeb and Computer Reseller News. He has also written for Ziff Davis' PC Week, IDG's CSOonline and IBTMedia's CruxialCIO, and rounded all of that out by covering startups for Bloomberg News. He started his journalism career at United Press International, working as a reporter and editor in California, Texas, Kansas and Florida. Have a news tip? Please drop him an email.
