Small language models an emerging GenAI force
Enterprises are unwilling to pay for large language models to accomplish simple business tasks with generative AI. They're looking at cheaper small language models.
The expense of using large language models on cloud providers is driving enterprises interested in generative AI toward models a fraction of the size.
The LLMs powering GenAI services on AWS, Google Cloud and Microsoft Azure are capable of many tasks, ranging from writing programming code and predicting the 3D structure of proteins to answering questions on nearly every imaginable topic.
The breadth of the capabilities is awe-inspiring, but taming such massive AI models with hundreds of billions of parameters is expensive. Enterprises are asking whether training a small language model (SLM) to power, for example, a customer service chatbot is more cost-effective.
"Our favorite customer quote is that generalized intelligence might be great, but I don't need my point-of-sale system to recite French poetry," said Devvret Rishi, CEO of startup Predibase, during a presentation this week at The Linux Foundation's AI.dev Summit in San Jose, Calif. Predibase provides software tools for training SLMs.
Over the last several months, Gartner has noticed an increase in the number of enterprise clients evaluating SLMs to reduce the expense of inference -- the process of running a trained GenAI model to produce useful responses to natural language questions.
"We have started to see customers come to us and tell us that they are running these enormously powerful, large models, and the inferencing cost is just too high for trying to do something very simple," Gartner analyst Arun Chandrasekaran said.
As an alternative, enterprises are exploring models with 500 million to 20 billion parameters, Chandrasekaran said.
"That's kind of the sweet spot," he said. "Those models are starting to gain traction, primarily on the back of their price performance."
SLMs for small jobs
SLMs can't match the breadth of tasks performed by models from Cohere, Anthropic and OpenAI -- such as Claude and GPT-4 -- on AWS, Google Cloud and Azure. However, SLMs trained on data for specific tasks, such as content generation from a specified knowledge base, show potential as a significantly less expensive alternative.
"Small models have limited model capacity. But if we concentrate their capacity on a specific target task, the model can achieve a decent improved performance," according to a paper from researchers at the University of Edinburgh in the United Kingdom and the Allen Institute for AI in Seattle.
In January, the consultancy Sourced Group, an Amdocs company, will help a few telecoms and financial services firms take advantage of GenAI using an open source SLM, lead AI consultant Farshad Ghodsian said. Initial projects include leveraging natural language to retrieve information from private internal documents.
Ghodsian experimented with FLAN-T5, an open source natural language model developed by Google and available on Hugging Face, to learn about SLMs. Ghodsian tested FLAN-T5's 248 million-parameter version.
"When you add resource document generation, it gives you way better results than using [LLMs], and it's a lot easier to run," he said. "You can even run it on a CPU. That's a big benefit."
Ghodsian used fine-tuning with retrieval-augmented generation (RAG) to attain quality responses. RAG is a technique for retrieving information from a knowledge source and incorporating it into the model's generated text.
"You get a really good answer from [FLAN-T5]," Ghodsian said. "Really good."
The potential of SLMs has attracted mainstream enterprise vendors like Microsoft. Last month, the company's researchers introduced Phi-2, a 2.7-billion-parameter SLM that outperformed the 13-billion-parameter version of Meta's Llama 2, according to Microsoft. The company has released Phi-2 for research use only.
SLM strengths, weaknesses
Providers of open source SLMs tout access to the models' inner workings as a crucial enterprise feature.
For example, users can inspect the parameters, or weights, that determine how the models forge their responses. Proprietary models keep their weights inaccessible, which concerns enterprises fearful of discriminatory biases.
Another critical concern is data governance. Many organizations are worried about data leaks when fine-tuning a cloud-based LLM with sensitive information.
Open source technology also has its critics. In June, supply chain security company Rezilion reported that 50 of the most popular open source GenAI projects on GitHub had an average security score of 4.6 out of 10. Weaknesses found in the technology could lead to attackers bypassing access controls and compromising sensitive information or intellectual property, Rezilion wrote in a blog post.
Promising SLMs named by Chandrasekaran included Meta's Llama 2, the Technology Innovation Institute's Falcon, and Mistral AI's Mistral 7B and Mixtral 8x7B.
Mixtral 8x7B, which is in beta, has nearly 47 billion parameters but processes input and generates output at the speed and cost of a 13-billion-parameter model, according to Mistral. The French startup raised $415 million in funding this month, valuing the company at $2 billion.
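That price performance comes from Mixtral's sparse mixture-of-experts design: a router activates only a couple of its eight expert networks per token, so most of the roughly 47 billion parameters sit idle on any given token. The toy sketch below illustrates the routing idea; the shapes, weights and expert functions are made-up stand-ins, not Mistral's architecture.

# Simplified sketch of sparse mixture-of-experts routing: a router picks
# the top 2 of 8 expert networks per token, so only a fraction of the
# total parameters do work on any given token.
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K, DIM = 8, 2, 16

router_weights = rng.standard_normal((DIM, NUM_EXPERTS))
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]

def moe_layer(token: np.ndarray) -> np.ndarray:
    logits = token @ router_weights        # router score for each expert
    top = np.argsort(logits)[-TOP_K:]      # keep only the top-2 experts
    scores = np.exp(logits[top])
    gates = scores / scores.sum()          # renormalized softmax over the top-2
    # Only the selected experts run; the other six are skipped entirely.
    return sum(g * (token @ experts[i]) for g, i in zip(gates, top))

out = moe_layer(rng.standard_normal(DIM))
print(out.shape)  # (16,) -- same output size, ~2/8 of the expert compute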
Mistral's models and Falcon are commercially available under the Apache 2.0 license. Licensing that clears a model for business use is critical, Chandrasekaran said.
"We're starting to see more and more of these open source models being certified for commercial use, which is a pretty big deal for a lot of enterprises," he said.
Open source model providers have an opportunity next year as enterprises move from the learning stage to the actual deployment of GenAI.
"They're still deciding, but they're ready to jump as soon as January hits," Ghodsian said. "They've got new budgets and want to start implementing or at least do some [proofs of concept]."
Antone Gonsalves is an editor at large for TechTarget Editorial, reporting on industry trends critical to enterprise tech buyers. He has worked in tech journalism for 25 years and is based in San Francisco.