Snowflake takes aim at lowering GenAI development costs
By integrating its recently developed SwiftKV capabilities with LLMs, the vendor aims to make models more efficient so that customers pay less as they develop generative AI tools.
Snowflake on Thursday introduced Meta Llama large language models optimized with SwiftKV to improve model performance and lower the cost of developing generative AI tools.
Snowflake first unveiled SwiftKV in December and has made it open source. It's a tool that reduces the compute power required to run the large language models (LLMs) that fuel generative AI applications. Now, the vendor is unveiling Snowflake-Llama models, which are Llama models integrated with SwiftKV.
The result is improved LLM inference throughput -- the amount of text an LLM can produce in a given period of time -- by up to 50%, according to Snowflake. Higher inference throughput, in turn, leads to cost savings by reducing the compute power required to develop and maintain generative AI applications.
As a result -- if customers can achieve similar improved throughput -- the launch of the integrated LLMs is significant, according to Andy Thurai, an analyst at Constellation Research.
"The performance numbers look good," he said.
SwiftKV works specifically in the input stage of inference, where the key-value cache is generated, Thurai continued. It's a limited application, but also the most common one.
"Unless enterprises have figured out other ways to have efficient prompting, key-value cache compression or prompt caching, this will work very well," Thurai said.
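The article doesn't detail SwiftKV's internals, but for readers unfamiliar with the key-value cache it targets, here is a minimal, illustrative single-head attention sketch in Python (all names and numbers are hypothetical, chosen only for illustration): a prompt's keys and values are computed once during the input, or "prefill," stage, then reused at every subsequent generation step instead of being recomputed.

```python
import math

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query over cached keys/values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]      # softmax, numerically stable
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of cached value vectors
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

class KVCache:
    """Holds key/value vectors computed during prefill so decode steps can reuse them."""
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

# Prefill: compute and cache K/V for each prompt token once.
cache = KVCache()
prompt_kv = [([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.2, 0.8])]
for k, v in prompt_kv:
    cache.append(k, v)

# Decode: each newly generated token attends over the cached prompt K/V
# rather than reprocessing the prompt -- the stage SwiftKV aims to make cheaper.
query = [1.0, 1.0]
out = attend(query, cache.keys, cache.values)
```

Because long prompts dominate many enterprise workloads, trimming work in this prefill stage is where most of the claimed throughput gain would come from.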
Llama 3.3 70B and Llama 3.1 405B models integrated with SwiftKV -- named Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B -- are now available for use in Cortex AI, Snowflake's AI development environment.
Based in Bozeman, Mont., but with no central headquarters, Snowflake is a data platform vendor whose primary competitors include Databricks and tech giants such as AWS, Google Cloud and Microsoft. Like its peers, Snowflake has expanded beyond data management over the past two years by adding features that enable customers to develop AI tools, including generative AI.
New capabilities
Generative AI has the potential to be transformative for businesses, making workers both smarter and more efficient. Generative AI development, however, is expensive.
While some organizations have the finances to invest in AI development, others are held back by the financial commitment. In particular, the compute power required to run LLMs generates huge expenses.
"Cost is indeed a concern in generative AI development," said Donald Farmer, founder and principal of TreeHive Strategy. "For some companies, it's a barrier."
Because SwiftKV improves LLM inference throughput by up to 50%, inference costs for the Snowflake-Llama-3.3-70B and Snowflake-Llama-3.1-405B models can be up to 75% lower than for Llama models without SwiftKV, according to Snowflake.
However, customers often struggle to match vendor-reported benchmark numbers, Farmer noted.
In addition, cost control was a problem for many Snowflake customers even before the recent surge in AI development, he continued. The cost of using the vendor's platform has been difficult for some customers to predict, leading to surprise expenses.
As a result, any steps Snowflake can take to reduce expenses, including developing SwiftKV, are significant, according to Farmer.
"[SwiftKV is] a big deal if the claimed cost efficiencies can be realized consistently and predictably," he said. "But if customers struggle to achieve similar savings, or have little insight into the cost controls, then there will be very little impact."
While potentially significant for Snowflake customers, tools such as SwiftKV that aim to reduce generative AI-related costs are not unique, according to Thurai.
"Most vendors have similar cost optimization and efficiency models," he said.
Some similarly address key-value cache generation to improve throughput efficiency, while others use low-cost hosting, efficient prompting, cache compression, prompt caching or model distillation techniques.
"[Other vendors] may not have the same option as Snowflake, but there are other ways to reduce the inferencing costs," Thurai said.
Farmer likewise noted that other vendors also provide tools aimed at lowering the cost of AI development. For example, OpenAI provides model compression capabilities, QServe and AdpQ use model quantization, and LLM operations specialists such as Neptune.ai and WandB provide other techniques.
However, SwiftKV may be more appropriate for enterprise use than other cost-control measures, according to Farmer.
"SwiftKV’s emphasis on enterprise workloads is pragmatic and may be more commercially focused," he said.
Meanwhile, it remains to be seen whether SwiftKV will help Snowflake better compete in the AI development market, given that the vendor was slower than Databricks and the tech giants to create an environment in which customers can build AI applications, Farmer continued.
"The competition is hot, and they are still somewhat behind Databricks, Google and Microsoft," he said. "I am still seeing Snowflake customers adopting Databricks for AI. I am not sure if SwiftKV will change that unless it proves to be highly effective."
Given widespread concern over the cost of generative AI development, the impetus for SwiftKV came from Snowflake's ongoing attempts to find ways to reduce AI adoption costs, according to Samyam Rajbhandari, principal architect at Snowflake and head of its AI research team.
Last year, many enterprises tested and deployed AI applications, he noted. This year, Snowflake expects more of those applications to move into production, with executives under pressure to show returns on investment.
"Snowflake is consistently exploring and evaluating new methods to improve throughput, latency and cost efficiency for LLM workloads," Rajbhandari said. "Developing SwiftKV and applying it to … Meta's Llama was a natural evolution of this work."
Plans
Looking ahead, Rajbhandari said that Snowflake is considering integrating SwiftKV with other LLMs.
In addition, Snowflake's roadmap includes adding capabilities that make it easier to develop AI agents and applications, use both structured and unstructured data to inform AI tools and fine-tune models to meet specific business needs, he added.
Thurai, meanwhile, suggested that Snowflake develop easy-to-use low-code/no-code toolkits for developing generative AI applications, add more integrations with LLMs, improve support for unstructured data and improve connectivity with all major cloud platforms so that the vendor can better appeal to customers aiming to develop AI applications.
"Since [Sridhar Ramaswamy] took over as CEO [in February 2024], Snowflake has focused more on generative AI applications and has come a long way. But they are still far behind their competitors, notably Databricks," Thurai said.
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.