AWS customers grapple with generative AI shortfalls
Enterprises fear failing to accommodate technical immaturity, ensure data security and governance, prevent biases built into models, and correct the erroneous responses of LLMs.
LAS VEGAS -- AWS' end-to-end approach to helping enterprises and SaaS providers harness generative AI lacks the one thing customers need most: confidence in the core technology.
In interviews and panel discussions, customers at the AWS re:Invent conference last week discussed the difficulties of incorporating GenAI in applications and workflows. Adding to the hardship were concerns over the safety of the large language models underpinning the AI capable of mimicking human-generated content.
None of the AWS subscribers or partners expressed discontent with the cloud provider's three-layer AI strategy encompassing infrastructure, platform and tools, and applications. Many found its Bedrock platform a significant advancement for choosing, training and fine-tuning an LLM.
Nevertheless, enterprises fear failing to accommodate technical immaturity, ensure data security and governance, prevent biases built into models, and correct the erroneous responses of LLMs.
Erroneous responses, which technologists call hallucinations, are a crucial roadblock to using GenAI, and reducing them is a top priority.
"Hallucination is a really big deal right now in the field," said Nhung Ho, vice president of AI at financial software maker Intuit, in an interview. "We're all trying to make sure that we reduce it."
Battling model misinformation, speed
Since 2013, Intuit has relied on AWS to run its software for consumers, accountants, bookkeepers and tax advisers. In June, the company expanded its financial technology platform to include generative AI. The proprietary system lets Intuit's technologists custom-train LLMs, including models from OpenAI and Meta's Llama 2, to provide services across product lines.
Intuit uses multiple LLMs because they differ in response speed, cost and accuracy, Ho said. But there are no models that Intuit can trust for calculating taxes -- a process it can't get wrong. For that, it uses a deterministic AI system that always provides the same response to a specific question.
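The distinction Ho draws is between a probabilistic model, which can return different answers to the same question, and a deterministic system, which always returns the same one. A minimal sketch of the deterministic side, using hypothetical, simplified bracket values that are not Intuit's actual tax logic:

```python
# Hypothetical, simplified marginal tax brackets -- illustrative only,
# not Intuit's actual rates or logic.
BRACKETS = [(0, 0.10), (11_000, 0.12), (44_725, 0.22)]

def tax_owed(income: float) -> float:
    """Deterministic tax calculation: the same income always
    yields exactly the same tax, with no sampling involved."""
    owed = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        # The bracket's upper bound is the next bracket's lower bound.
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        taxable = max(0.0, min(income, upper) - lower)
        owed += taxable * rate
    return round(owed, 2)
```

Because the function is a pure calculation, repeated calls with the same income can never disagree, which is the property an LLM cannot currently guarantee.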
Besides inaccuracies, LLMs are not fast enough for the 99% of transactions that Intuit's platform has to complete in less than two seconds. "We're nowhere near that with large language models," Ho said.
Speeding up LLM response time is critical for Intuit to meet the goals for Intuit Assist, the GenAI-powered financial assistant introduced in September for small businesses and consumers.
"We want Assist to grow to encompass and change the way you work with our products," Ho said. "To do that, it means we need to compress the latency that we're seeing from these models."
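The two-second requirement Ho describes is effectively a 99th-percentile latency target. A minimal sketch of how such a service-level check can be expressed (function and parameter names are hypothetical):

```python
def meets_slo(latencies_ms: list[float],
              budget_ms: float = 2000.0,
              pct: float = 0.99) -> bool:
    """Return True if at least `pct` of transactions finished
    within the latency budget (e.g. 99% under two seconds)."""
    within = sum(1 for t in latencies_ms if t <= budget_ms)
    return within / len(latencies_ms) >= pct
```

A batch where 99 of 100 transactions finish in 100 ms passes; one more slow transaction and it fails, which is why multi-second LLM responses are, in Ho's words, "nowhere near" the target.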
Airtable offers a cloud-based platform for creating and sharing relational databases across an organization. Its software, hosted on AWS since 2012, is beneficial for product development and tracking large-scale marketing campaigns.
Airtable is at the testing stage with generative AI. Its private beta tools add GenAI services to the small data sets customers create in their databases, said Andrew Ofstad, co-founder and chief product officer at Airtable. Services could include, for example, requesting keywords for a marketing brief or a list of product requirements.
Currently, Airtable primarily uses Anthropic's Claude. Ofstad acknowledged that the immaturity of the model requires workarounds for technical issues related to integrating it into a SaaS platform.
"We're currently building a lot of workarounds," Ofstad said in an interview. "Hopefully, the models get better and better, so you don't have to do as much of that."
The BMW Group has customized the speech-recognition technology Amazon uses for Alexa for the automaker's Mini brand. BMW developed a tiny language model that runs on the car's CPU and memory to ensure its voice system works for critical features such as giving directions without internet access.
Running generative AI on that hybrid system isn't possible because it requires more processing power than what's available in the Mini, said Patrick Prang, a voice assistant expert at BMW.
"For the automotive industry, [GenAI providers] need to overcome some challenges -- hallucinations, for example," Prang said in an interview. "We are focusing on the hybrid approach -- very important. When you start having a conversation in the cloud, and you suddenly have a lack of connectivity, the onboard model needs to take over."
Television provider Dish Network is working on proofs of concept with open source LLMs such as the Technology Innovation Institute's Falcon and Meta's Llama to see what information the company can extract from purchase orders, Dish CIO Atilla Tinic said. The company is eager to establish GenAI's value and move beyond testing mode.
"We don't want lots of science projects," he said during a forum at re:Invent.
A critical concern for Dish and many other companies is the risk of inadvertently publicizing intellectual property or sensitive data. Northrop Grumman, Samsung and Verizon have banned employees from using GenAI tools out of concerns over data leaks. Samsung imposed its ban this year after employees inadvertently exposed sensitive internal information.
"You don't want to re-create that," Tinic said. "You want to learn what others have done on that front."
Along with data security, Dish wants transparency from LLM developers to avoid biases that negatively influence hiring. Studies have shown that LLMs trained on an enormous amount of uncurated internet data can inherit stereotypes, exclusionary language and other denigrating behaviors that disproportionately affect minority groups.
"We don't want to introduce unintended bias, so it's going to be critical for us to understand the data sets that we are training these models with, as well as testing and inspecting the outcomes and the outputs coming from generative AI," Tinic said.
Healthcare executives' most critical concern is misinformation from LLMs that could harm patients.
Michael Rivers, vice president of digital pathology in the diagnostics division of Roche, a multinational healthcare company, acknowledged that the medical industry is just beginning to explore GenAI and there need to be guardrails to prevent errors with tragic consequences.
"Right now, all of the tools that we're developing and that we're working with are supports for the pathologist," Rivers said in a roundtable discussion. "The pathologist still controls the final answer, and that's critically important in the diagnostic process."
Generative AI optimism
Despite the many technical challenges, organizations are optimistic that GenAI developers will work out the kinks.
Woodside Energy, a global oil and gas company based in Australia, collaborates with specialized AWS teams on improving supply chain operations, Woodside Vice President Tracey Simpson said. She's hopeful that GenAI will eventually provide value from data across the company's supply chain.
"What I'm excited about with generative AI is the ability for us to start to synthesize and harmonize that data in a way that we can start to generate insights and actions," Simpson said.
For Dish, the most immediate potential benefit is in customer service. Today, chatbots or interactive voice response systems handle roughly 70% of its customer calls, with 30% handed off to service reps. Tinic estimates that generative AI could cut the number of handoffs to as little as 10% by providing a more humanlike interaction with callers and delivering information based on their records.
Tinic also wants to customize GenAI for core business and IT operations. An example of the latter is gleaning more information from trouble tickets that Dish manages through ServiceNow.
"It would be fantastic if we could also use that to isolate the issue, resolve the issue and draft a fantastic root cause analysis out of it," Tinic said.
Using generative AI today
During a panel discussion, medical industry executives gave examples of using GenAI today.
Genomics England, a company owned by the U.K. Department of Health and Social Care, uses AWS-hosted LLMs to extract critical information from pathology reports and standardize the data to link it to imaging.
"The LLM we've set and trained is providing insightful insights that I don't think we would have seen before any other way," said Prabhu Arumugam, director of clinical data and imaging at Genomics England.
AlayaCare, a Canadian home care technology provider, accesses LLMs through Amazon Bedrock to summarize weeks' worth of caregivers' notes on patients to extract critical information for others visiting a person's home. The visitors can receive bullet points on their mobile phones 15 minutes before entering the house, said Jean-Francois Gailleur, senior vice president of engineering at AlayaCare.
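A summarization call like the one Gailleur describes boils down to assembling a prompt over the accumulated notes and sending it to a Bedrock-hosted model. A minimal sketch, assuming a Claude model on Bedrock; the prompt wording, parameters and model choice here are illustrative, not AlayaCare's:

```python
import json

def build_summary_request(notes: list[str], max_points: int = 5) -> dict:
    """Assemble a request body asking a Claude model on Bedrock
    for a short bullet-point summary of caregiver notes.
    Illustrative only -- prompt and parameters are assumptions."""
    prompt = (
        f"Summarize the following caregiver notes into at most {max_points} "
        "bullet points a visiting caregiver can scan on a phone:\n\n"
        + "\n".join(notes)
    )
    return {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 512,
        "messages": [{"role": "user", "content": prompt}],
    }

# The body would then be sent with boto3's bedrock-runtime client, e.g.:
#   client.invoke_model(modelId="anthropic.claude-3-haiku-20240307-v1:0",
#                       body=json.dumps(build_summary_request(notes)))
```

Keeping the request builder separate from the network call makes the prompt easy to test and version without touching AWS credentials.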
Hurone AI is a provider of GenAI applications that help oncologists care for patients and help pharmaceutical companies run clinical trials in Africa and Latin America. The company analyzes demographic data, clinical data and patient records to generate treatment plans that doctors then review, saving them time, said Kingsley Ndoh, founder and CEO of Hurone AI.
"It's going to be a game changer as we skill [up]," he said.
Despite the optimism and successes, challenges remain in making GenAI a critical component within organizations. And while enterprise users believe it'll get there, they also agree it won't be easy.
Antone Gonsalves is an editor at large for TechTarget Editorial, reporting on industry trends critical to enterprise tech buyers. He has worked in tech journalism for 25 years and is based in San Francisco. Have a news tip? Please drop him an email.