Amazon Q, Bedrock updates make case for cloud in agentic AI
Amazon and its partners rev their engines in anticipation of agentic AI with updates that challenge the cost and quality claims of self-hosted infrastructure competitors.
As enterprises prepare for the next phase of generative AI, AWS issued updates to its cloud services this week, countering claims from some corners of the market that self-hosted models are the way of the future.
New features for the Amazon Bedrock model hosting service and the Amazon Q AI assistant addressed some concerns cited by enterprises looking to move GenAI workloads such as model inferencing back in-house -- namely infrastructure costs and data quality issues. Vendors such as Red Hat, VMware and others within the Cloud Native Computing Foundation have been banking on this trend to favor private and hybrid cloud products, but this week, AWS struck back at that notion.
Among the new features rolled out during the cloud provider's annual re:Invent conference was Amazon Bedrock Model Distillation, which uses a large foundation model (LLM) to train a smaller, faster and more cost-efficient model.
AWS CEO Matt Garman emphasized the cost-efficiency aspect of the update during a keynote presentation about the new feature. Distilled models can run up to 500% faster and cost up to 75% less than the full-size LLMs they are derived from, he said.
"This difference in cost actually has the potential to completely turn around the ROI, as you're thinking about if a generative AI application works for you or not," Garman said. "It changes it from being too expensive to ... roll it out in production to actually flipping to be really valuable for you."
As a managed service, Amazon Bedrock eliminates the toil and data science expertise required for users to perform model distillation themselves on internal platforms, Garman said.
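The technique Bedrock packages here, training a small "student" model to mimic a larger "teacher," typically centers on a loss that compares the two models' softened output distributions. A minimal sketch of that idea in plain Python (the function names, logits and temperature value are illustrative assumptions, not AWS code):

```python
import math

def softmax(logits, temperature=1.0):
    # Soften logits with a temperature; higher T spreads probability mass
    # across more tokens, exposing the teacher's "dark knowledge".
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the teacher's softened distribution and the
    # student's: the core training signal in logit-based distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [2.0, 1.0, 0.1]
# A student that matches the teacher exactly incurs zero loss.
print(distillation_loss(teacher, teacher))              # 0.0
# A mismatched student is penalized.
print(distillation_loss(teacher, [0.1, 1.0, 2.0]) > 0)  # True
```

In practice this loss is minimized over a large corpus of teacher outputs, which is the data science toil Garman says the managed service removes.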
Consumption-based cloud pricing might still be a more expensive option for some enterprises, but model distillation at least effectively parries competitors' claims that self-hosted AI is the only cost-effective approach, said Torsten Volk, an analyst at Informa TechTarget's Enterprise Strategy Group.
"The cost argument of saving up to 75% is a strong one, as the proponents of self-hosted infrastructure like to compare the cost of Bedrock with that of more flexibly allocated GPUs on customer-owned systems," Volk said. "Model Distillation takes away part of this argument."
Red Hat has made similar cost-efficiency claims about its quantized models used in RHEL AI and InstructLab, but model distillation tackles costs at the training rather than fine-tuning level, said Andy Thurai, an analyst at Constellation Research.
"InstructLab is better for fine-tuning," he said. "Amazon's Distillation is better for training a student model with enterprise data."
Guardrails and safeguards
Other concerns commonly cited by enterprises adopting GenAI involve the quality and safety of results, as well as control and governance over data sources. Updated Amazon Bedrock features previewed this week also aimed to address these issues.
For example, Automated Reasoning checks, previewed for the Amazon Bedrock Guardrails policy service, mathematically evaluate the factual accuracy of LLM responses. The Amazon Bedrock Knowledge Bases retrieval-augmented generation service added a feature in preview that uses LLMs to evaluate the results of RAG applications. Amazon Bedrock Model Evaluation added a preview feature called LLM-as-a-judge that uses one model to automatically assess the responses of another according to criteria such as helpfulness, harmfulness and correctness.
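Stripped to its essentials, the LLM-as-a-judge pattern prompts one model to score another model's answer against a rubric, then parses the scores. A minimal harness along those lines (the prompt wording, criteria and stubbed judge are illustrative assumptions, not the Bedrock API):

```python
def build_judge_prompt(question, answer,
                       criteria=("helpfulness", "harmfulness", "correctness")):
    # Assemble the evaluation prompt sent to the judge model.
    rubric = "\n".join(f"- {c}: score 1-5" for c in criteria)
    return (
        "You are an impartial evaluator. Rate the answer below.\n"
        f"Criteria:\n{rubric}\n\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Reply with one line per criterion, e.g. 'helpfulness: 4'."
    )

def parse_scores(judge_reply):
    # Parse 'criterion: score' lines from the judge model's reply.
    scores = {}
    for line in judge_reply.strip().splitlines():
        name, _, value = line.partition(":")
        scores[name.strip().lower()] = int(value)
    return scores

def stub_judge(prompt):
    # Stand-in for a real judge-model call, returning a fixed reply.
    return "helpfulness: 4\nharmfulness: 1\ncorrectness: 5"

reply = stub_judge(build_judge_prompt("What is 2+2?", "4"))
print(parse_scores(reply))  # {'helpfulness': 4, 'harmfulness': 1, 'correctness': 5}
```

The sketch also makes Botev's objection below concrete: the numbers are only as trustworthy as the judge model producing them, and criteria such as helpfulness have no objective ground truth.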
Some industry analysts have previously urged caution in using LLMs to judge the output of LLMs, and specialist competitors questioned AWS' methods with LLM-as-a-judge.
"[It's] definitely a step in the right direction, but the question still remains: How do we know the LLM in question is right in the first place?" said Victor Botev, CTO and co-founder of Iris.ai, in a statement emailed to Informa TechTarget Editorial via a spokesperson this week. Iris.ai markets its own API-based AI services for developers, including RAG as a service.
"While useful for many, metrics such as 'professional style' and 'helpfulness' are still highly subjective areas that can be open to interpretation," wrote Botev, whose LinkedIn profile lists previous experience as a university research engineer specializing in neural networks. "If we want to best understand the accuracy of the model, we need to incorporate more sophisticated accuracy metrics that take into account the model's contextual understanding of the concrete domain and use case."
Using models from within the same LLM family to evaluate outputs -- as the AWS blog did with an example that used Anthropic's Claude 3.5 Sonnet to evaluate outputs from Claude 3 Haiku -- can be risky, according to Thurai. But it's early yet for Bedrock Guardrails and services like it, he added, and he expects them to become more effective with time.
"Most production systems use LLM answers directly without doing these checks, or doing very limited manual review of the answers," Thurai said. "Stopping AI hallucinations is a problem that almost every AI provider is trying to solve using various methods. Every one of these methods is [making] that course of action slightly better."
These updates at least warrant further exploration by enterprise organizations, Volk said, including comparisons with self-hosted approaches.
"Adding explainability and response validation was another critical gap for AWS to fill in when positioning Bedrock against self-hosted AI," he said. "It would take a careful analysis to compare the new AWS model explainability, auditability and reasonability checks to the other guys."
Gearing up for agentic AI
Competition among vendors to capture AI workloads has intensified as single-stream GenAI workloads evolve into agentic AI, in which groups of software entities automatically coordinate to take action on a multistep workflow. Features that improve the accuracy of GenAI apps will be crucial to this more complex orchestration; other IT vendors have also begun to support agentic AI, including Microsoft, Google and Atlassian.
Amazon Bedrock Agents, previewed this week, adds a supervisor agent that developers can use to coordinate multi-agent collaboration in agentic AI workflows. Simpler agentic workflows were possible in Amazon Bedrock apps before, but Bedrock Agents will support larger-scale, more complex workflows involving hundreds of agents, according to Garman's keynote presentation.
"[The] supervisor agent ... acts as the brain for your complex work," he said. "It configures which agents have access to confidential information. It can determine if tasks need to be fired off sequentially or if they can be done in parallel. If multiple agents come back with information, it can actually break ties between [them]."
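In the abstract, a supervisor of the kind Garman describes routes a task to eligible worker agents, chooses sequential or parallel execution, and breaks ties among their answers. A toy sketch of that control loop (the class, agents and tie-break rule here are hypothetical, not Bedrock's implementation):

```python
from concurrent.futures import ThreadPoolExecutor

class Supervisor:
    # Toy supervisor: dispatches a task to worker agents and breaks ties.
    def __init__(self, agents, confidential_agents=()):
        self.agents = agents
        self.confidential_agents = set(confidential_agents)

    def eligible(self, task):
        # Only designated agents may handle confidential tasks.
        if task.get("confidential"):
            return {n: a for n, a in self.agents.items()
                    if n in self.confidential_agents}
        return self.agents

    def run(self, task, parallel=True):
        agents = self.eligible(task)
        if parallel:
            with ThreadPoolExecutor() as pool:
                results = list(pool.map(lambda a: a(task), agents.values()))
        else:
            results = [a(task) for a in agents.values()]
        # Tie-break by simple majority vote over the agents' answers.
        return max(set(results), key=results.count)

# Three toy worker agents standing in for real LLM-backed agents.
agents = {
    "a": lambda t: "42",
    "b": lambda t: "42",
    "c": lambda t: "41",
}
sup = Supervisor(agents, confidential_agents={"a"})
print(sup.run({"query": "answer?"}))                       # '42' (majority wins)
print(sup.run({"query": "secret", "confidential": True}))  # '42' (only agent 'a' runs)
```

A production supervisor would itself be an LLM deciding routing and tie-breaks; the point of the sketch is only the shape of the orchestration.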
GenAI development is a rapidly evolving race, but one industry observer said that for now, the new Amazon Bedrock features differentiate AWS from both self-hosted and cloud competitors, especially since Bedrock Agents and multi-agent collaborations can be built using natural language.
"Bedrock pushes the ability to generate agents further up the stack to business users," said Keith Townsend, president at The CTO Advisor, a Futurum Group company. "The new built-in logic checking reduces the potential of hallucination and can potentially increase accuracy without engaging developers."
Amazon Q Developer partners tease agentic tie-ins
For coders, Amazon Q Developer, an AI assistant based on Bedrock, reached general availability in April. This week, the service added features for app developers such as unit test generation and enhanced codebase documentation, as well as features for DevOps pros such as issue investigation and remediation.
Amazon Q Developer transformation services targeted IT ops users with automated application transformation and modernization tools that use AI agents "to automate the heavy lifting involved in upgrading and modernizing, such as autonomously analyzing source code, generating new code, testing it, and executing the change once approved by the customer," according to a press release.
AWS partners also rolled out early examples of agentic AI workflows built on Amazon Q this week, such as GitLab Duo with Amazon Q, which uses AI agents to automate DevSecOps workflows. Users will choose from an initial set of four "quick actions" that include generating code from requirements, creating unit tests, conducting code reviews and upgrading Java applications.
"Containing agent scope within a platform like GitLab is actually a pretty good initial use case for experimenting with agentic AI, in lieu of an agent that works across the enterprise," said Katie Norton, an analyst at IDC. "A unified platform on a singular data model like GitLab can make it easier for an agent to determine the next best action because of the depth of context that it has."
Similarly, PagerDuty, an incident response vendor, demonstrated incident management integrations between its PagerDuty Advance product, Amazon Bedrock and Amazon Q from the keynote stage. Elsewhere in recent weeks, Salesforce launched Agentforce, and Microsoft replaced its Azure AI Studio with Azure AI Foundry, a platform for developers to work with AI apps and agents. Google got in on the agent trend early with its Vertex AI Agent Builder launch in April.
As agentic AI expands, AWS and its competitors must continue to provide customers with security assurances and configurable guardrails around the technology, Norton said. But, she added, the potential for this new wave of AI innovation is high.
"We've seen AI agents go from a promising concept to somewhat of a reality pretty quickly in late 2024," Norton said. "Even more so than generative AI, agents can actually deliver on 'eliminating toil' that we've been talking about for years."
Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.