Prompt engineering takes shape for devs as agentic AI dawns
Real-world best practices are emerging at companies such as LinkedIn, Oracle and AWS, and agentic AI is poised to fundamentally alter the day-to-day work of software engineers.
Prompt engineering has been a buzzword since generative AI first emerged, but it has taken on new significance as the next phase of AI raises the stakes for doing it well.
Prompt engineering refers to a set of techniques that optimize results generated by the large language models that underpin generative AI tools. LLMs such as ChatGPT were the first wave of generative AI (GenAI) tech to reach the enterprise IT market in late 2022. Over the last 18 months, agentic AI, in which autonomous software components backed by LLMs independently execute multistep workflows, has emerged as the next evolution of GenAI.
While prompt engineering can help make GenAI responses more usable and accurate, it becomes critical when LLMs begin to act -- including writing and deploying whole applications.
Agentic AI puts a fresh layer of abstraction between human developers and code. Although some coding might still be involved, the work blends programming with other disciplines, including logic, art and natural language. This opens app development to a broader pool of people, while also giving rise to a new AI engineering specialization to handle the most complex and nuanced development of AI-driven systems.
According to experienced practitioners, however, most mainstream software developers will need to balance coding skills and prompt engineering prowess to remain relevant in the world of AI agents.
"If you're talking simple greenfield apps, you can definitely have an agent build them out, but for large enterprise-level software systems, I still think you will need a human setting up the core aspects of it," said Krishnan Sridhar, vice president of engineering, developer experience and platforms at business social network LinkedIn. "What it means for a software developer in the future is that they must be able to seamlessly straddle these worlds -- you need to be good at leveraging AI, but at the same time, capable of going deeper."
Building an AI automation foundation
Agentic AI represents a step forward in IT automation, in which systems determine their own workflows, but an old computer science axiom still applies: Garbage in, garbage out.
"What we are finding with some of our early efforts here is automation works best when your system is more streamlined and predictable to work with," Sridhar said. "At the end of the day, all the LLMs are trained in natural language. They try to speak code like it's English. So the more you can be descriptive in your system architecture, in the labeling [of resources], it makes a difference in how much an agent is able to extrapolate and make informed decisions."
Once LinkedIn laid the groundwork, developers began to get a feel for prompt engineering by working with single LLMs, said Karthik Ramgopal, a distinguished engineer at LinkedIn. From these early experiences, best practices have begun to emerge.
"One of the things which is quickly emerging is [that] there is a lot of literature right now on the cognitive limits of models," Ramgopal said. "There is a lot of guidance on how you do task breakdowns in order to not give too much information to a model, break things down into smaller components and chain the prompts together to get better quality. This also improves the testability drastically, because you can then start testing each of these individual pieces, as opposed to testing a complex aggregate workflow."
Newer LLMs can generate structured outputs formatted as downstream applications expect, such as JSON or XML. Ramgopal said using these structured outputs is also emerging as a best practice.
"[This is] as opposed to earlier techniques, which involved multiple cycles, or critique loops, or careful parsing of the output," he said. "Now, all those things can be avoided."
Ramgopal said developers should become well versed in techniques that tweak how data is fed to LLMs to optimize results, such as retrieval-augmented generation and few-shot learning. RAG supplies contextual data to the LLM, while few-shot learning provides representative examples of a desired response for a model to learn from.
"All of these decisions are still being made by developers: 'What technique do I use? Do I use a chain of thought, or tree of thoughts, for example, to reduce hallucinations at the expense of latency? Do I use few-shot learning? How do I generate examples for my few-shot learning? Is it hardcoded, or is it dynamically fetched from a vector database?'" Ramgopal said.
Knowing the right AI tool for the job
Human discernment and developer expertise are still crucial to orchestrating AI systems and assessing their outputs, according to Antje Barth, principal developer advocate for generative AI at AWS.
"Generative agentic AI is not always deterministic -- it might come up with a couple of different suggestions," Barth said. "This is still where experience comes in as a software developer, to say suggestion two might be a better path, or maybe the first suggestion didn't work out because of other constraints."
The most advanced generative and agentic AI tools come at a significant cost in compute resources, energy and cloud service consumption, which means developers must be savvy about when those tools are warranted and when conventional problem-solving will do the job, said Sudha Raghavan, senior vice president of developer platform at Oracle.
"You do not need a thousand-node GPU cluster to solve a small chatbot inferencing problem," for example, Raghavan said. "The thought process of a developer today [of] 'I have a problem, I do a design, I write some code' has changed to 'I have a problem. What's the fastest way to design, and is that design not just optimal for performance, but optimal for cost? Can I get that [cost information] upfront? What level of proof of concept do I need to do before I say this can go into a large-scale, distributed system and service?'"
According to Ramgopal, proper prompt engineering is also crucial in controlling resource costs. For example, developers must craft prompts correctly to take advantage of prefix caching, a feature that stores and reuses previous inference requests to improve the performance and efficiency of LLMs.
"If you're able to put the fixed or the shared parts of a prompt at the top, as opposed to somewhere in the middle, you can take advantage of prefix caching, and you can get a lot of benefits in terms of cost and latency when working with LLMs," he said. "[There are also] techniques for compact encoding of information, especially when you're doing retrieval-augmented generation, and also some techniques for getting output in a compact way, because a bunch of [compute] time is also spent in encoding and decoding of tokens."
Working at a higher level of logic
Once developers have a strong foundation in prompt engineering for single LLMs and a feel for how different AI tools work, agentic AI draws on that experience, but requires another shift in approach.
According to AWS' Barth, developers can orchestrate AI agents using the same specificity and logical task breakdowns they'd use in individual LLM prompts.
"Amazon Q Developer has different components," she said. "There's a component that helps with unit testing. There's a component that helps with the planning phase, [and then you] have the inline code creation. So this is [about] bringing these focused agentic tasks into a more powerful overall tool."
Ultimately, as early LLMs give way to models more capable of reasoning and as agentic AI systems take over more decision-making tasks, such fine-grained prompt crafting will become less necessary. In the long run, developers must shift their technical expertise toward a broader view of overall system architecture.
"By elevating the LLM up to a higher level in the stack, orchestration logic code traditionally written by humans is now replaced by a set of task descriptions and tools," said Ian Beaver, chief data scientist at Verint, a contact center-as-a-service provider in Melville, N.Y. "Prompt engineering as a skill will only continue growing since it is how tasks and resources are described to build an agentic system."
In the long term, developers will have what one industry analyst calls digital workers available on demand.
"Agentic AI is about augmenting humans with agents, such as developers with data science skills or support personnel with needed SRE [site reliability engineering] help," said Andy Thurai, an analyst at Constellation Research. "A developer can outsource that work to an AI agent and continue their app dev work."
At that point, software development will place more emphasis on understanding how to deliver agentic AI capabilities, such as highly personalized customer experiences, safely and credibly to customers, Barth said.
"Making sure [agentic AI's] answers are based on company-relevant data, for example, including the safeguards needed in that application," she said. "This is the skill set, on a more architectural level, that will need to happen."
Providing safe prompt engineering playgrounds
There's evidence that digital workers are already replacing their human counterparts at some large companies, as agentic AI systems open application development to a wider audience of professionals without code expertise. But Beaver said developers' expertise will still be needed to ensure the safe use of these systems.
"While agentic build tools and UIs have lowered the barrier to entry for GenAI, it's not a good idea to allow people untrained in software delivery and quality practices to use them," he said. "The standard quality assurance process still needs to be applied and, in some ways, needs to be more rigorous with agentic AI because of the broader spectrum of system behaviors it can produce."
A user's choice of underlying LLMs can result in very different behaviors in a downstream application, and nontechnical agentic AI developers might not have the background to know this or know what to do about it, Beaver said. They might not conduct the testing necessary to prove that the agentic system behaves predictably when deployed in production and might be unaware of the breadth of actions provided by an API they give to the LLM to use.
"If that API exposes CRUD [create, read, update and delete] operations, the LLM may unexpectedly alter business data," he said.
The unpredictable nature of agentic AI led LinkedIn engineers to create prompt engineering playgrounds that run in Jupyter notebooks for developers, data scientists and product managers to experiment safely.
"With agentic AI products, you're only defining the guardrails [for AI automation]," Ramgopal said. "The product manager has to come up with those quality guidelines, which we call the evaluation framework. This playground environment is an easy way to play with the product experience and see what happens without having a full-fledged development environment set up on their machines."
Beth Pariseau, senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.