With Orchestrate, IBM Envisions AI as the Next-generation Interface

AI assistants are proliferating, but how useful are they, really? For the most part, they are single-minded, task-specific helpers that live within a particular application. Visionaries in natural language AI have dreamed of a day when we accomplish complex tasks simply by asking the software to do something, regardless of whether that requires multiple, disparate applications and processes. IBM has the ambition to deliver on that promise with its Orchestrate platform. I spoke with Parul Mishra, vice president, product management, IBM watsonx Assistants and Business Automation, about the organization’s dreams, goals, challenges, and progress on delivering this type of universal AI assistant.

At IBM Think, we had the chance to talk about Orchestrate, and I mentioned that it felt like IBM might be exploring what I call the holy grail of natural language AI: making AI the next-generation computer interface to any and all software. Today there’s some confusion in the marketplace. People are hearing about AI assistants, chatbots and agents, but those are piecemeal interfaces. None of them really meets that vision of a single interface, where we can simply ask an AI to do something and it does it, regardless of whether that takes multiple applications. Is that the vision for Orchestrate?

Parul Mishra: We believe generative AI technology can help scale productivity for more than one app. Our vision is an AI assistant that’s not only able to understand a query in natural language, but also has more agentic behavior. It can break down user intent when it spans multiple systems and understand where to route sub-intents, which may involve connecting to an enterprise application or conversing with an assistant that already exists for that application. It then gets that information back and has the planning and aggregation intelligence to put together a response that holistically addresses the user intent.

Typically all of this would happen in multiple steps because there is a lot of disambiguation that occurs, especially with more sophisticated intents.

Here’s what it means—let’s say a user asks, “I’d like to see all my meetings this week and send a summary to all of my team.” That task requires an assistant to go several layers deep to understand who I am, what’s my team, where to get that information from, what does opportunity mean, what systems do I use for that, what calls have I had, or videos or email, what technology do I use to summarize. It’s this kind of assistant that is the future of generative AI assistants and our vision for where we want to go.

Where is IBM in the journey to deliver on that vision? What kinds of tasks do you think this kind of agent can help with? What makes sense?

Mishra: In terms of tasks, we are thinking about deterministic vs. probabilistic workflows. There are situations where the business processes are somewhat rigid, with certain approval chains and set ways of doing things. The AI assistant needs to know that. For example, in expenses, there are guidelines that need to be followed and approvals that need to be secured. We put that into a more deterministic process category.

On the other hand, let’s say I ask for something that’s stored in a knowledge base. If there is an opportunity to pull that into an LLM [large language model] and be able to converse in a more fluid, dynamic way and provide more guidance to a user, that’s more of a probabilistic process. The challenge is when you put an AI assistant in front of a user, they aren’t going to differentiate between the two process types.

At IBM, we consider ourselves customer zero: we try these concepts out on ourselves. We started with an AI assistant for HR. It began as “Ask HR,” with very basic Q&A: how many vacation days do I have, what’s the holiday schedule.

But that’s only a first step of value. Then we saw basic interactions start to happen. Users wanted to update a home address, say, and they didn’t want to go into the application to do it; they wanted the AI assistant to do it. Then we saw users ask the assistant to do even more complex tasks: I want to transfer an employee, or I want to handle promotions. So the way we have addressed this landscape within IBM is that we have different assistants for payroll, benefits, compensation, travel and expense, and employee management. The user doesn’t need to know there are all of these assistants, because we are building routing for the user. If someone wants to do their expense report, they get routed to the right assistant without knowing it. That brings them to the assistant and the application where they can do the deep work.

Sometimes the assistant can do all of that task for them, and sometimes the assistant will guide them into the tool or communicate with other assistants to get the task completed. That brought us to the idea that we need to build a ‘mother bot’ (the Orchestrate agent, or ‘conversation controller’) that would use LLM-based routing to direct each turn of a multiturn conversation to the right tool or assistant, moderating input and output according to client governance policies and executing complex multistep skill flows and automations. We are using LLMs for a different purpose than many companies are. We are using them to understand the intent through formal, informal, and hybrid reasoning and route it to the right source, either an LLM or a deterministic process. The flexibility this provides to a company really addresses the challenge.

What we have right now in Orchestrate is a platform that can automate these deterministic or probabilistic processes. It gives companies the flexibility to use prebuilt assistants for a domain, add to them, or build their own custom assistants, in any combination.
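
Editor's sketch: the routing idea Mishra describes, where each turn of a conversation is sent either to a rigid workflow or to a more fluid knowledge-base answer, can be illustrated in a few lines of Python. This is a toy under stated assumptions, not IBM's implementation. The names `Route`, `classify_intent`, and `route_turn` are hypothetical, and simple keyword matching stands in for the LLM-based intent understanding described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Route:
    """One destination the controller can send a turn to (hypothetical type)."""
    name: str
    kind: str  # "deterministic" or "probabilistic"
    keywords: Tuple[str, ...]
    handler: Callable[[str], str]


def expense_workflow(utterance: str) -> str:
    # Stand-in for a rigid process with guidelines and an approval chain.
    return "expense: started approval chain"


def knowledge_answer(utterance: str) -> str:
    # Stand-in for an LLM answering from a knowledge base.
    return "knowledge: answered from knowledge base"


ROUTES: List[Route] = [
    Route("expenses", "deterministic", ("expense", "reimburse"), expense_workflow),
    Route("hr_knowledge", "probabilistic", ("vacation", "holiday", "policy"), knowledge_answer),
]


def classify_intent(utterance: str) -> Optional[Route]:
    """Assumption: keyword matching replaces the LLM-based intent classifier."""
    text = utterance.lower()
    for route in ROUTES:
        if any(keyword in text for keyword in route.keywords):
            return route
    return None


def route_turn(utterance: str) -> str:
    """Send one conversational turn to the matching route, or ask for clarification."""
    route = classify_intent(utterance)
    if route is None:
        return "clarify: could you say more about what you need?"
    return route.handler(utterance)
```

The user never sees which kind of process handled the request; `route_turn` hides that distinction, which mirrors the point that users do not differentiate between the two process types.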

In the vision for IBM, are these assistants only for managing IBM applications or any and all applications?

Mishra: Orchestrate isn’t limited to IBM applications. It addresses the problem of connecting and integrating multiple applications: not only enterprise applications like Salesforce, ServiceNow and SAP, but also Box, Slack, email and others.

Where is IBM in terms of the timeline for delivering this type of functionality?

Mishra: At Think we released our Orchestrate AI assistant builder. It lets users design their own assistants for Q&A or leverage more than 1,000 prebuilt skills for the different kinds of automations and integrations we’ve been talking about. What we are planning to add in 2H 2024 is to advance the value from just having skills to having prebuilt skill flows. For example, within employee support, requesting time off has multiple steps, so it becomes more than just one skill; it becomes a full prebuilt workflow that connects multiple systems underneath. A user would configure that in Orchestrate and launch the prebuilt assistant. This is key because it takes a lot of hard work to connect these enterprise systems correctly and securely. But this is the business IBM is in, complex business integrations, and it’s how we can really bring value.
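
Editor's sketch: the difference between a single skill and a skill flow can be pictured as the difference between one function and an ordered chain of them. The Python below is a rough illustration of a time-off request as a multistep flow. Every name here (`check_balance`, `submit_request`, `notify_manager`, `run_flow`, the `REQ-1` identifier) is hypothetical and chosen for illustration; none of it reflects Orchestrate's actual skill format or APIs.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], bool]]


def check_balance(ctx: Context) -> bool:
    # Hypothetical skill: would query an HR system for remaining days.
    return ctx["days_requested"] <= ctx["days_available"]


def submit_request(ctx: Context) -> bool:
    # Hypothetical skill: would create the request in a time-off system.
    ctx["request_id"] = "REQ-1"
    return True


def notify_manager(ctx: Context) -> bool:
    # Hypothetical skill: would message the manager through email or chat.
    ctx["notified"] = ctx["manager"]
    return True


# A skill flow is an ordered list of skills that share one context.
TIME_OFF_FLOW: List[Step] = [
    ("check_balance", check_balance),
    ("submit_request", submit_request),
    ("notify_manager", notify_manager),
]


def run_flow(flow: List[Step], ctx: Context) -> List[str]:
    """Run steps in order, stopping at the first failure; return completed step names."""
    completed: List[str] = []
    for name, step in flow:
        if not step(ctx):
            break
        completed.append(name)
    return completed
```

Calling `run_flow(TIME_OFF_FLOW, {"days_requested": 2, "days_available": 5, "manager": "pat"})` runs all three steps in order, while a request that exceeds the available balance stops at the first step. The hard part Mishra points to, connecting each step to a real enterprise system correctly and securely, is exactly what the stand-in functions leave out.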

This path to AI assistants in the GenAI era is different from the path AI assistants were taking previously. What I mean is that, before GenAI, companies tended to outsource to a specialist vendor, like Nuance, for example. That’s not the case anymore; companies are much more likely to build their own. What are you seeing with Orchestrate?

Mishra: Companies are looking for a range. Smaller ones tend to use Orchestrate prebuilt solutions with little or no customization. Bigger companies fall further along the customization spectrum, which runs from using some prebuilt skills and multiple assistants to the complex integration vision of one assistant for all, along the lines of how our IBM HR assistant has evolved. The platform is designed to address the entire spectrum of needs in delivering AI assistants.

What are the biggest learnings you have had in your work in building AI assistants?

Mishra: As I mentioned earlier, IBM is client zero for Orchestrate, so some of our biggest learnings are about which processes are repeatable and which tasks and processes are most valuable to users: what is it that they ask for the most? It’s helped us to be razor-focused on what to offer in the prebuilt skills. We are also learning a lot about culture, adoption and behavior. For example, we learned that people don’t want a page full of assistants; they want a simplified approach.