How to make AI agents better conversationalists: Context is key
IBM's Bob Moore and Raphael Arar are committed to designing AI agents that actually understand -- and retain -- context. The aim? To make AI exchanges more like conversations.
Most current conversational AI agents are only good at short exchanges -- a user asks a question and the assistant answers it. IBM's Bob Moore and Raphael Arar are working to change that.
In part one of this three-part interview series, "Conversational UX design: What it is and who's paving the way," Moore and Arar explained the emerging field of voice interfaces and how their research -- with roots in academia -- is giving others in the field a template for designing more human-like conversations.
Here, Moore and Arar, conversation analyst and UX designer, respectively, discuss what sets their research apart and why it has broad applications in both the enterprise and consumer spaces. Hint: In a true conversational interface, context -- and retaining context -- reigns supreme.
Editor's note: This interview has been edited for clarity and length.
What differentiates the work you're doing on conversational agents at IBM from the state of the AI agents already out there?
Raphael Arar: One thing that really differentiates what we're doing from what we tend to see out there is that -- and we really want to stress this fact -- the systems that we're designing and building hold conversation.
Typically, what you see right now are these one-off questions and answers. From a design standpoint, when you design a flow like that, it's a really straightforward translation to a technologist or developer. If you ask an [AI] agent, 'What's the temperature outside?' the agent will say, 'It's 72 degrees and sunny.'
Bob Moore: Or you say, 'Play Lana Del Rey,' and [the AI agent] plays it. That's simple.
Arar: But the issue with some of the systems that we see right now is if the agent says, 'It's 72 and sunny,' and then I respond back saying, 'What do you mean?' the context isn't necessarily passed back to the system. So, it has no idea what I mean when I say, 'What do you mean?' -- it doesn't know that that's some sort of follow-up question.
Therein lie some of the challenges, because when we're interacting with developers and a back-end system, we need to know the exact places in time where a variable needs to be set or something needs to be stored in order to [keep the conversation going]. We need to keep the context going for the end user, so that the end user doesn't get frustrated that the agent doesn't actually understand.
Moore: That's really key. A big differentiator is that a lot of the [AI agents] right now just do two turns. The user does a turn, the assistant does a turn and that's it. If you do a third turn, it's going to see that as a new conversation -- as a new interaction.
What's out there is more of a voice control model, even a query model. I mean, if you take Google, you type a query into the search engine and you get a response back from the system. You never then type in 'thanks,' because Google will go and look up 'thanks.' It doesn't have a sense of a third position, or a next turn.
I would say your interface isn't conversational unless you can do more than two turns, unless you can do an arbitrary number of turns in which the conversational or sequential context is persisted across those turns and remembered [by the AI agent]. That way the system knows what we're talking about and we can do a next turn. So, when you say, 'What do you mean?' or, 'Thank you,' [the AI agent] knows what you're talking about. It preserves the context. If it can't do that, it's not really a conversation. It might be a natural language interface or a voice control interface, but it's not a conversational interface.
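To make the distinction concrete, here is a minimal Python sketch contrasting a stateless query model with an agent that persists context across turns. It is an illustration under stated assumptions, not IBM's implementation: the class names, the `answer_query` lookup and the canned weather replies are all hypothetical. The stored `last_result` loosely corresponds to the kind of variable Arar says must be set at the right moment, so that a later 'What do you mean?' or 'Thanks' can be interpreted against the previous turn instead of being treated as a new query.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical canned back-end data: (short answer, elaboration).
WEATHER = ("It's 72 and sunny.",
           "It's 72 degrees outside right now, and the sky is clear.")


def answer_query(text: str) -> tuple[str, str] | None:
    """Toy lookup standing in for a real back-end service (assumption)."""
    if "temperature" in text.lower():
        return WEATHER
    return None


class StatelessAgent:
    """Query model: every input is handled as a brand-new, one-shot request."""

    def respond(self, text: str) -> str:
        result = answer_query(text)
        # With no memory of earlier turns, a follow-up is just another search.
        return result[0] if result else f"Searching for '{text}'..."


@dataclass
class ConversationalAgent:
    """Keeps sequential context so later turns can refer back to earlier ones."""
    history: list[tuple[str, str]] = field(default_factory=list)  # (speaker, utterance)
    last_result: tuple[str, str] | None = None  # answer the user may follow up on

    def respond(self, text: str) -> str:
        self.history.append(("user", text))
        normalized = text.strip().lower().rstrip("?.!")
        if normalized == "what do you mean" and self.last_result:
            # Repair request: elaborate on the previous answer.
            reply = self.last_result[1]
        elif normalized in ("thanks", "thank you"):
            # Sequence closer: acknowledge instead of looking up "thanks".
            reply = "You're welcome!"
        else:
            # A new request: store its result as context for the next turn.
            result = answer_query(text)
            self.last_result = result
            reply = result[0] if result else "Sorry, I can't help with that yet."
        self.history.append(("agent", reply))
        return reply


if __name__ == "__main__":
    bot = ConversationalAgent()
    for turn in ["What's the temperature outside?", "What do you mean?", "Thanks"]:
        print(f"User: {turn}\nAgent: {bot.respond(turn)}")
```

Running the same three turns through `StatelessAgent` shows the two-turn limitation Moore describes: the second and third inputs come back as fresh searches for 'What do you mean?' and 'Thanks.' A production system would need far richer state than a single stored answer, but the design point is the same: context is written at a known point in one turn and read in the next.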
Would you say you are designing for business use cases, consumer use cases or both? What are some of those use cases?
Arar: In conversation analysis, there are models for different types of conversations. In a B2B-type setting, the type of conversation we frequently see is the service encounter, in the sense of customer service.
When we think about customer service in the context of an enterprise, it's not necessarily related to direct-to-consumer, but it could be. So, we see a lot of use cases for technical support, for augmented intelligence-type scenarios in which you have a medical professional trying to get more information about a patient, and others. We often see these [AI agents] as assistants to enterprise employees.
More about Moore and Arar
Bob Moore is a research staff member at IBM Research-Almaden in San Jose, Calif. He is the lead conversation analyst on IBM's conversational UX design project. Prior to working at IBM, Moore was a researcher at Yahoo Labs and at the Xerox Palo Alto Research Center, and was a game designer at The Multiverse Network. He has a Ph.D. in sociology from Indiana University Bloomington with concentrations in ethnomethodology, conversation analysis and ethnography.
Raphael Arar is a UX designer and researcher at IBM Research-Almaden. Previously, he was the lead UX designer for the Apple and IBM partnership and a lecturer at the University of Southern California's Media Arts and Practice Division. Arar holds an MFA from the California Institute of the Arts, and his artwork has been shown at museums, conferences, festivals and galleries internationally. In 2017, he was recognized as one of Forbes' "30 Under 30" in enterprise technology.
Moore: We're designing for both [the business and consumers]. For example, right now, we're creating a virtual travel agent for [a popular U.S. airline]. We're helping them build their own conversational agent that will talk to their customers. Others we have worked on include travel apps for employees, where the employees are the users.
We have some [use cases] in which healthcare professionals are getting advice from an agent that's part of a database, and others in which people are doing different kinds of data analysis internally or interacting with the database [via an AI agent]. We have all kinds of different use cases.
I don't make a big distinction as a conversation designer because the patterns are all there. As we see in conversation analysis, ordinary conversation provides this basic set of patterns. In work settings, we adapt them slightly to accomplish the work at hand. There's a literature on that about certain encounters and all kinds of different things. We're in both [the consumer and enterprise] spaces.
Arar: You're designing for people at the end of the day. So, the big difference ends up being in the process: who's involved and who has a stake in the game. In consumer applications, the people who actually have the purchasing power are your end users. In an enterprise setting, the people who have the purchasing power aren't necessarily your end users. You have more needs and wants to juggle [in the enterprise], which adds some unique challenges from a design standpoint, especially in the planning phases, when you're trying to anticipate the types of questions that your users want answered. It's a balancing act to determine what the business needs of the stakeholders are, what the end users' needs are and how we design an agent to handle all of the possible scenarios.
Continue on to the third and final part, "Tackling the 'ask me anything' challenge of a conversational interface," where Moore and Arar discuss the challenge of designing conversational interfaces where the potential incoming questions are limitless.