Definition

What is a voice user interface (VUI)?

A voice user interface (VUI) is a type of interface that relies on speech recognition technology to enable users to interact with an application or device through voice commands. The application might run locally on the device or be hosted on a web server or cloud computing platform. The device could be a computer, smartphone or other type of system.

VUIs are being integrated into a growing number of devices and applications, including virtual assistants, smart speakers, smart TVs, desktop and laptop computers, and smart home systems. The industry's three top virtual assistants -- Apple Siri, Amazon Alexa and Google Assistant -- are prime examples of applications that rely heavily on their built-in VUIs. Each assistant's VUI enables users to request information and issue commands using only their voices.

VUIs take a much different approach to user interaction than traditional interfaces, such as graphical user interfaces (GUIs) or command-line interfaces, which require a mix of access devices, such as monitors, keyboards, mice, touchpads or touchscreens. This voice-first approach lets users initiate automated services and execute their day-to-day tasks in a way that can be faster, more efficient and more intuitive.

List of 10 tasks for a virtual assistant.
Not only has the virtual assistant found a place for the regular user, but AI-powered assistants can make a great impact in the enterprise as well.

Applications and devices that incorporate VUIs often use traditional interfaces as well. For example, a user can interact with Apple's HomePod smart home hub speaker by saying "Hey Siri" and then speaking a command. However, the user can also touch the top of the speaker to initiate a conversation without speaking "Hey Siri," or they can touch the speaker to turn the volume up or down or to stop music from playing.

In some cases, Siri will provide additional information through the user's iPhone when responding to more complex queries, rather than trying to provide too much information as a voice response through the HomePod.

Today's VUIs enable users to perform a wide range of tasks, depending on the application or device, as well as the specific circumstances. For instance, users can perform the following tasks through VUIs:

  • Search the web.
  • Shop for products online.
  • Play music or skip tracks.
  • Search for content on their TVs.
  • Compose texts or emails.
  • Set alarms, timers or reminders.
  • Request real-time weather or traffic updates.
  • Update electronic health records.
  • Add appointments to their calendars.
  • Control car infotainment systems.

Vendors are continuously improving their VUIs, adding new capabilities and integrating them into more devices and applications. The ongoing investment in artificial intelligence (AI) technologies, particularly generative AI, promises to expand even further what users will be able to do through their VUIs and how intuitive the conversations will become.

Visual representation of smart home connectivity.
Voice user interfaces enable users to interact with smart home hubs.

The evolution of VUI

The first era of VUIs, according to the book Designing Voice User Interfaces by Cathy Pearl, was dominated by the interactive voice response (IVR) systems developed in the 1980s. These systems were capable of understanding voice inputs over the telephone and executing a given task.

However, other sources consider IVR systems to represent the second generation of VUI and point to efforts in the 1950s and 1960s as the original VUIs. An often-cited example is the development of the Audrey system by Bell Labs in 1952. Audrey could recognize the spoken digits zero through nine with up to 90% accuracy. Ten years later, IBM introduced Shoebox, which could understand 16 spoken words in English. Other efforts were also underway during this time, laying the foundation for the IVR systems and beyond.

By the early 2000s, IVRs grew commonplace in service industries, such as insurance, banking, aviation, freight and transportation. IVRs could process inbound calls and direct calls to in-house agents. They could also field customer questions through recorded messages, after extracting information from databases.

IVRs were initially developed to facilitate task automation without customers needing to speak to a live person, but today they're often used to respond initially to the caller before routing that person to a live agent.

Many voice-based interfaces have now moved into what is generally considered the third generation of VUIs. These systems incorporate automatic speech recognition, as well as machine learning, natural language processing and other advanced AI technologies.

Applications such as ChatGPT and Microsoft Copilot combine both visual and voice information in what's known as a multimodal interface. Some systems offer entire ecosystems that incorporate VUI capabilities. For example, people who have set up Google Home smart devices can use Google Assistant voice commands to control many of their devices.

VUI design

VUI design comes with unique challenges not found in GUIs and other interface types. A VUI on its own does not use a screen to display information, nor does it provide options for physically interacting with the interface. In addition, users cannot refer back to information once it has been spoken, because speech does not persist the way on-screen text does.

The transient nature of auditory interfaces therefore necessitates that the VUI clearly state possible interaction options and provide only essential information without overloading or confusing users. In addition, users must be coached in what voice commands the VUI will understand and the type of interactions they can perform.
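As a rough illustration of stating the available options clearly without overloading the listener, the following Python sketch builds a spoken prompt from a list of options and caps how many are read out at once. The function name `build_prompt`, the default limit of three options and the prompt wording are all assumptions made for this example; they do not come from any particular VUI product or guideline.

```python
def build_prompt(options, max_spoken=3):
    """Return a short spoken prompt that lists at most max_spoken options.

    Hypothetical helper: the limit and the phrasing are illustrative only.
    """
    if not options:
        return "Sorry, there is nothing I can help with right now."

    # Read out only the first few options so the listener is not overloaded.
    spoken = options[:max_spoken]
    if len(spoken) == 1:
        listed = spoken[0]
    else:
        listed = ", ".join(spoken[:-1]) + " or " + spoken[-1]

    prompt = f"You can say {listed}."

    # Tell the user that more options exist instead of reciting all of them.
    if len(options) > max_spoken:
        prompt += " Say 'more options' to hear the rest."
    return prompt


if __name__ == "__main__":
    print(build_prompt(["play music", "set a timer", "check the weather", "call home"]))
```

Capping the list is one possible design choice; a real system would likely tune the limit and wording through user testing.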

Third-generation VUIs attempt to go beyond the typical one-turn conversation often associated with IVRs. (A turn is one interaction between the user and the system.) VUIs can also "learn" from user input and predict users' future needs. Although VUI designers have yet to develop a system that can fully simulate a human conversation, rapid advancements in AI technologies are making VUIs smarter and helping to optimize the user experience.
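To make the idea of a multi-turn conversation more concrete, here is a minimal Python sketch of a dialog that keeps state across turns and asks a follow-up question when information is missing. It assumes that speech recognition and language understanding have already turned each utterance into a dictionary of values; the `ReminderDialog` class, the `label` and `time` slots and the reply wording are hypothetical and only meant to show the pattern.

```python
from dataclasses import dataclass, field

# Hypothetical pieces of information needed to set a reminder.
REQUIRED_SLOTS = ("label", "time")

FOLLOW_UP_QUESTIONS = {
    "label": "What should I remind you about?",
    "time": "When should I remind you?",
}


@dataclass
class ReminderDialog:
    """Tracks what the user has said so far across several turns."""
    slots: dict = field(default_factory=dict)
    turns: int = 0

    def handle_turn(self, recognized: dict) -> str:
        """Process one user turn (already-recognized values) and return the reply."""
        self.turns += 1

        # Remember any required value the user supplied in this turn.
        for name in REQUIRED_SLOTS:
            value = recognized.get(name)
            if value:
                self.slots[name] = value

        # Ask for the first missing value, or confirm once everything is known.
        missing = [name for name in REQUIRED_SLOTS if name not in self.slots]
        if missing:
            return FOLLOW_UP_QUESTIONS[missing[0]]
        return f"OK, I'll remind you to {self.slots['label']} at {self.slots['time']}."


if __name__ == "__main__":
    dialog = ReminderDialog()
    print(dialog.handle_turn({"label": "call the dentist"}))
    print(dialog.handle_turn({"time": "3 pm"}))
```

A one-turn system would have to reject the first utterance because the time is missing; keeping state between turns lets the system ask only for what it still needs.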

Despite this progress, designing an effective VUI is still a complex process that requires extensive knowledge of multiple fields, including computer science, human psychology and linguistics, as well as the careful study of human cognitive abilities, conversational language and speech technology.

VUIs for business

VUIs are not limited to consumer and home applications. They're increasingly finding their way into business environments, where they promise to help increase efficiency and productivity, and lead to greater customer engagement.

VUIs can help streamline operations, simplify routine tasks, facilitate collaboration and provide more effective employee training and education. They can also make it easier for workers to access the information they need when they need it and then share it with others. In addition, organizations can use VUIs to enhance their products and services and engage with customers more effectively.

Vendors have undoubtedly come to recognize the value of VUIs for their business customers. For example, Amazon's Alexa for Business enables organizations to use Alexa in a variety of ways, from setting up Echo devices and generating Alexa usage reports to enriching customer services and building Alexa-enabled devices.

When powered by IoT and cloud technology, VUIs can be effectively integrated with third-party systems in smart homes, offices and other business environments, where they can serve a number of industries, from health care and manufacturing to retail and online sales.


This was last updated in August 2024
