Brush up on your knowledge of speech technology terms
Vocal commands are one way speech technology is making its way into business applications and services. This mini glossary will refresh your knowledge of speech technology and some of its use cases.
If you've ever asked Amazon Alexa for a weather report or told your Google Home to set a timer, then you've used speech technology. Speech technology is making its way into the enterprise, from contact centers to personal digital assistants. As the technology matures and vendors try to differentiate themselves, new use cases arise.
Speech technology is a catch-all term for any technology that interacts with audible speech. Most speech technology-enabled tools and applications use a combination of the different types of available technologies. This mini glossary will help you navigate speech technology concepts and their enterprise use cases.
Under the hood of speech technology
AI. AI-enabled machines learn, reason and self-correct with the aim of simulating human intelligence. In speech technology, AI adds a layer of automation to call centers by recognizing pertinent information in a call and choosing the best place to route it. AI-driven speech recognition can also identify patterns, which allows applications to make suggestions based on what is said during a call. These applications combine speech recognition and machine learning with, in some cases, natural language generation or sentiment analysis.
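The routing idea can be illustrated with a minimal sketch. It assumes the call has already been transcribed to text, and it uses simple keyword counts as a stand-in for the trained models a production system would rely on. The queue names, keywords and the route_call function are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical queue names and keywords, chosen only for illustration.
ROUTING_KEYWORDS = {
    "billing": {"invoice", "charge", "refund", "payment"},
    "technical_support": {"error", "crash", "password", "outage"},
    "sales": {"upgrade", "pricing", "quote", "purchase"},
}
DEFAULT_QUEUE = "general"


def route_call(transcript: str) -> str:
    """Return the queue whose keywords appear most often in the transcript."""
    # Lowercase the transcript and split it into simple word tokens.
    words = re.findall(r"[a-z']+", transcript.lower())
    scores = {
        queue: sum(1 for word in words if word in keywords)
        for queue, keywords in ROUTING_KEYWORDS.items()
    }
    best_queue = max(scores, key=scores.get)
    # Fall back to a general queue when no keyword matched at all.
    return best_queue if scores[best_queue] > 0 else DEFAULT_QUEUE
```

A real deployment would likely replace the keyword table with a classifier trained on past calls, but the flow is the same: extract the pertinent information, score the candidate destinations and pick one.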
Machine learning. Machine learning refers to algorithms that allow machines to become more accurate at prediction without explicit programming. In speech technology, machine learning can be used with language analysis in contact centers to predict the outcome of a call by recognizing verbal patterns and reacting accordingly. Each interaction that a machine learning-enabled tool handles improves the application's accuracy. Machine learning in speech technology relies primarily on deep learning models to sort through the many variables of spoken language and then process the data to improve future performance.
Natural language processing. NLP is a computer's ability to understand language. Spoken language is nuanced and doesn't fit the highly specified programming inputs that computers typically use to function, which makes NLP a particularly difficult technology to implement successfully. NLP analyzes both syntax and semantic meaning to build a complex understanding of spoken language. It can recognize words based on context, categorize words into groups and use a database to determine the semantics behind the words. More mature NLP technology uses deep learning to analyze speech patterns and improve itself for future interactions.
Speech recognition. Speech recognition is a machine's ability to convert spoken language into a format that machines can process, but its capabilities vary greatly. Limited speech recognition may only recognize specific words and phrases for use cases like interactive voice response (IVR) systems. More sophisticated versions can recognize natural speech for advanced use cases, including call routing, speech-to-text and voice search. Accuracy for these use cases is improving rapidly, but it will be some time before it matches the accuracy of a human listener.
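A limited-vocabulary system of the kind used in IVR can be sketched as a fixed lookup. The example assumes an upstream recognizer has already produced text. The phrase list, action names and the match_ivr_phrase function are hypothetical.

```python
from typing import Optional

# Hypothetical menu for illustration: spoken phrase -> IVR action.
IVR_PHRASES = {
    "check balance": "BALANCE",
    "make a payment": "PAYMENT",
    "speak to an agent": "AGENT",
    "yes": "CONFIRM",
    "no": "CANCEL",
}


def match_ivr_phrase(recognized_text: str) -> Optional[str]:
    """Map recognizer output to an action, or None if it is out of vocabulary."""
    # Lowercase and collapse whitespace, then drop trailing punctuation.
    normalized = " ".join(recognized_text.lower().split())
    normalized = normalized.strip(".,!?")
    return IVR_PHRASES.get(normalized)
```

Anything outside the fixed phrase list returns None, which is the gap that natural speech recognition is meant to close.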
Speech analytics. Speech analytics pairs with speech recognition to locate the useful information in an audio input. It can recognize words and audio patterns, and can even detect emotional markers in a person's voice. Speech analytics is used in contact centers to identify the reason for a call, as well as the caller's mood. Most business cases for speech analytics revolve around improving customer interactions through call data collection.
Use cases for speech technology
Text-to-speech (TTS) and speech-to-text. Applications that synthesize audio from text inputs are considered TTS applications. In the enterprise, TTS is used in voice-enabled email and other hands-free messaging applications. Speech-to-text is the reverse: audio inputs are converted into written text. Speech-to-text is used for transcription and dictation tools. Speech-to-text relies on speech recognition to identify spoken words, while TTS relies on speech synthesis to render written words as audio.
Chatbot. Chatbots are programs that communicate through audio or text with a person or another machine. Chatbots rely on NLP engines and machine learning to mimic the complexity of human speech. A popular business case for chatbots is in contact centers to streamline operations and conserve agent resources. Chatbots that use machine learning and AI are able to build on each interaction to improve the user experience. Overall, chatbots can offer a more contextualized interaction than a simple IVR system.
Voice assistant. Like chatbots, voice assistants use NLP to help users in voice-enabled environments. Consumers are likely familiar with the Amazon Alexa and Google Assistant voice assistants, but digital voice assistants are also finding their way into the enterprise. Most voice assistants operate on a command-response principle. As AI capabilities mature, voice assistants built with cognitive computing can perform more complex multi-step tasks, such as booking appointments or purchasing tickets.
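The command-response principle can be illustrated with a short sketch that assumes the spoken request has already been converted to text. The command phrases, handler functions and the respond function are hypothetical; real assistants use NLP models rather than fixed prefixes to interpret requests.

```python
from typing import Callable, Dict


def set_timer(argument: str) -> str:
    """Placeholder handler; a real assistant would start an actual timer."""
    return f"Timer set for {argument}."


def report_weather(argument: str) -> str:
    """Placeholder handler; a real assistant would query a weather service."""
    return f"Looking up the weather for {argument}."


# Hypothetical command prefixes mapped to their handlers.
COMMANDS: Dict[str, Callable[[str], str]] = {
    "set a timer for": set_timer,
    "what is the weather in": report_weather,
}


def respond(utterance: str) -> str:
    """Find the first command whose prefix matches and return its response."""
    text = " ".join(utterance.lower().split()).rstrip("?.!")
    for prefix, handler in COMMANDS.items():
        if text.startswith(prefix):
            # Everything after the prefix is passed along as the argument.
            return handler(text[len(prefix):].strip())
    return "Sorry, I did not understand that."
```

Each request maps to exactly one response, which is why multi-step tasks such as booking an appointment need the additional context tracking described above.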