Speech Communications - Improving User Acceptance of Voice Recognition Technology and Voice Interface (in a Mobile Device context) - Summary
By Daphne Lee
Voice recognition technology (VRT) has advanced greatly since the early 1990s, with improved accuracy and speed. However, despite public interest in the idea of voice technology, practical applications remain uncommon among average users. Instead, VRT is mostly deployed (with varied success) among users with disabilities or users with an overwhelming need for the technology. My research therefore focuses on why VRT is not widely found in everyday technology and what can be done to improve user acceptance of it.
Opportunities
Speech communication technology presents an alternative input/output mode to existing interfaces, especially those that require hand and eye coordination. Speech has been described as the most ‘natural’ form of communication because humans communicate primarily through voice. First, voice interfaces would excel in situations where users are unable to use their hands, because of either a physical disability or a situational constraint (e.g. carrying something or driving). Second, voice interfaces have great potential in situations where users cannot focus their eyes on a screen display. Third, voice interfaces bypass the need for a physical input/output device, such as a monitor or keyboard. For example, people who are unfamiliar with standard computer interfaces, or simply uncomfortable typing on a keyboard, can use this alternative instead.
Challenges
Although speech is the ideal means of communication, language involves more than just production and reception. Subtle paralinguistic features can vastly alter the meaning of words; the human brain detects them, but a computer cannot. Cues such as emotion, tone, and sarcasm, as well as ambiguities such as homophones, are difficult for computers to distinguish, and even simple voice-to-text tasks need a massive database to support them. Current voice recognition software achieves about 95% accuracy but still falls short of perfect understanding, which leads to considerable user frustration. Many users have already had bad experiences with voice technology because of these mistakes.
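To illustrate why a figure of roughly 95% can still feel unreliable, the following is a minimal sketch of how errors could compound over an utterance. It rests on two assumptions that are not stated in this summary: that the 95% figure is a per-word accuracy, and that errors on different words are independent. The function name `error_free_probability` is hypothetical and used only for illustration.

```python
def error_free_probability(word_accuracy: float, num_words: int) -> float:
    """Chance that every word in an utterance is recognized correctly.

    Assumption (illustrative only): each word is recognized independently
    with the same probability `word_accuracy`.
    """
    if not 0.0 <= word_accuracy <= 1.0:
        raise ValueError("word_accuracy must be between 0 and 1")
    if num_words < 0:
        raise ValueError("num_words must be non-negative")
    # Independent events multiply, so accuracy is raised to the word count.
    return word_accuracy ** num_words


if __name__ == "__main__":
    # Assumed per-word accuracy of 95%, applied to utterances of varying length.
    for n in (5, 10, 20):
        p = error_free_probability(0.95, n)
        print(f"{n:>2} words: {p:.1%} chance of no errors")
```

Under these assumptions, a 20-word utterance would come through with no errors only about 36% of the time, which may help explain how a system that sounds accurate on paper can still leave users correcting most of what they dictate.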
Future Research Areas
- Task requirements: Continued research should focus not on where voice technology would fit, but on when users choose voice rather than text to complete a task.
- Conversational behavior: Further research is needed on how comfortable users are with speaking to a mobile device or computer. Should the interaction follow conversational speech (as in talking to another human being), or should it be distinct because the listener is a machine?