Speech Communications - Dialog in Voice Interfaces - Summary
Voice user interfaces are interactive media that use speech as the main or only mode of input and feedback. This research looks at how users talk to such systems and at whether these interactions should be modeled on human conversation.
Opportunities
Research in the social sciences shows that people react to synthesized voices in the same way as they do to human voices; therefore, “voice interfaces are intrinsically social interfaces.” Reeves and Nass also propose that since humans are experts at social interaction, they would likewise become experts at computer interfaces that are designed according to social principles. This suggests that people will prefer to draw on their familiarity with human-human conversation when interacting with voice interfaces.
The shared rules and expectations inherent in human conversation are largely unconscious, but interfaces that violate them are less comfortable to use: users find such interfaces harder to understand, and the interaction becomes more prone to error. “Effective leverage of shared expectations can lead to richer communication and streamlined interaction” (Cohen et al. 8). For voice interface designers, Grice’s conversational maxims can serve both as a guide for writing system prompts and as a guide for predicting what users are going to say.
Challenges
Human-computer dialog is still a developing genre, and little research has been done into how people prefer to talk to machines. What we do know is that human-computer talk is likely to contain fewer social pleasantries than human-human conversation and to use different dialog acts.
For system designers, there are issues in modeling human-computer dialog on conversations between humans. Many speech applications are based on information access and/or interactive problem solving. Human dialog is not the most efficient model for such purposes, as it contains fragments, disfluencies, overlaps, and interruptions.
Future Research Areas
- If the goal is a more natural system, then speech synthesis should be improved. The synthesized voice leaves the biggest impression on users, especially when it does not sound natural.
- For this relatively new genre, another question we should ask is, “How do people prefer to talk to computers?”