Multimodal Interfaces as an Alternative to Unimodal Speech Inputs

By David Hruska

Definition

Multimodal interfaces allow people to use multiple input methods at once to carry out a task on a mobile device. The advantage of multimodality is that the weakness of one input is offset by the strength of another. People also prefer to interact with devices multimodally, which has the added benefit of improving task performance.

Opportunities

In the future, interaction with computers and mobile devices will become increasingly ubiquitous for end users. This ubiquity will shift computing toward more natural behavior and easier-to-use interfaces. To create the best user experience, multimodal interfaces must be able to synchronize multiple inputs and interact consistently with the user. Multimodal interfaces also depend on multiple disciplines, such as Speech and Hearing Sciences, Perception and Vision, Linguistics, Psychology, and Statistics, working together to make the multimodal experience seamless.

Challenges

One of the current challenges of multimodal interfaces is how to tie everything together. Multimodal interfaces must be designed so that the various input types all work together seamlessly. This could mean that the device must gather two or more input streams simultaneously and know when to use certain inputs over the others. The interfaces must also be able to correctly interpret the user’s intention based on all of the inputs. In addition to processing multiple input streams, the recognition engine needs to be able to handle misinterpreted inputs.
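
The idea of gathering several input streams, preferring some inputs over others, and coping with misinterpreted input can be illustrated with a minimal Python sketch. It is only one possible approach, not a method taken from the report: the `InputEvent` type, the `fuse` function, the 1.5-second time window, and the 0.5 confidence threshold are all hypothetical names and values chosen for illustration, and it assumes each recognizer reports a confidence score with its interpretation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InputEvent:
    """One recognizer result (hypothetical structure for illustration)."""
    modality: str      # e.g. "speech" or "touch"
    value: str         # the recognizer's interpretation of the input
    confidence: float  # assumed recognizer score between 0.0 and 1.0
    timestamp: float   # time the input was received, in seconds


def fuse(events: List[InputEvent],
         window: float = 1.5,
         min_confidence: float = 0.5) -> Optional[InputEvent]:
    """Pick the most reliable interpretation among near-simultaneous inputs.

    Returns None when no input is reliable enough, which signals that the
    interface should ask the user to repeat or confirm.
    """
    if not events:
        return None

    # Treat only inputs close in time to the latest one as belonging
    # to the same user action.
    latest = max(e.timestamp for e in events)
    recent = [e for e in events if latest - e.timestamp <= window]

    # Discard interpretations the recognizers themselves consider unreliable.
    reliable = [e for e in recent if e.confidence >= min_confidence]
    if not reliable:
        return None

    # Prefer the input with the highest confidence, whatever its modality.
    return max(reliable, key=lambda e: e.confidence)


# Speech is unreliable (for example, in a noisy room), so touch wins.
events = [
    InputEvent("speech", "call Tom", 0.4, 10.0),
    InputEvent("touch", "call Tim", 0.9, 10.3),
]
chosen = fuse(events)
```

In this sketch the weak speech result is dropped and the touch selection is used, which mirrors the point that the strength of one input can cover the weakness of another. A real recognition engine would likely combine evidence across modalities rather than simply choosing one.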

Future Research Areas

Future research could focus on how to integrate the different input methods and get them to work together. Creating an interface that integrates all of the inputs into a consistent, synchronized experience is key to a successful implementation.

Read the full report.