
Position Paper for CHI 96 Basic Research Symposium (April 13-14, 1996, Vancouver, BC)

Non-Speech Audio and Human-Computer Interaction

Stephen Brewster
Department of Computing Science
The University of Glasgow
Glasgow, G12 8QQ, UK
Tel: +44 (0)141 330 4966


The goal of my research is to improve the usability of human-computer interfaces by adding sound. Sound plays an important role in our everyday world but almost none when we interact with computers. I am investigating the combination of sound with visual feedback to improve graphical interfaces, the combination of sound with speech to improve telephone-based interfaces, and the use of sound to give visually disabled people access to otherwise inaccessible systems. This research aims to move beyond the simple beeps used in current interfaces to a rich, powerful and carefully designed set of sounds that will enable more effective interaction.


The research described in this paper proposes the use of non-speech sound output to enhance the display of information at the human-computer interface. A growing body of research indicates that adding non-speech sounds to interfaces can improve performance and increase usability (see, for example, [2, 5, 8]). Sound is an important means of communication in the everyday world, and its benefits should be exploited at the interface. Such multimodal interfaces allow richer and more natural communication between the computer and the user. They also let users employ the sensory modality most appropriate to a problem, rather than using one modality (usually vision) for everything. Despite the increased interest in multimedia, little solid research has been done on the effective use of sound in computer interfaces, even though all computer manufacturers now include sound-producing hardware in their machines.

Sound has many advantages. For example, it is omni-directional and attention-grabbing, so it can be used to alert users to problems. It can work alongside synthetic speech in purely auditory interfaces or be integrated with graphical feedback. Graphical interfaces display large amounts of information, which can overload the user's visual sense. Sound offers one way to overcome this problem: important information can be shown on the screen while other information is presented in sound, reducing the load on vision. Brewster et al. [6] showed that adding sound to a graphical interface could reduce both the time taken to complete certain tasks and the time taken to recover from errors.


The sounds used in this research are all based on structured audio messages called earcons [7, 9]. Earcons are abstract, synthetic tones that can be used in structured combinations to create sound messages representing parts of a human-computer interface. Detailed investigations of earcons by Brewster, Wright and Edwards [7] showed that they are an effective means of communicating information in sound.

Earcons are constructed from motives: short, rhythmic sequences of pitches that can be combined in different ways. The simplest method of combination is concatenation, which produces compound earcons. More complex manipulations of the parameters of sound (timbre, register, intensity, pitch and rhythm) can be used to create hierarchical earcons [2]. Using these techniques, structured combinations of sounds can be created and varied in consistent ways.
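As a rough illustration of these two construction methods, the Python sketch below models a motive as a sequence of (pitch, duration) pairs with timbre, register and intensity attached to the whole motive. Concatenation yields a compound earcon; changing selected sound parameters of an existing motive yields a related, consistently varied earcon. The names `Motive`, `compound` and `vary`, the MIDI-style pitch and velocity numbers, and the example "open" and "file" motives are all hypothetical; they are not taken from the earcon design guidelines [2, 9].

```python
from dataclasses import dataclass, replace
from typing import List, Tuple

Event = Tuple[int, float, str, int]  # (MIDI pitch, duration in beats, timbre, velocity)

@dataclass(frozen=True)
class Motive:
    """A short rhythmic sequence of (MIDI pitch, duration in beats) pairs,
    plus sound parameters that apply to the whole motive (all assumptions)."""
    notes: Tuple[Tuple[int, float], ...]
    timbre: str = "piano"
    register: int = 0     # octave offset added to every pitch
    intensity: int = 80   # MIDI-style velocity, 0-127

    def events(self) -> List[Event]:
        """Render the motive as note events for a synthesiser."""
        return [(pitch + 12 * self.register, dur, self.timbre, self.intensity)
                for pitch, dur in self.notes]

def compound(*motives: Motive) -> List[Event]:
    """Compound earcon: play the given motives one after another."""
    out: List[Event] = []
    for m in motives:
        out.extend(m.events())
    return out

def vary(parent: Motive, **params) -> Motive:
    """Derive a related earcon by changing only the named parameters;
    everything the caller leaves out is inherited unchanged from the parent."""
    return replace(parent, **params)

# Hypothetical motives for an "open" action and a "file" object.
open_motive = Motive(((60, 0.5), (64, 0.5)))
file_motive = Motive(((67, 0.25), (67, 0.25), (72, 0.5)), timbre="organ")
open_file = compound(open_motive, file_motive)  # "open" followed by "file"
file_high = vary(file_motive, register=1)       # same rhythm and timbre, one octave up
```

Keeping the motive immutable and deriving variants with `replace` mirrors the idea that a family of earcons shares most of its sound and differs in a controlled way.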


There are two strands to this work: integrating sound into graphical human-computer interfaces, and using sound for navigation cues.

Integrating sound into human-computer interfaces

My main area of investigation in this first strand has been the addition of sound to graphical interface widgets, such as buttons and scrollbars, to correct the usability problems that commonly used widgets often have. Using a structured technique I developed [7], I have analysed the problems with standard widgets and then corrected them using sound. One problem with button widgets is that the user can slip off the button by mistake before releasing the mouse button, so the intended action is not carried out. This problem is exacerbated because the feedback from a slip-off is the same as that for a correct button press, and the user is unlikely to be looking at the button to notice the slip-off; he or she will already have moved on to the next part of the task. Adding sound solved this problem. An experiment showed that, with sound, participants recovered from slip-off errors significantly faster and with significantly fewer mouse clicks; they also significantly preferred the sonically-enhanced buttons [5]. These benefits did not come at the cost of making the system more annoying to use.

A similar analysis and experimental evaluation was conducted on sonically-enhanced scrollbars [6]. In this case the sounds gave location information and prevented 'kangarooing' errors, in which the scrollbar thumb jumps back and forth around the pointer when the user pages by holding the mouse button down in the scrollbar trough. Again the results were favourable: participants completed certain tasks significantly faster, and their mental workload was significantly reduced with the sonically-enhanced scrollbar. These results show that adding sound to standard graphical widgets can improve the basic usability of an interface. The next stage of this work is to build a complete toolkit of sonically-enhanced widgets.
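The slip-off problem arises from the usual button behaviour: the action fires only if the mouse button is released while the pointer is still over the button, yet a cancelled press looks almost the same as a successful one. The sketch below shows one way an interface could tell the two outcomes apart and hand a distinct sound to each. The class, method and sound names are hypothetical, and the sounds actually designed and evaluated in [5] are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

    def contains(self, px: int, py: int) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h

class SonicButton:
    """Hypothetical button that reports which sound to play for each event."""

    def __init__(self, bounds: Rect):
        self.bounds = bounds
        self.armed = False  # True between a mouse-down inside and the next mouse-up

    def mouse_down(self, x: int, y: int) -> Optional[str]:
        if self.bounds.contains(x, y):
            self.armed = True
            return "press"      # sound while the button is held down
        return None

    def mouse_up(self, x: int, y: int) -> Optional[str]:
        if not self.armed:
            return None         # press started elsewhere: not this button's event
        self.armed = False
        if self.bounds.contains(x, y):
            return "success"    # released over the button: the action fires
        return "slip_off"       # released outside: action cancelled, so make it audible

button = SonicButton(Rect(0, 0, 100, 30))
first = button.mouse_down(10, 10)   # "press"
outcome = button.mouse_up(150, 15)  # "slip_off": pointer left the button before release
```

Because the slip-off gets its own sound, the user can notice it without looking back at the button, which is the point made in the paragraph above.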

My final topic of interest in this strand is the addition of non-speech sounds to aid disabled people who use single-switch scanning input. With scanning input, the set of choices available to the user is laid out in a matrix on the display. The system scans row by row until the user selects the required row, then scans item by item along that row until the user selects the required item. Scanning input is therefore a temporal task, since users have to press a switch when the cursor is over the required target, yet it is usually presented as a spatial task with the items laid out in a grid. Research has shown that the auditory modality is often better than the visual for temporal tasks. I investigated this by adding non-speech sound to a visual scanning system. The sounds also drew on our natural ability to perceive rhythm, so that rhythm could be used to aid the scanning process. The results of a preliminary investigation (again using earcons for the sound output) were favourable, indicating that the idea is feasible and that further research is warranted [4].
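Row-column scanning is essentially a clocked sequence of highlight steps, which is why a regular auditory beat can help the user time each switch press. The minimal sketch below lists the steps needed to reach a target and the resulting selection time, assuming one sound per scan step and a fixed scan interval. The function names, the sound labels and the simple timing model are illustrative assumptions, not the design evaluated in [4].

```python
from typing import List, Tuple

Step = Tuple[str, int]  # (hypothetical sound name, index of highlighted row or item)

def scan_sequence(rows: int, cols: int, target: Tuple[int, int]) -> List[Step]:
    """List the scan steps up to and including the selection of target (row, col).

    Row scanning highlights one row per step until the switch is pressed on the
    target row; item scanning then highlights one item per step along that row.
    Each step is paired with the sound assumed to be played on that beat.
    """
    row, col = target
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("target lies outside the grid")
    steps: List[Step] = [("row_beat", i) for i in range(row + 1)]
    steps += [("item_beat", j) for j in range(col + 1)]
    return steps

def selection_time(target: Tuple[int, int], interval_s: float) -> float:
    """Time to select target, counting every step as one full scan interval."""
    row, col = target
    return (row + 1 + col + 1) * interval_s

# A 4 x 5 grid with the target in row 2, column 3 (both zero-based):
steps = scan_sequence(4, 5, (2, 3))  # 3 row beats followed by 4 item beats
total = selection_time((2, 3), 0.5)  # 7 steps at 0.5 s each = 3.5 s
```

Giving row beats and item beats different sounds is one way the rhythm could also tell the user which phase of scanning is active; this is an assumption of the sketch.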

Using sound for navigation cues

The other main strand of my research is providing navigation cues in sound. In some situations graphical feedback cannot be used to provide this information. In completely auditory interactions, such as telephone-based interfaces or interfaces for visually disabled people, graphical cues are not available at all. In other systems where graphical feedback is available, the display may already be fully occupied by important information that extra graphical cues would hide [1]. One example is an interface for people with speech difficulties who need to access a graphical library of pictographic images [3].

An experiment was conducted to discover whether earcons could provide navigational cues in a menu hierarchy. A hierarchy of 25 nodes on four levels was created, with an earcon for each node. Participants had to identify their location in the hierarchy by listening to an earcon. They identified their location with 81.5% accuracy, indicating that earcons are a powerful method of communicating hierarchy information. Participants were also tested to see whether they could identify where previously unheard earcons would fit in the hierarchy, which they did with over 90% accuracy. These results show that participants quickly learned the rules from which the earcons were constructed, demonstrating that earcons are a robust and extensible method of communicating hierarchy information in sound.
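Participants could place earcons they had never heard because hierarchical earcons follow compositional rules: each node inherits its parent's sound and adds one further parameter. The minimal sketch below encodes such a rule set and its inverse. The assignment of timbre, rhythm and register to levels and the value lists are hypothetical, and this is not the hierarchy or sound design used in the experiment [3].

```python
from typing import Dict, List, Tuple

# Hypothetical rule set: each level below the root adds one sound parameter,
# and every child keeps all the parameters it inherits from its parent.
LEVELS: List[Tuple[str, List[str]]] = [
    ("timbre", ["piano", "organ", "brass"]),            # level 1
    ("rhythm", ["short-short", "long", "short-long"]),  # level 2
    ("register", ["low", "middle", "high"]),            # level 3
]

def earcon_for(path: Tuple[int, ...]) -> Dict[str, str]:
    """Build the earcon for the node reached by following child indices
    from the root (the root itself is the empty path)."""
    if len(path) > len(LEVELS):
        raise ValueError("path is deeper than the rule set")
    return {param: values[i] for (param, values), i in zip(LEVELS, path)}

def locate(earcon: Dict[str, str]) -> Tuple[int, ...]:
    """Invert the rules: recover a node's path from the parameters heard."""
    path: List[int] = []
    for param, values in LEVELS:
        if param not in earcon:
            break
        path.append(values.index(earcon[param]))
    return tuple(path)

# The rules place a node even if its exact combination of sounds was never heard.
heard = earcon_for((2, 0, 1))  # brass timbre, short-short rhythm, middle register
where = locate(heard)          # (2, 0, 1)
```

Because each parameter value keeps the same meaning wherever it appears, a listener who has learned the rules can decode any combination, which is what makes the scheme extensible.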


References

All papers by Brewster listed below can be found online in my publications list.

  1. Blattner, M., Papp, A. and Glinert, E. Sonic enhancements of two-dimensional graphic displays. In Proceedings of the First International Conference on Auditory Display (Santa Fe Institute, Santa Fe) Addison-Wesley, 1992, pp. 447-470.

  2. Blattner, M., Sumikawa, D. and Greenberg, R. Earcons and icons: Their structure and common design principles. Human-Computer Interaction 4, 1 (1989), 11-44.

  3. Brewster, S.A., Raty, V.-P. and Kortekangas, A. Earcons as a Method of Providing Navigational Cues in a Menu Hierarchy. Accepted for publication in HCI'96 (London, UK), 1996.

  4. Brewster, S.A., Raty, V.-P. and Kortekangas, A. Enhancing scanning input with non-speech sounds. Accepted for publication in Assets'96 (Vancouver, Canada), 1996.

  5. Brewster, S.A., Wright, P.C., Dix, A.J. and Edwards, A.D.N. The sonic enhancement of graphical buttons. In Proceedings of Interact'95 (Lillehammer, Norway) Chapman & Hall, 1995, pp. 43-48.

  6. Brewster, S.A., Wright, P.C. and Edwards, A.D.N. The design and evaluation of an auditory-enhanced scrollbar. In Proceedings of CHI'94 (Boston, MA) ACM Press, Addison-Wesley, 1994, pp. 173-179.

  7. Brewster, S.A., Wright, P.C. and Edwards, A.D.N. An evaluation of earcons for use in auditory human-computer interfaces. In Proceedings of INTERCHI'93 (Amsterdam) ACM Press, Addison-Wesley, 1993, pp. 222-227.

  8. Gaver, W., Smith, R. and O'Shea, T. Effective sounds in complex systems: The ARKola simulation. In Proceedings of CHI'91 (New Orleans) ACM Press, Addison-Wesley, 1991, pp. 85-90.

  9. Sumikawa, D., Blattner, M., Joy, K. and Greenberg, R. Guidelines for the syntactic design of audio cues in computer interfaces. Lawrence Livermore National Laboratory, 1986.
