Guidelines for the Creation of Earcons

Stephen A. Brewster, Peter C. Wright and Alistair D. N. Edwards

INTRODUCTION

Earcons [1] are abstract, synthetic tones that can be used in structured combinations to create sound messages to represent parts of an interface. They have been used to sonify several interfaces and shown to be effective at communicating complex information in sound (see Brewster [2] for a review). After two initial experiments we published a brief set of guidelines for the creation of earcons [5]. Since then we have conducted four more experiments in which we have used earcons. One experiment tested parallel earcons (where two earcons were played at the same time) [6] and the other three used earcons to correct usability errors with graphical widgets [2, 3, 4]. From these we have gained further insights into designing with earcons. This page gives an updated set of guidelines based on these experiments. We will update it as we gain more experience.

GUIDELINES

The creation of a set of earcons to sonify an interface depends on the interface and what the application behind it does. However, some general guidelines can be given. Some of these may appear obvious but as yet there are few examples of effective earcons so we thought it necessary to make all aspects of earcon design explicit.

When designing a family of earcons, start with timbre, register and rhythm. These can be used to create the basic structure. For example, each family of earcons might have a different timbre and default register. This would differentiate it from other families of earcons. Each family could also be given a different spatial location. Rhythm can then be used to create the major sub-groups within each family. To further differentiate the sub-groups, pitch, intensity, chords or effects such as chorus or delay can be used. Care should be taken to make sure that the earcons are recognisably different.

If listeners must recognise each earcon without reference to any other, i.e. make absolute judgements, then there must be big differences between them. If listeners are able to make relative judgements then differences can be smaller. Each of the different parameters that can be manipulated to differentiate earcons will now be described.

Timbre

Use musical instrument timbres. Where possible use timbres with multiple harmonics as this helps perception and can avoid masking. Timbres that are subjectively easy to tell apart should be used. For example, on a musical instrument synthesiser use 'brass' and 'organ' rather than 'brass1' and 'brass2'. However, instruments that sound different in real life may not when played on a synthesiser, so care should be taken when choosing timbres. Using multiple timbres per earcon may confer advantages when using compound earcons. Using the same timbres for similar things and different timbres for other things helps with differentiation of sounds when playing in parallel.

There is also another reason why care must be taken when timbres are chosen. Some timbres are continuous and some are discrete. The electric organ and violin timbres are continuous: They carry on until they are turned off. Piano or drum sounds are discrete: They only last a short time. This is the nature of different musical instruments. If continuous sounds are needed to sonify an interaction then discrete sounds would have to be constantly turned on and off if they were to be used. This can limit the choice of available timbres.

Register

If listeners are to make absolute judgements of earcons then pitch/register should not be used on its own. A combination of register and another parameter would give better performance. If register alone must be used then there should be large differences between earcons, but even then it might not be the most effective method. A difference of two or three octaves gives better recognition. Much smaller differences can be used if relative judgements are to be made.

Pitch

Complex intra-earcon pitch structures are effective in differentiating earcons if used along with rhythm or another parameter. The maximum pitch used should be no higher than 5kHz (four octaves above C3) and no lower than 125Hz-150Hz (the octave of C4) so that the sounds are not easily masked and are within the hearing range of most listeners.
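The pitch limits above could be checked automatically when a set of earcons is being designed. The following is only a minimal sketch: the function and constant names are hypothetical, the lower limit is assumed to be 125Hz (the lower end of the 125Hz-150Hz range given above), and the conversion from MIDI note numbers assumes standard equal temperament with note 69 tuned to 440Hz.

```python
# Hypothetical helpers for checking notes against the suggested pitch range.
MIN_HZ = 125.0   # assumption: lower end of the 125Hz-150Hz range
MAX_HZ = 5000.0  # the 5kHz upper limit


def midi_to_hz(note: int) -> float:
    """Equal-tempered frequency of a MIDI note number (note 69 = 440Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def out_of_range(notes_hz):
    """Return the frequencies that fall outside the suggested range."""
    return [f for f in notes_hz if not (MIN_HZ <= f <= MAX_HZ)]


if __name__ == "__main__":
    # Three illustrative pitches: too low, acceptable, too high.
    print(out_of_range([100.0, midi_to_hz(60), 6000.0]))
```

A check like this only covers masking and hearing range. It says nothing about whether the chosen timbre sounds acceptable at a given pitch, which still has to be judged by ear.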

Take care that the pitches used are possible given the chosen synthesised timbre; not all instruments can play all pitches. For example, a violin may not sound good if played at very low frequencies. If a wide range of pitches is needed then timbres such as organs or pianos are effective.

Rhythm and Duration

Make rhythms as different as possible. Putting different numbers of notes in each rhythm is very effective. Patterson [9] says that sounds are likely to be confused if the rhythms are similar even if there are large spectral differences. Small note lengths might not be noticed so do not use notes less than 0.0825 sec. However, if the earcon is very simple (one or two notes) then notes as short as 0.03 sec. can be used.

Earcons should be kept as short as possible so that they can keep up with interactions in the interface being sonified. Two earcons can be played in parallel to speed up presentation [6]. Earcons with up to six notes played in one second have been shown to be usable. In order to make each earcon sound like a complete rhythmic unit the first note should be accented (played slightly louder) and the last note should be slightly longer [8].
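The note-length limits and the accent/lengthening rule above can be sketched in code. This is an illustration rather than the method used in our experiments: the Note class, the MIDI-style velocity field and the default accent and lengthening amounts are all assumptions, since the guidelines only say "slightly louder" and "slightly longer".

```python
from dataclasses import dataclass, replace

MIN_NOTE_S = 0.0825        # shortest note for earcons in general
MIN_NOTE_SIMPLE_S = 0.03   # shortest note for one- or two-note earcons


@dataclass(frozen=True)
class Note:
    pitch_hz: float
    duration_s: float
    velocity: int = 80     # assumption: MIDI-style loudness, 0-127


def too_short(notes):
    """Return the notes that are shorter than the guideline allows."""
    limit = MIN_NOTE_SIMPLE_S if len(notes) <= 2 else MIN_NOTE_S
    return [n for n in notes if n.duration_s < limit]


def shape_as_unit(notes, accent=10, lengthen=1.25):
    """Accent the first note and lengthen the last one.

    The accent and lengthen defaults are illustrative guesses only.
    """
    if not notes:
        return []
    shaped = list(notes)
    first = shaped[0]
    shaped[0] = replace(first, velocity=min(127, first.velocity + accent))
    last = shaped[-1]
    shaped[-1] = replace(last, duration_s=last.duration_s * lengthen)
    return shaped
```

For example, calling shape_as_unit on a three-note rhythm returns a new list in which only the first note's velocity and the last note's duration differ from the input.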

Intensity

Great care must be taken over the use of intensity because it is the main cause of annoyance due to sound. The overall sound level will be under the control of the user (in the form of a volume knob). Earcons should be kept within a narrow range so that if the user changes the volume no sound will be lost and no one earcon will stand out and be annoying.

Listeners are not good at making absolute intensity judgements. Therefore, intensity should not be used on its own for differentiating earcons. If it must be used in this way then there should be large differences between the intensities used. This may lead to annoyance on the part of the user because it contravenes the previous guideline. The suggested range [6] is a maximum of 20dB above the background threshold and a minimum of 10dB above it.

One of the main concerns of potential users of auditory interfaces is annoyance due to sound pollution. If intensity is controlled in the ways suggested here then these problems will be greatly reduced [2].
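As a rough illustration of keeping earcons within the narrow range suggested above, a playback routine could clamp each requested level to between 10dB and 20dB above the background threshold. The function name is hypothetical, and how the threshold is measured is left open here.

```python
MIN_DB_ABOVE_THRESHOLD = 10.0
MAX_DB_ABOVE_THRESHOLD = 20.0


def clamp_level(level_db: float, threshold_db: float) -> float:
    """Clamp a requested level to the suggested range above the threshold."""
    low = threshold_db + MIN_DB_ABOVE_THRESHOLD
    high = threshold_db + MAX_DB_ABOVE_THRESHOLD
    return max(low, min(high, level_db))


if __name__ == "__main__":
    # With an assumed 40dB background threshold, levels are kept within 50-60dB.
    for requested in (45.0, 55.0, 75.0):
        print(requested, "->", clamp_level(requested, 40.0))
```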

Spatial Location

This may be stereo position or full three dimensions if extra spatialisation hardware is available. This is very useful for differentiating earcons played in parallel. It can also be used with serial earcons, for example each family of earcons might have a different location.

Making Earcons Attention-Grabbing

In many cases earcon designers want their sounds to capture the listener's attention. This can be achieved in different ways. It can be done by using intensity. This is crude but effective (and very common). However, it is potentially annoying for the primary user and others nearby so we recommend other methods. Rhythm or pitch can be used (perhaps combined with lower intensity), for example, because the human auditory system is very good at detecting dynamic stimuli. If a new sound is played, even at a low intensity, it is likely to grab a listener's attention (but not that of a colleague nearby). As another example, if the rhythm of an earcon is changed (perhaps speeding up or slowing down) this will also demand attention.

Other techniques for making sounds attention-grabbing are to use: High pitch, a wide pitch range, rapid onset and offset times, irregular harmonics and atonal or arrhythmic sounds (for more see [7]). The opposites of most of these can be used to make sounds avoidable but in this case the main parameters are low intensity and regular rhythm.

Compound Earcons

When playing serial earcons one after another use a 0.1 second gap between them so that users can tell where one finishes and the other starts. If the above guidelines are followed for each of the earcons that are to be combined then recognition rates will be high. If the above guidelines are followed then earcons played in parallel should also be recognisable.
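The 0.1 second gap between serial earcons can be expressed as a simple schedule of start and end times. This is a minimal sketch with a hypothetical function name. It takes the duration of each earcon in seconds and does not play any sound.

```python
GAP_S = 0.1  # gap between serial earcons, in seconds


def serial_schedule(durations_s, gap_s=GAP_S):
    """Return (start, end) times for earcons played one after another."""
    schedule = []
    t = 0.0
    for duration in durations_s:
        schedule.append((t, t + duration))
        t += duration + gap_s   # the next earcon starts after the gap
    return schedule


if __name__ == "__main__":
    # Two illustrative earcons lasting 0.5 and 1.0 seconds.
    for start, end in serial_schedule([0.5, 1.0]):
        print(f"start {start:.2f}s, end {end:.2f}s")
```

Parallel presentation would instead start every earcon at the same time, so no schedule of this kind is needed.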

AN EXAMPLE

Figure 1 shows an example hierarchy of earcons using the types of manipulations described above. Each earcon is a node on a tree and inherits all the properties of the earcons above it. The different levels are created by manipulating the parameters of earcons (for example, rhythm, pitch, timbre, register, tempo, stereo position, effects and dynamics). In the diagram the top level of the tree is a neutral earcon representing a fictitious family of errors. It has a flute timbre (a "colourless" instrument), a middle register and a central stereo position. The structure of the earcon from level one is inherited by level two and then changed. At level two the sound is still continuous but non-neutral timbres are used (in the figure organ and violin). Register is changed so that it matches a conventional musical layout (low register on the left, high on the right) and stereo position reflects the layout of the hierarchy, for example the node on the left has a left stereo position. At level three a rhythm is added to the earcon from level two to create a sound for a particular error. The rhythm is based on the timbre, register and stereo position from the level above. Other levels can be created by using other parameters such as tempo or effects.

Figure 1: An example of an earcon hierarchy showing sounds that could be used to represent errors.
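The inheritance in the hierarchy can be sketched as a small tree in which each node stores only the parameters it changes. This is an illustration under several assumptions: the class name, the node names and the rhythm values are hypothetical, and placing the organ on the left (rather than the violin) is a guess, since the text does not say which timbre sits on which side.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EarconNode:
    name: str
    overrides: dict = field(default_factory=dict)  # parameters changed here
    parent: Optional["EarconNode"] = None

    def resolved(self) -> dict:
        """Merge parameters from the root down; lower levels override."""
        params = self.parent.resolved() if self.parent else {}
        params.update(self.overrides)
        return params


# Level one: neutral earcon for the whole family of errors.
root = EarconNode("error", {"timbre": "flute", "register": "middle",
                            "stereo": "centre", "rhythm": None})
# Level two: non-neutral timbre, register and stereo position (assumed side).
left = EarconNode("left family", {"timbre": "organ", "register": "low",
                                  "stereo": "left"}, parent=root)
# Level three: a rhythm (note lengths in seconds, made up) for one error.
leaf = EarconNode("particular error", {"rhythm": [0.25, 0.25, 0.5]},
                  parent=left)

if __name__ == "__main__":
    print(leaf.resolved())
```

Further levels, for example tempo or effects, would be added in the same way by attaching child nodes that override one more parameter.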

CONCLUSIONS

This new set of guidelines improves those in Brewster, Wright & Edwards [5]. These updated guidelines are based on further experiments we have conducted. Using them an interface designer can create a set of usable earcons for his/her multimodal interface. The earcons will communicate their messages effectively and be easily recognisable and distinguishable by listeners.

REFERENCES

[1] Blattner, M., Sumikawa, D. and Greenberg, R. Earcons and icons: Their structure and common design principles. Human Computer Interaction 4, 1 (1989), 11-44.
[2] Brewster, S.A. Providing a structured method for integrating non-speech audio into human-computer interfaces. PhD Thesis, University of York, UK, 1994.
[3] Brewster, S.A., Wright, P.C., Dix, A.J. and Edwards, A.D.N. The sonic enhancement of graphical buttons. Accepted for publication at Interact'95 (Lillehammer, Norway).
[4] Brewster, S.A., Wright, P.C. and Edwards, A.D.N. The design and evaluation of an auditory-enhanced scrollbar. In Proceedings of CHI'94 (Boston, Massachusetts) ACM Press, Addison-Wesley, 1994, pp. 173-179.
[5] Brewster, S.A., Wright, P.C. and Edwards, A.D.N. An evaluation of earcons for use in auditory human-computer interfaces. In Proceedings of InterCHI'93 (Amsterdam) ACM Press, Addison-Wesley, 1993, pp. 222-227.
[6] Brewster, S.A., Wright, P.C. and Edwards, A.D.N. Parallel earcons: Reducing the length of audio messages. Submitted to the International Journal of Man-Machine Studies.
[7] Edworthy, J., Loxley, S. and Dennis, I. Improving auditory warning design: Relationships between warning sound parameters and perceived urgency. Human Factors 33, 2 (1991), 205-231.
[8] Handel, S. Listening: An introduction to the perception of auditory events. MIT Press, Cambridge, Massachusetts, 1989.
[9] Patterson, R.D. Guidelines for auditory warning systems on civil aircraft. Civil Aviation Authority, London, 1982.

All of my references can be obtained from my publication list.