The Use of 3D Audio to Improve Auditory Cues in Aircraft

 

 

 

William Dell

 

Supervisor: Prof. Chris Johnson

Second Reader: Dr John Patterson

 

 

Class: CS4H

Session: 1999/2000

 

 

Department of Computing Science,
University of Glasgow,
Lilybank Gardens,
Glasgow, G12 8QQ

 


 

Abstract

 

Auditory alarms are used in many safety-critical environments such as hospitals, nuclear power stations and aircraft. At present, these alarms rarely exploit the fact that sound can be processed so that it appears to come from more than one direction. In aviation in particular, it is common for pilots to wear headphones that support stereo sound, so taking advantage of this aspect of audio is certainly feasible. Additionally, a 3D virtual acoustic display was proposed by Wenzel [1] at a time when the technology was insufficient to test her assertions. With the release of DirectX 5.0, it is now practical to design and prototype auditory alarms that make use of 3D audio. This project investigates the impact of using spatialised alarms versus stereo and mono alarms. The effectiveness of the three types of alarm is analysed in terms of reaction time, error rate, learnability, performance of the primary task and workload measures. The results from this experiment indicate that the technology for supporting 3D audio is not sufficient to yield an advantage over its alternatives.

 

 

Acknowledgements

 

I would like to thank:

 

Prof. Chris Johnson for being my supervisor and guiding the course of my project.

Dr John Patterson for being my second reader and supplying helpful feedback.

Dr Ashley Walker for giving advice and access to spatialising equipment.

Dr Stephen Brewster for advice on experimental practice and statistical analysis.

Mark Rodgers for highlighting relevant studies.

Paul Krois for supplying FAA documents.

Adrian Ng Han Boon for spatialising the auditory cues.

 

The twenty subjects who participated in the experiment.

 

 


TABLE OF CONTENTS

1  Introduction
1.1  Approach
2  Different ways of delivering audio
2.1  Mono
2.2  Stereo
2.3  Quadraphonic Sound
2.4  Surround Sound
2.5  Positional 3D-audio
2.5.1  Cues that aid in human sound localisation
2.5.1.1  Interaural Time Difference (ITD)
2.5.1.2  Interaural Intensity Difference (IID)
2.5.1.3  Head Shadow
2.5.1.4  Pinna Response
2.5.1.5  Shoulder echo
2.5.1.6  Head Motion
2.5.1.7  Early echo response/reverberation
2.5.1.8  Vision
2.5.2  Head-Related Transfer Function (HRTF)
2.6  Conclusion
3  Auditory Warnings
3.1  Advantages Of Using Auditory Warnings
3.2  Disadvantages of Using Auditory Warnings
3.3  Attention In The Auditory Modality
3.4  Parallel Processing
3.5  Auditory Warnings In Aircraft
3.5.1  Nonspeech Displays
3.5.2  Speech Displays
3.5.3  Speech And Nonspeech Combinations
3.6  Conclusion
4  DirectX 5.0 - DirectSound (Spatialising Sounds)
4.1  3D Coordinates and Distance
4.2  Point Sources and Sound Cones
4.3  The Listener
4.4  Sound Mode
4.5  Doppler Effect
4.6  Hardware
4.7  Conclusion
5  Building the Experiment
5.1  Introduction
5.2  Java Applets and The Web
5.3  Design
5.4  Security
5.4.1  Java Policy Tool
5.5  Implementation
5.5.1  Artificial Horizon System
5.5.2  Alarm Handler
5.5.3  Performance Tracker
5.5.4  Turbulence Simulator
5.5.5  Flight Controller
5.6  Testing
5.7  Conclusion
6  Spatialised Auditory Warnings Experiment
6.1  Subjects
6.2  Method
6.3  Design
6.3.1  Independent variable (IV)
6.3.2  Dependent variables (DV)
6.3.3  Latin Square
6.3.4  Prediction
6.3.5  Null Hypothesis
6.4  Apparatus
6.5  Procedure
6.6  Results
6.6.1  Response Time
6.6.2  Performance
6.6.3  Error Rate
6.6.4  Learnability
6.6.5  NASA TLX
6.6.5.1  Mental Demand
6.6.5.2  Physical Demand
6.6.5.3  Temporal Demand
6.6.5.4  Effort
6.6.5.5  Performance
6.6.5.6  Frustration
6.6.5.7  Overall Workload
6.6.6  Questionnaire
6.6.6.1  Subject being studied
6.6.6.2  Instruments played by subjects
6.6.6.3  Amount of game playing done by subjects
6.6.6.4  Negative aspects of the experiment stated by subjects
6.6.6.5  Positive aspects of the experiment stated by subjects
6.7  Discussion
6.7.1  Summary of results
6.7.2  What caused these results?
6.7.3  Further research
6.7.4  Traffic Alert and Collision Avoidance System (TCAS II)
6.8  Conclusion
7  Conclusion
References

 


1       Introduction

 

Auditory warnings have been developed over a long period and a rich body of knowledge now exists in the area. Research has investigated the effects that various aspects of auditory warnings have on the user, e.g. [2], [3] and [4]. The aspects of sound that can be altered when designing auditory warnings include pitch, volume, timbre, attack, release, decay, amplitude, frequency and reverberation. Each of these may or may not have a significant effect on the effectiveness of auditory warnings.

 

One example of research into the effectiveness of different aspects of sound is the work of Haas and Edworthy, who performed an experiment to test the effects of pitch, speed and loudness of auditory alarms on perceived urgency and response time [2]. Thirty participants had to turn off alarms (with the independent variables being varied) and rate the perceived urgency. The results allowed Haas and Edworthy to conclude that perceived urgency increases and response time decreases for auditory alarms which have a high frequency, a fast speed and a high level of loudness.

 

Additionally, Melara and Marks [3] found that listeners respond faster to a loud sound if the pitch is also high rather than low (despite instructions to ignore the pitch). Similarly, listeners respond more quickly to soft sounds that are low in pitch, rather than high in pitch.

 

Further research includes that done by Morris and Leung [4], who showed that voice warnings yield better reaction times and learning times than earcons and hybrid warnings. They also concluded that hybrid warnings have better error rates than voice and earcon warnings.

 

In spite of the large amount of research carried out in this area, auditory alarms are still not satisfactory. In fact, as Stanton and Edworthy [5] highlighted, there is considerable discontent with the way that auditory alarms are designed. Morris and Leung [4] feel that investigating auditory alarms is important because the development and use of auditory alarms has outpaced research into how they should be designed. The aim of this project is to investigate the use of spatialised sound in designing auditory alarms. This may improve the reaction time, error rate and learnability of auditory alarms. Therefore, if a helpful contribution can be made to the current research, perhaps better alarms will be designed in the future.

 

1.1      Approach

 

In 1991, Wenzel proposed the use of a three-dimensional virtual acoustic display but only addressed the human perceptual requirements and stated that technological capabilities and constraints should be addressed later [1]. Her proposed auditory display had to be capable of ‘presenting information in three spatial dimensions’ and ‘representing multiple sources which can be either static or moving’. With the release of DirectX 5.0 (Section 4), which supports this, it is now possible to validate some of her assertions about the utility of these techniques.

 

With an understanding of how humans recognise the directionality of sound, and with the recent development of technology that can realistically reproduce these effects with headphones or speakers, it is important to investigate the use of spatialised auditory alarms.

 

At present, the majority of auditory alarms are monaural, which means that from the listener’s point of view, all the alarms come from the same location. This is the case despite the fact that pilots often wear headphones, which give auditory alarms another way to convey information, i.e. through the channel that is used. This can be taken a step further than just allowing an auditory alarm to come from the left or right, by sounding alarms from many more directions; the technical details are discussed in Chapter 4. Theoretically, if more information is conveyed in the auditory alarms, less pressure will be placed on the listener to handle both the primary task and the alarms going off. This is the hypothesis which will be tested in this experiment.

 

The main aim of this report is to investigate the use of mono, stereo and spatialised auditory alarms, particularly comparing stereo to spatialised alarms. Factors that will be measured to establish the advantages of each sound mode include reaction time, error rate and learnability.


2       Different ways of delivering audio

 

There are many different ways that audio can be played to the listener. My experiment used mono, stereo and positional 3D-audio. The characteristics of the various sound modes are discussed below: mono, stereo, quadraphonic sound, surround sound and positional 3D-audio.

 

With stereo, quadraphonic and surround-sound modes, the position of the listener’s head has important implications (except when headphones are used for stereo). If the listener moves closer to one speaker than the rest, the apparent origin of the sound will move accordingly. Since they are sensitive to head movement, these modes can only be used in certain situations.

 

Positional 3D audio can use head tracking technology to address this problem. Head tracking requires detecting the location and direction of the head of the listener, who must be wearing headphones. The sounds are then transformed dynamically so that the perceived location remains constant as the listener moves their head.

 

2.1      Mono

 

Figure 2‑1 shows how mono sound is reproduced [6]. The sound will originate from a single location and the direction will remain fixed. Mono sound is compatible with headphones by playing the sound at equal volume in both ears.

Figure 2‑1. Mono sound.

2.2      Stereo

 

Figure 2‑2 shows how stereo sound is reproduced [6]. The apparent direction of the sound being played can be varied to originate anywhere on the line between the two speakers. This is achieved by altering the volume of the sound in each speaker. Stereo sound is also compatible with headphones by altering the volume of the sound in each ear.

 

Figure 2‑2. Stereo sound.
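
The volume-based positioning described for stereo can be illustrated with a short sketch. This is not code from the project: the function name `pan_gains` and the constant-power (sine/cosine) panning law are assumptions chosen for the example, since the text only states that the volume in each speaker is altered.

```python
import math


def pan_gains(position):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is fully left, 0.0 is centre and +1.0 is fully right.
    A constant-power law is assumed so that the overall loudness
    stays roughly the same wherever the sound is placed.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be between -1.0 and 1.0")
    # Map the pan position onto an angle between 0 and pi/2.
    angle = (position + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


if __name__ == "__main__":
    for pos in (-1.0, 0.0, 1.0):
        left, right = pan_gains(pos)
        print(f"position {pos:+.1f}: left={left:.3f} right={right:.3f}")
```

With this law a centred sound is played at equal gain in both channels, which matches the way mono sound is delivered over headphones.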

 

2.3      Quadraphonic Sound

 

This works on the same principle as stereo sound but there are four speakers instead of two so that the directionality of the sound can be extended, see Figure 2‑3. The apparent direction of the sound being played can be varied to originate anywhere on the bounding box around the listener. Again, this is achieved by altering the volume of the sound in each speaker. Because it is necessary to have more than two speakers this mode is not compatible with conventional headphones.

Figure 2‑3. Quadraphonic sound.

2.4      Surround Sound

 

A quadraphonic set-up with an additional speaker positioned close to the listener can extend the range of possible origins for sounds to anywhere inside the bounding box, see Figure 2‑4. Thus, the sound source can originate from any direction, on one plane, around the listener. This principle can be extended by adding more speakers so that the origin of a sound can also be below or above the listener. Again, because it is necessary to have more than two speakers, this mode is not compatible with conventional headphones.

 

Figure 2‑4. Surround sound.

2.5      Positional 3D-audio

 

Computers can be used to simulate what happens to a sound as it travels from the source to the listener’s ears by analysing what happens in the real world. Before covering how a computer performs this operation it is worth analysing those cues that help us to localise sounds.

 

2.5.1     Cues that aid in human sound localisation

 

Humans take for granted their ability to locate the origin of a sound source with high precision. A great deal of research has investigated which cues enable us to do this. Nine cues have been widely accepted, most of which are highlighted by Tonnesen and Steinmetz [7]. These are interaural time difference, interaural intensity difference, head shadow, pinna response, shoulder echo, head motion, early echo response, reverberation and vision.

 

2.5.1.1      Interaural Time Difference (ITD)

As shown in Figure 2‑5, the distance that the sound has to travel can be different for each ear. The length of b is greater than the length of a, which means there will be a slight delay in the sound reaching the right ear. The ITD for sounds which originate directly in front of or behind the listener will be zero, whereas sounds positioned to the far left or right will have an ITD of about 0.63 milliseconds [7]. This is an important cue for the brain to interpret the lateral position of a sound.

 

Figure 2‑5. The distance for the sound to travel to each ear is different.
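
The size of the ITD can be estimated with a simple geometric model. The sketch below uses Woodworth's spherical-head approximation, ITD = (r / c)(θ + sin θ), which is not taken from [7]; the head radius of 0.0875 m and the speed of sound of 343 m/s are assumed values, and the function name `itd_seconds` is hypothetical.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def itd_seconds(azimuth_deg):
    """Approximate the interaural time difference for a source azimuth.

    0 degrees is straight ahead, +90 is directly to the right and
    -90 is directly to the left. A positive result means the sound
    reaches the right ear first.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between -90 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        print(f"{azimuth:3d} degrees: {itd_seconds(azimuth) * 1000:.3f} ms")
```

Under these assumptions a source directly in front gives an ITD of zero and a source at the far right gives roughly 0.66 ms, which is of the same order as the figure quoted from [7].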

 

2.5.1.2      Interaural Intensity Difference (IID)

Not only is there a delay because of ear position, but there is also a difference in the volume of the sound reaching each ear, called the Interaural Intensity Difference (IID), which is a primary localisation cue, see Figure 2‑6 [8]. The main reasons for this are that the sound has less distance to travel to one ear than the other and that the head gets in the way (head shadow cue).

 

Figure 2‑6. Illustration of Interaural Intensity Difference

 

2.5.1.3      Head Shadow

This refers to what happens when sound has to travel through or around the head to reach the other ear. If the source of the sound is located to the left or right of the listener then the head will get in the way more than if the source is in front or behind. With the head being in the way, the sound will have a lower amplitude. Additionally, the head acts as a filter which alters the original wave making it difficult to detect the direction and distance [7].

 

2.5.1.4      Pinna Response

The pinna, see Figure 2‑7 (from [6]), has a significant effect on sounds before they reach the eardrum. The pinna acts as a filter on the high frequencies of sound which assists with the perception of lateral position and elevation. Tonnesen and Steinmetz state that the effect the pinna filter has ‘is highly dependent on the overall direction of the sound source’ [7].

 

Figure 2‑7. The Outer Ear.

 

The effect that the pinna or outer ear structure has on sound is illustrated in Figure 2‑8 [8]. As shown in the diagram, the medium to high frequency end of the spectrum is affected greatly by the pinna. This filtering depends on the angle at which the sound wave reaches the pinna. Each ear will filter the sound differently, giving the brain a hint to the location of the source.

 

Figure 2‑8. Spectrum differences between original sound and the sound after the pinna has taken effect.

 

2.5.1.5      Shoulder echo

As pointed out by Tonnesen and Steinmetz [7], sounds that have a frequency in the range 1–3 kHz and come from particular directions are reflected by the upper torso of the human body, see Figure 2‑9. The elevation of the source affects the length of the time delay caused by the reflection from the shoulders. This provides information about the direction of the sound but it is not a primary auditory cue.

 

Figure 2‑9. Sound being reflected by the upper torso.

 

2.5.1.6      Head Motion

A primary cue in locating the source of a sound is the motion of the listener’s head. It is natural for human beings to move the head to track where a sound is coming from. As the frequency of the sound increases, so does the amount of head movement, to compensate for the fact that high frequency sounds don’t bend around objects very easily and are harder to localise [7].

 

2.5.1.7      Early echo response/reverberation

The path that a sound wave travels from the source to the listener is not simply a direct line, see Figure 2‑10. Instead, the waves will be reflected off the surfaces in the real world and arrive at the listener’s ears at different times, causing reverberation.

 

Figure 2‑10. Effect of sound being reflected by surfaces.

 

Early echo responses occur in the first 50–100 ms of a sound’s life. These early echo responses and reverberation give clues to the listener about the direction and distance of the sound source. For example, if the source of a sound is close to the listener there will be fewer echoes than if the source of the sound is far away.

 

2.5.1.8      Vision

Vision is often a major factor in identifying the location of a sound. For example, when you hear a barking noise and you can see a dog in the rough direction it came from it is obvious where the source of the sound is.

 

2.5.2     Head-Related Transfer Function (HRTF)

 

Previous sections have described how the human perceptual system uses various cues to localise sounds in the real world, but how does a computer duplicate these cues to fool our brain? The answer lies in the Head-Related Transfer Function (HRTF). An HRTF performs a mathematical transformation of the spectrum of a sound to simulate the effects of the cues mentioned earlier. To spatialise a sound, a computer uses the relative angle (and distance) of the sound from the listener to select an HRTF that simulates that effect. There is one HRTF for each direction and distance from the listener, e.g. a source ten metres directly in front of the listener corresponds to one HRTF. A system typically uses about one thousand HRTFs to cover a sufficient number of directions and distances around the listener’s head to simulate a full 3D space.

 

Each of these HRTFs is still built using the same principle that W. Bartlett used in 1927 [6]. Microphones are placed in the auditory canals of a dummy, and sounds played from a fixed location are analysed to determine the effect on each of the frequencies. This data is used to build a filter that mimics these effects, which can be applied to a monaural sound whenever it should appear to come from that location.
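
Applying such a filter to a monaural sound can be sketched as follows. In the time domain an HRTF corresponds to a pair of impulse responses, one per ear, and filtering amounts to convolving the mono samples with each of them. The function names and the impulse-response values below are invented placeholders for illustration only; they are not measured data and this is not the software used in the project.

```python
def convolve(signal, impulse_response):
    """Directly convolve two sequences of samples (no external libraries)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, sample in enumerate(signal):
        for j, coefficient in enumerate(impulse_response):
            out[i + j] += sample * coefficient
    return out


def spatialise(mono, ir_left, ir_right):
    """Filter a mono signal with one impulse response per ear.

    Returns (left_channel, right_channel) for headphone playback.
    """
    return convolve(mono, ir_left), convolve(mono, ir_right)


if __name__ == "__main__":
    mono = [1.0, 0.0, 0.0]  # a single click
    # Placeholder responses for a source on the listener's left:
    # the right ear hears the click later and more quietly.
    ir_left = [0.9, 0.1]
    ir_right = [0.0, 0.0, 0.5, 0.1]
    left, right = spatialise(mono, ir_left, ir_right)
    print("left :", left)
    print("right:", right)
```

In the placeholder data the delay in the right-ear response stands in for the interaural time difference and its lower amplitude for the interaural intensity difference; a measured response would also encode the pinna and shoulder effects.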

 

2.6      Conclusion

 

The different modes for delivering audio have been highlighted, from mono, which aircraft cockpits currently use, to modes which require an arrangement of speakers, such as quadraphonic sound or surround sound, to the more recent approach of positional 3D audio. The cues that an HRTF (Head-Related Transfer Function) uses to try to fool the listener, such as interaural time difference, pinna response or reverberation, have been described. With this project investigating whether there are advantages to using positional 3D audio for auditory cues in aircraft, the necessary background information about delivering audio has been given.

 

 


3       Auditory Warnings

 

Since the time of Aristotle, many philosophers and psychologists have given a greatly simplified description of perception as ‘the process of using the information provided by our senses to form mental representations of the world around us’ [9]. Similarly, a dictionary may describe it as the ‘recognition and interpretation of sensory stimuli based chiefly on memory’. Hearing is a critical sense in many situations and it has many advantages over other senses, see 3.1, but it also has its share of disadvantages, see 3.2. Auditory warnings have been exploited for a long time and their use has varied, from the whistle on a train to speech warnings on a modern aeroplane. Speech, nonspeech and combined warnings are covered in section 3.5.

 

3.1      Advantages Of Using Auditory Warnings

 

3.2      Disadvantages of Using Auditory Warnings

 

 

3.3      Attention In The Auditory Modality

 

The auditory modality varies from the visual modality in two main ways. Firstly, hearing is omnidirectional which means it can take input from any direction unlike vision which requires you to be focusing on a specific area. Secondly, auditory information is often transient, i.e. you hear a word or tone and then it ends whereas visual input doesn't normally disappear. The fact that the short-term auditory store is longer than the short-term visual store helps to address this problem.

 

When a person is not focusing on a stream of auditory input, the sound remains in the preattentive short-term auditory memory for three to six seconds. If you are listening to someone and your attention wanders, it is possible to switch your attention back and 'hear' the last few words you weren't actually listening to. If you don't switch your attention back then the auditory information is lost. However, if the 'background' sound is sufficiently pertinent, your adaptive mechanisms will bring this material to the focus of attention which is the theory behind auditory warnings.

 

3.4      Parallel Processing

 

In the same way that you can pay attention to both the words and melody of a song you can capitalise on parallel processing in the use of auditory warnings. In the auditory modality, the ability to focus attention on one channel is disrupted when there is increased similarity with competing messages that are to be ignored.

 

Spatial location or directionality is perhaps one of the most important dimensions of similarity [13]. As two auditory messages become closer in space, our ability to process them declines. The following bullet points highlight areas that can affect this similarity.

 

3.5      Auditory Warnings In Aircraft

 

Having reviewed perceptual aspects of auditory warnings, this section goes on to investigate their use in aviation. The amount of instrumentation required in modern aircraft has reached a point where the panel space is very cluttered, see Figure 3‑1 for the cockpit of a Beechcraft King Air A90. Since a pilot can scan only a limited amount of information at one time, reducing this clutter is an important concern.

 

Figure 3‑1: The cockpit of a Beechcraft King Air A90 with cluttered panel space.

 

Visual clutter has been a problem in aircraft. For this reason there are great advantages in moving some tasks to other sensory channels such as hearing. However, traditional visual instrumentation has been replaced with bells, beepers and electronic tones to the point where auditory clutter is almost as bad as visual clutter.

 

3.5.1     Nonspeech Displays

 

At present, aircraft use many nonspeech auditory displays such as bells, whistles, horns, buzzers, clackers and various electronic tones which can vary in intensity, pitch and duration. It is possible for pilots to learn at least ten nonspeech auditory signals, although they forget their meanings over time [14]. Forgetting the meanings may be partly due to the fact that there is very little standardisation between aircraft, e.g. fighter aircraft F-4D, F-15 and F-16 and transport aircraft C-5 and C-141 all use nonspeech signals but all with different standards [15]. Doll and Folds [15] have pointed out that confusion is being caused by nonspeech sounds which are too similar. For example, the F-16 fighter aircraft sounds one alarm for the ground proximity warning and another for the angle-of-attack warning. One warning indicates a need to raise the nose and the other to lower the nose, yet they both use an 800 Hz tone. Because of this confusion it is important to focus on signal distinctiveness and 'masking resistance'.

3.5.2     Speech Displays

 

Speech displays avoid the recognition problems of nonspeech audio and are better suited to certain applications than nonspeech displays.

 

The main applications for speech displays are [16]:

 

Further possibilities include [16]:

·       Commands

 

Despite these many possible applications it can be argued that speech displays should just be used for warnings. One point is that pilots consider speech displays to be noisy, strident and intrusive [16]. However, a more important point is that if speech displays are used for both advisories and warnings, the latter may be treated less urgently. Wickens points out that pilots preferred speech messages for conveying warnings but preferred visual displays for other information [16].

 

Wickens points out that pilots respond to voice warnings faster than warnings presented visually [16]. Speech technology has been used in aircraft warning displays such as the following:

 

Designers can exploit many variables when developing speech displays:

 

Speech generation

Contextual factors

Linguistic factors

 

There is debate about what values of these variables allow speech displays to be used to their full potential. For example, it is argued that speech displays should not sound like human speech, so that pilots can distinguish the messages from those given by people at air traffic control [16]. Additionally, the speech rate should not be as slow as 123 wpm, which is irritating and time consuming, but not as high as 178 wpm [16].

 

3.5.3     Speech And Nonspeech Combinations

 

As Wickens highlights, including an alerting tone before a voice warning message reduces response times [16]. This is because when an alarm goes off the pilot's attention isn't shifted quickly enough to apprehend the beginning of the message. Although this combination proved to be better than a speech warning alone, it has been shown that starting voice messages with the word ‘Attention’, ‘Danger’ or ‘Advisory’ had the same effect [16].

 

3.6      Conclusion

 

As would be expected, there are both advantages and disadvantages to using auditory warnings. For example, on one hand, the omnidirectional nature of hearing allows auditory alarms to be detected no matter where the pilot is looking, but on the other hand, if more than one alarm sounds at the same time the pilot may not be able to identify either. Additionally, it is necessary to consider the different approaches to using auditory alarms, such as speech displays, nonspeech displays or a combination, and the impact that each would have if used. However, it is clear from this chapter that if attention is paid to the wealth of research in this area there can be many benefits to using auditory alarms.

4       DirectX 5.0 - DirectSound (Spatialising Sounds)

 

The sounds that I used in my experiment were originally monaural, so it was necessary to spatialise them, see 6.2. To do this I needed a PC with a suitable sound card and the appropriate software. The hardware was a standard PC with a Sound Blaster Live sound card. The software for spatialising sounds was written in C++ by Colin Paterson of the Computing Science department and uses DirectX 7.0.

 

With the release of DirectX 5.0, which added features to the DirectSound API (Application Programming Interface), it is possible to add spatialised sound (or 3D sound) to your applications. DirectSound attempts to duplicate the effects of the cues which allow humans to locate sounds as realistically as possible by using an HRTF (see 2.5.2). To sum it up very briefly, the brain can estimate the spatial location of a sound from various cues such as nuances in the way that sound reaches one ear before the other and ‘slight echoes and reverberations in the surrounding environment’ [7].

 

The DirectSound API is based around the concept of a virtual 3D space where sounds can be positioned with x, y and z coordinates. Obviously the listener is also placed in this 3D space but with an orientation to indicate the direction that they are facing. Once the programmer has placed the sound sources and the listener, DirectSound has the basic information that is necessary to spatialise the sounds.

 

4.1      3D Coordinates and Distance

 

DirectSound uses a left-handed coordinate system, see Figure 4‑1, [17].

 

 

Figure 4‑1: Left-handed Coordinate System

 

The unit of measurement is the metre (although it can be changed) which means that if you place a listener at the origin and a sound at (0, 0, 10), DirectSound will try to simulate the sound as being ten metres away in the real world.

 

Since all sampled sounds have much the same volume, a problem arises when you move different sound sources away from the listener. To illustrate, the unadjusted volume of a sampled sneeze will be close to the volume of a sampled explosion [17], which means that if you move them both to 100 metres from the listener they will both sound just as loud. This isn’t realistic, because you would not hear the sneeze but you would certainly hear the explosion. To address this, DirectSound allows the programmer to set the distances at which the sounds are at their maximum and minimum volumes, which allows the muting effect to vary for different sounds. For example, the sneeze sound could be set to have a range of half a metre (maximum volume) to fifty metres (minimum volume) and the explosion, 100 metres (maximum) to 10,000 metres (minimum). DirectSound will use this range to scale the volume in a linear fashion depending on the distance from the listener.
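
The range-based scaling described above can be sketched as a small function. The sketch follows the linear description given in this section rather than any documented DirectSound formula, so the actual rolloff curve applied by DirectSound may differ; the function name `distance_gain` and its parameter names are hypothetical.

```python
def distance_gain(distance, min_distance, max_distance):
    """Return a volume scale factor between 0.0 and 1.0.

    The sound is at full volume at or inside min_distance, at its
    minimum (taken here to be silent, an assumption) at or beyond
    max_distance, and is scaled linearly in between.
    """
    if min_distance <= 0 or max_distance <= min_distance:
        raise ValueError("require 0 < min_distance < max_distance")
    if distance <= min_distance:
        return 1.0
    if distance >= max_distance:
        return 0.0
    return (max_distance - distance) / (max_distance - min_distance)


if __name__ == "__main__":
    # The sneeze and explosion ranges from the example in the text.
    print("sneeze at 100 m:   ", distance_gain(100.0, 0.5, 50.0))
    print("explosion at 100 m:", distance_gain(100.0, 100.0, 10000.0))
```

With the ranges from the example, the sneeze is inaudible at 100 metres while the explosion is still at full volume, which is the behaviour the two distance settings are intended to produce.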

 

4.2      Point Sources and Sound Cones

 

DirectSound has ‘built in’ support for two different types of sound. These are point source sounds which emit waves in all directions and sound cones which only emit waves in directions which are constrained by a cone (like a loudspeaker). This is illustrated in the following diagram [17].

 

Figure 4‑2: Point Sources and Sound Cones

 

An orientation, along with two angles, is specified for each sound source to give the areas A, B and C as shown in Figure 4‑2. A listener in area A would hear nothing; a listener in area B would hear the sound at full volume (scaled according to the distance from the sound source). A listener in area C would hear the sound scaled in accordance with the distance from the inner cone and, of course, from the sound source.
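
The three areas can be expressed as a gain that depends on how far the listener is from the axis of the cone. In the sketch below the two angles are taken to be the full apex angles of the inner and outer cones, and the gain in area C is interpolated linearly; both the function name `cone_gain` and the linear interpolation are assumptions made for illustration, and distance attenuation would be applied separately.

```python
def cone_gain(off_axis_deg, inner_cone_deg, outer_cone_deg):
    """Return the directional gain for a listener relative to a sound cone.

    off_axis_deg is the angle between the source's orientation and the
    direction from the source to the listener.
    Area B (inside the inner cone): full volume, gain 1.0.
    Area A (outside the outer cone): silent, gain 0.0.
    Area C (between the cones): scaled between the two.
    """
    if not 0.0 <= inner_cone_deg <= outer_cone_deg <= 360.0:
        raise ValueError("require 0 <= inner <= outer <= 360 degrees")
    inner_half = inner_cone_deg / 2.0
    outer_half = outer_cone_deg / 2.0
    off_axis = abs(off_axis_deg)
    if off_axis <= inner_half:
        return 1.0
    if off_axis >= outer_half:
        return 0.0
    return (outer_half - off_axis) / (outer_half - inner_half)


if __name__ == "__main__":
    for angle in (0, 30, 45, 60, 90):
        print(f"{angle:3d} degrees off axis: gain {cone_gain(angle, 60, 120):.2f}")
```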

 

4.3      The Listener

 

As mentioned earlier, it is the listener’s position and orientation relative to the sound sources that affects what the environment will sound like. Setting the location of the listener obviously requires one set of coordinates. However, to set the orientation requires two vectors which are at right angles to each other and intersect at the centre of the virtual head, see Figure 4‑3 [17].

Figure 4‑3: Listener Orientation

As can be seen, the top vector points up from the virtual head and the front vector points forwards. This provides enough information for DirectSound to set the orientation of the listener’s ears.
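
To see why two vectors are enough, note that a third vector pointing out of the listener's right ear can be derived from them, after which the direction of any source relative to the head follows. The sketch below assumes the left-handed coordinate system of section 4.1 (x to the right, y up, z forward) and unit-length front and top vectors; the function names are hypothetical and the code is an illustration of the geometry, not of DirectSound's internals.

```python
import math


def cross(a, b):
    """Cross product of two 3D vectors given as (x, y, z) tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def source_azimuth_deg(listener_pos, front, top, source_pos):
    """Horizontal angle of a source relative to where the listener faces.

    0 is straight ahead, positive angles are to the listener's right
    and negative angles are to the left.
    """
    if abs(dot(front, top)) > 1e-9:
        raise ValueError("front and top vectors must be at right angles")
    # With x right, y up and z forward, top x front points to the right ear.
    right = cross(top, front)
    rel = tuple(s - l for s, l in zip(source_pos, listener_pos))
    return math.degrees(math.atan2(dot(rel, right), dot(rel, front)))


if __name__ == "__main__":
    origin = (0.0, 0.0, 0.0)
    facing_forward = (0.0, 0.0, 1.0)
    up = (0.0, 1.0, 0.0)
    print(source_azimuth_deg(origin, facing_forward, up, (0.0, 0.0, 10.0)))
    print(source_azimuth_deg(origin, facing_forward, up, (5.0, 0.0, 0.0)))
```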

 

4.4      Sound Mode

 

How a sound should be played to the listener also depends on which of three possible modes is selected [17]:

 

4.5      Doppler Effect

 

A well-known feature of DirectSound is that of simulating the Doppler effect. As mentioned earlier, each sound source can have a location and a direction but a velocity can also be specified. With this information DirectSound is able to simulate the Doppler effect by adjusting the pitch of the sound. A common example of the Doppler effect is when an emergency vehicle drives past with a siren blaring. The pitch increases as the siren moves towards the listener and decreases as the vehicle moves further away.
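
The pitch change in the siren example can be estimated with the standard Doppler formula for a moving source and a stationary listener, f' = f · c / (c − v), where v is the speed at which the source approaches. The sketch below illustrates the physics only and does not describe how DirectSound computes the effect; the speed of sound of 343 m/s and the function name are assumptions.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air


def doppler_frequency(source_hz, approach_speed_m_s):
    """Frequency heard by a stationary listener from a moving source.

    approach_speed_m_s is positive when the source moves towards the
    listener and negative when it moves away.
    """
    if approach_speed_m_s >= SPEED_OF_SOUND_M_S:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * SPEED_OF_SOUND_M_S / (SPEED_OF_SOUND_M_S - approach_speed_m_s)


if __name__ == "__main__":
    siren_hz = 700.0
    print("approaching at 20 m/s:", round(doppler_frequency(siren_hz, 20.0), 1), "Hz")
    print("receding at 20 m/s:   ", round(doppler_frequency(siren_hz, -20.0), 1), "Hz")
```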

 

4.6      Hardware

 

If the programmer specifies the hardware attached to the system, DirectX can use that hardware to perform operations rather than having DirectSound emulate them in software. The hardware available also affects how the application being developed should be scaled. For example, when no hardware acceleration is available the developer should be cautious about how many 3D sounds are incorporated. By contrast, dedicated hardware such as the Lake Digital Sound Processor (DSP) can render multiple sounds with Doppler effects, directivity and support for room acoustics in real time [18].

 

4.7      Conclusion

 

DirectX 5.0 provides a host of features which give a software developer the necessary tools to produce powerful multimedia applications. As highlighted, the part that gives support for sound is called DirectSound, which is vastly superior to its previous version. The extensive support for 3D sound allows another dimension to be added to multimedia applications. The software used in this project only touched upon the basic capabilities of DirectSound. However, it is clear that DirectX 5.0 is a flexible platform which can be utilised in the design and prototyping of 3D auditory alarms.

 

 


5       Building the Experiment

 

5.1      Introduction

 

The construction of my experiment required implementing a primary task for the user to concentrate on, while a secondary task of turning off alarms interrupts them. As described in section 6.3, it is the performance of the user in the primary task, the response time to the alarms and the error rate which are of interest. It would have been possible to use an existing flight simulator (or another task which requires continuous interaction) as the primary task, together with a custom-built secondary task which sounds alarms and records response times and error rates. However, it would then have been difficult to compare the performance of the user in the primary task under different alarm conditions. For this reason I chose to implement a primary task that requires continuous interaction from the user and that records the performance. It is an artificial horizon simulator which requires the user to keep the horizon as level as possible in the face of occasional disturbances. The performance of a user is expressed as a percentage which reflects the amount of time that the horizon was level enough for the circle to overlap with the crosshair.

 

5.2      Java Applets and The Web

 

I chose to implement my experiment in Java using applets so that the experiment could be run over the internet. This makes it easier to recruit subjects to participate in the experiment and easier for others to replicate my experimental method.

 

 

5.3      Design

 

The implementation of the experiment was divided into various subsystems including an alarm handler, performance tracker, flight controller and a turbulence simulator. I have divided these into two categories, 'statistical data' which captures the data necessary to analyse the different sound modes and 'artificial horizon' which is concerned with providing the primary task of keeping a plane level, see Figure 5‑1. The components are described in more detail below:

Figure 5‑1: Hierarchical Design of the Software for the Experiment

 

5.4      Security

 

I chose to implement my experiment in Java so that it could be run over the internet. However, my experiment records data about the user’s interaction with the system under different circumstances. Because of the security constraints that Java imposes on applets running over the web, this issue was more complicated than first suspected. Originally I was hoping that the applet could simply e-mail me the results, but the restrictions that Java enforces made this impossible. Therefore the only other option was to record the data in a file which the subject then transfers over.

 

5.4.1     Java Policy Tool

 

The default permissions of applets executed over the internet restrict the actions that can be carried out, in order to prevent malicious attacks. Because of this, the applet for my experiment is unable to write the results to a file in my workspace unless special permissions are granted by the user. Java’s policy tool allows users to grant these essential permissions to my trusted applet. Each user has a ‘.java.policy’ file which contains a list of policies that the user sets up. Each of these policies contains a list of permissions that have been granted to various class files distributed over the internet. When an applet is being executed, the Java run-time environment checks each restricted operation against this policy file. If the policy file does not grant permission for an operation then an exception is raised by the security manager. The user that wishes to participate in my experiment grants the permissions using the policy tool, see Figure 5‑2.
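For illustration, a policy entry of the kind described above might look like the following fragment. It is a sketch only: the code base URL and the name of the results file are placeholders and are not the values used in the experiment.

```
// Hypothetical entry in a user's .java.policy file.
// The codeBase URL and the file name below are placeholders.
grant codeBase "http://www.example.org/experiment/-" {
    // Allow applets loaded from that location to write one results file.
    permission java.io.FilePermission "${user.home}${/}results.txt", "write";
};
```

Restricting the grant to a single code base and a single file keeps the additional permissions as narrow as possible.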

 

Figure 5‑2: Java Policy Tool for granting permissions to applets over the Web.

 

The Policy Tool automatically loads the user’s default policy file (if one exists) which can be modified using this tool. The user can add a set of permissions to applets that run over the web by using the ‘Add Policy Entry’ feature, see Figure 5‑3.

 

Figure 5‑3: Add Policy Entry screen.

 

The user then has to add individual permissions, such as granting file writing permissions to any applets in my experiment web page, using the ‘Add Permission’ screen, see Figure 5‑4.

 

Figure 5‑4: Use this to add individual permissions to applets.

If the user grants write permissions to my applet, then my experiment will be able to record the data necessary for analysis in a file in my workspace. However, it was unlikely that every subject would go through the many screens required to give permissions to my applet. Therefore, I designed an alternative method of granting permissions:

They run a batch program which is available from the experiment web page.

 

If the user already has a ‘.java.policy’ file in their profile folder then the batch program will overwrite it, in which case they may want to use the policy tool to grant the permissions themselves. They will then have to run a different batch file on the experiment web page which does not copy this file. Since it would be very surprising if any students participating in the experiment already had a ‘.java.policy’ file in their profile folder, this solution proved to be very effective.

 


5.5      Implementation

 

This section presents a summary of the artificial horizon system and its four subsystems. These modules are the Alarm Handler, Performance Tracker, Turbulence Simulator and Flight Controller.

 

5.5.1     Artificial Horizon System

 

Along with coordinating the efforts of its subsystems, the artificial horizon system is responsible for repainting the window contents of the artificial horizon simulation.

 

The ‘paint’ method continuously repaints the contents of the applet window, which is how the simulation produces the animated effect of flying a plane. The circle, which the user is trying to keep over the crosshair, is simply provided as an aid to help the user keep the plane level. The direction in which the plane is heading determines the appearance of the sky and ground, and the circle helps to explain this relationship.

 

The vertical direction of the plane can be summarised as follows. If the circle is above the crosshair then the plane is heading upwards towards the sky so you would see less ground and more sky, see Figure 5‑5. Similarly, if the circle is below the crosshair then the plane is heading downwards towards the ground so you would see more ground and less sky, see Figure 5‑5.

Figure 5‑5: Plane heading towards the sky firstly, and towards the ground secondly.

 

The horizontal direction (heading to the left or right) of the plane can be summarised as follows. If the circle is to the left of the crosshair then the plane is heading to the right, see Figure 5‑6. Similarly, if the circle is to the right of the crosshair then the plane is heading to the left, see Figure 5‑6.

Figure 5‑6: Plane heading to the right firstly, and heading to the left secondly.

 

As Figure 5‑7 shows, the effects of vertical and horizontal movement can be combined to provide a fairly realistic model of what happens when flying a plane.

Figure 5‑7: Combined effects of vertical and horizontal movement.

 

This relationship between the circle and the crosshair is mapped mathematically in the ‘paint’ method to draw two filled polygons for the ground and sky.

 


5.5.2     Alarm Handler

 

As described in 6.2, the user has to fly the plane as level as possible while disabling any alarms that go off. An alarm is disabled by pressing the function key (F1 – F8) that is associated with it. Additionally, as stated in 1.1, the response time for each alarm and the error rate for each simulation must be recorded for data analysis. As shown in 5.3, it is the alarm handler that is responsible for these tasks. The class ‘Alarm_Handler’ was implemented to encapsulate these responsibilities. Figure 5‑8 shows the ‘sound_alarm’ method which is called by the Artificial Horizon System when an alarm is to be sounded. It records which alarm was set off and the time at which it started to sound. When the user tries to turn off an alarm (or ignores the alarm), the handler is then able to tell whether the user made an error and how long the response took.

Figure 5‑8: Alarm_Handler method for sounding alarms.

 


 

Figure 5‑10 shows the ‘turnoff_alarm’ method which handles the user requests to turn off alarms. Please note that all feedback added to the text area in the simulation is also written to the results file in the ‘paint’ method, which is why there are no File IO operations in this method. When the user presses a key, a request is made to the alarm handler to turn off the corresponding alarm, see Figure 5‑9.

 

Figure 5‑9: Excerpt of code for handling key presses that turn off alarms.

 


When the alarm handler receives this request the time is recorded (‘off_time = System.currentTimeMillis();’), so that subtracting the time at which the alarm was sounded gives the response time. Additionally, each alarm that the user turns off is compared with the alarm that was actually sounded to detect any errors; this is also recorded in the results file. One last responsibility given to this class is to suspend the simulation once the last alarm is turned off.
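The bookkeeping described above can be sketched as follows. This is not the ‘Alarm_Handler’ class itself: the class and method names are hypothetical, the times are passed in as parameters rather than read from ‘System.currentTimeMillis()’ so that the example is easy to test, and treating a key press with no alarm sounding as an error is an assumption.

```java
/** Minimal sketch of response-time and error bookkeeping for one alarm at a time. */
final class AlarmHandlerSketch {
    private int activeAlarm = -1;  // alarm currently sounding (1..8), or -1 if none
    private long onTime;           // time the active alarm started, in milliseconds
    private int errors;            // number of incorrect key presses so far

    /** Called when an alarm starts to sound. */
    void soundAlarm(int alarm, long nowMillis) {
        activeAlarm = alarm;
        onTime = nowMillis;
    }

    /**
     * Called when the user presses the key for an alarm.
     * Returns the response time in milliseconds, or -1 if the wrong key was pressed.
     */
    long turnOffAlarm(int key, long nowMillis) {
        if (activeAlarm == -1 || key != activeAlarm) {
            errors++;              // wrong alarm chosen (or no alarm sounding)
            return -1;
        }
        long responseTime = nowMillis - onTime;
        activeAlarm = -1;          // the alarm is now silenced
        return responseTime;
    }

    int errorCount() {
        return errors;
    }
}
```

If alarm 3 starts at 1000 ms and the user first presses the key for alarm 5 and then the key for alarm 3 at 4500 ms, the sketch records one error and a response time of 3500 ms.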

 


 


 


Figure 5‑10: Alarm_Handler method for handling user requests to turn off alarms.

 

5.5.3     Performance Tracker

 

It was necessary to measure the performance of the user operating the simulator under the different conditions, i.e. monaural, stereo and spatialised alarms, to analyse which is the best. Since the aim of the user is to keep the plane as level as possible, it was decided to measure performance by monitoring how much of the time the circle overlaps the crosshair. Figure 5‑11 gives an example of the simulation running with the circle overlapping the crosshair, where the plane is relatively level; Figure 5‑12 gives an example of the simulation running with the circle away from the crosshair, where the plane is not very level.

 

Figure 5‑11: Simulation running with the circle overlapping with the crosshair.

 

Figure 5‑12: Simulation running with the circle NOT overlapping with the crosshair.


A thread called ‘measurePerformance’ was implemented, which is scheduled to run every 50 milliseconds to check whether the circle is overlapping with the crosshair, see Figure 5‑13. Although the thread is scheduled to run every 50 milliseconds, it may occasionally be delayed for slightly longer; however, this has a negligible impact on the result. By sampling regularly and keeping track of the number of times the circle was overlapping with the crosshair and the number of times it was not, the performance of the user can be judged accurately. The code in Figure 5‑13 does exactly this, with ‘in’ representing the number of samples in which the plane was relatively level and ‘out’ representing the number of samples in which it was not.

 


 


Figure 5‑13: Excerpt of code for measuring the performance of the user.

 

To express the performance of the user in a meaningful way, it is reported as a percentage. Figure 5‑14 shows a method which does exactly this, with the value returned representing the percentage of time that the plane was relatively level.
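The sampling and percentage calculation can be sketched as below. The counter names follow the ‘in’ and ‘out’ counters mentioned above, but the class and method names are hypothetical, and returning zero when no samples have been taken is an assumption. In the simulator, ‘sample’ would be called by the thread that wakes every 50 milliseconds.

```java
/** Minimal sketch of the overlap counters and the percentage score. */
final class PerformanceCounter {
    private int in;   // samples where the circle overlapped the crosshair
    private int out;  // samples where it did not

    /** Records one sample; intended to be called roughly every 50 ms. */
    void sample(boolean overlapping) {
        if (overlapping) {
            in++;
        } else {
            out++;
        }
    }

    /** Percentage of samples in which the plane was relatively level. */
    double percentageLevel() {
        int total = in + out;
        return total == 0 ? 0.0 : 100.0 * in / total;
    }
}
```

Because only the ratio of the two counters matters, an occasional late sample changes the score very little, which is consistent with the negligible impact of scheduling delays.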

 


 


Figure 5‑14: Excerpt of code for returning the percentage score of the user.

 

5.5.4     Turbulence Simulator

 

The responsibility of this subsystem is to provide the effect of turbulence to the user trying to control the plane. The user is trying to keep the horizon level, but disturbances are constantly produced by the turbulence simulator so that the horizon will rarely remain level. The user has to counterbalance these movements by dragging the mouse.

 

To simulate this effect, two threads were implemented to adjust the direction of the plane; one thread adjusts the horizontal direction in which the plane is heading and the other adjusts the vertical direction. Each thread wakes every 50 milliseconds and makes a slight adjustment to the direction. The sleep time was set to 50 milliseconds so that the simulation was smooth without placing too much demand on the processor (twenty wake-ups a second by each thread do not place much strain on the CPU).

 

The amount of change made to the direction of the plane varies to give the appearance that the turbulence is random. Each time one of these threads runs, it adjusts the direction of the plane by a value in the range of –10 to +10 pixels, determined by looking up the next value in an array. Each thread has an array of approximately 1,000 integers in this range, chosen to give smooth changes in direction, i.e. rather than altering the horizontal direction with a jump from –10 (10 pixels to the left) to +10 (10 pixels to the right), the transition would be –10, –9, … 9, 10.
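One way to build a lookup table with these properties is sketched below. The values used in the experiment were chosen by hand and are not reproduced here; this example simply sweeps up and down between the limits so that consecutive entries never differ by more than one pixel. The class and method names are hypothetical.

```java
/** Illustrative generator for a smooth table of turbulence offsets. */
final class TurbulenceTable {
    /**
     * Builds a table of offsets in the range -limit..+limit in which
     * consecutive values differ by exactly one (a simple up-and-down sweep).
     */
    static int[] build(int length, int limit) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be at least 1");
        }
        int[] offsets = new int[length];
        int value = 0;
        int step = 1;
        for (int i = 0; i < length; i++) {
            offsets[i] = value;
            // Reverse direction rather than step outside the allowed range.
            if (value + step > limit || value + step < -limit) {
                step = -step;
            }
            value += step;
        }
        return offsets;
    }
}
```

A thread can then step through the table with an index that wraps around at the end, which gives identical turbulence on every run of the simulation.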

 

Because the adjustments made to the horizontal and vertical direction are decided in advance, the turbulence is exactly the same on each run of the simulation. Making the turbulence completely random (by using a random number generator) was considered, but because of the user’s interaction with the flight path the simulation effectively becomes random anyway. Additionally, if the turbulence were generated randomly, a subject might be lucky on one trial and get a flight that is relatively stable, and unlucky on another trial and get one that is hard to control, which would have the effect of skewing the results. Furthermore, if both of the threads had to generate random numbers then the simulation would be slowed down.

 

5.5.5     Flight Controller

 

The purpose of the flight controller subsystem is to translate the user’s mouse movements to control the flight of the plane. It was decided to implement the controls of the plane the same as any other flight simulator as follows:

 

In Figure 5‑15 the dashed lines overlaid on the simulator screen show the effect on the circle of dragging the mouse (with the left mouse button pressed) in a south-easterly direction. The horizon becomes more level as the circle is moved closer to the crosshair.

 

Figure 5‑15: Simulator screen with arrows indicating the effect of dragging.

 

A mouse listener and a mouse motion listener had to be written for this part of the simulator. The mouse listener keeps note of whether the left mouse button is being held down and, if it is, also records the coordinates at which the mouse was clicked. Subsequently, the mouse motion listener redraws the circle whenever the user drags the mouse, with the horizontal and vertical distance being determined by the coordinates of the mouse pointer in relation to the location of the circle when the screen was last repainted. Figure 5‑16 shows the code for listening to the mouse being dragged and Figure 5‑17 shows the code that updates the coordinates of the circle to be drawn (‘new_x’ and ‘new_y’). The call to ‘repaint’ calls the ‘paint’ method, which uses the coordinates ‘new_x’ and ‘new_y’ to draw the new circle, ground and sky.
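The interaction between the two listeners can be sketched without any windowing code. The class below is hypothetical and is not the code in the figures: it moves the circle by the distance the pointer has travelled since the previous event, which is one plausible reading of the description above. In an applet, ‘press’ and ‘release’ would be called from a MouseListener (mousePressed and mouseReleased) and ‘drag’ from a MouseMotionListener (mouseDragged), followed by a call to ‘repaint’.

```java
/** Illustrative drag handling: the circle follows the pointer while the button is held. */
final class DragTracker {
    private boolean dragging;   // true while the left mouse button is held down
    private int lastX, lastY;   // pointer position at the previous event
    int circleX, circleY;       // position at which the circle is to be drawn

    void press(int x, int y) {
        dragging = true;
        lastX = x;
        lastY = y;
    }

    void release() {
        dragging = false;
    }

    /** Moves the circle by the distance the pointer moved since the last event. */
    void drag(int x, int y) {
        if (!dragging) {
            return;
        }
        circleX += x - lastX;
        circleY += y - lastY;
        lastX = x;
        lastY = y;
    }
}
```

Keeping this logic separate from the listeners makes it possible to check the coordinate arithmetic without running the applet.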

 

 

 

 

 

 


 

 

 


Figure 5‑16: Mouse motion listener to redraw the circle when the user drags the mouse.

 

Figure 5‑17: Method to update the coordinates of the new circle to be drawn.

 

 

5.6      Testing

 

Each subsystem was individually tested to ensure that they met their requirements. When it was clear that each worked correctly and in a robust fashion the subsystems were progressively combined and tested. When the complete system was formed, the artificial horizon simulator was again thoroughly tested for correctness and robustness. Before running the experiment various users operated the system to ensure there were no errors present in the final version.

 

5.7      Conclusion

 

Each individual subsystem was implemented and these were successfully combined to produce the artificial horizon simulation. The turbulence simulator provided the desired effect of disturbances; the alarm handler correctly set off alarms, accepted the user’s requests to turn off alarms, recorded response times and any errors; the performance tracker correctly recorded the performance of the user operating the system and the flight controller provided a responsive way for the user to navigate the horizon with mouse movements. Additionally, the system had three modes of alarm, mono, stereo and spatialised.

 

 

 


6       Spatialised Auditory Warnings Experiment

 

6.1      Subjects

 

Twenty subjects were used in this experiment, all of whom were students from Glasgow University. Subjects completed a brief questionnaire before beginning the experiment, which highlighted that they had varying amounts of computer game experience that could affect their ability to control the simulator (the average frequency of play being monthly). Additionally, a large portion of the participants studied either Computer Science or Software Engineering (85%), which would indicate they had an extensive amount of experience with computer-based tasks. The majority of the subjects did not play instruments (70%), but it may be true that subjects who play instruments have an increased ability to distinguish alarms. The subjects who studied computing-related subjects or played instruments were distributed fairly between the two groups, see 6.3.3.

 

6.2      Method

 

The purpose of the experiment is to investigate the use of spatialised sounds for auditory warnings. Users had a primary task they had to concentrate on while concurrently having to deal with any alarms that go off. The primary task for the experiment was controlling an artificial horizon simulator, see Figure 6‑1. The secondary task was turning off the alarms as they sounded by pressing the corresponding function key (F1 – F8).

 

The user had to ensure that the circle was over the cross hair to stabilise the horizon. The position of the plane and horizon changed periodically so the user had to use the mouse to control the flight as well as possible. The direction of the “plane” would constantly drift off in various directions requiring constant interaction by the user to correct the flight path.

 

The alarms that go off in the simulation were obtained from the Federal Aviation Administration’s web-site. They are actual air traffic control messages that had been recorded. The eight alarms that are used in the simulation are as follows:

 

The user had to wear headphones to hear the alarms because there are three different modes for the simulator which are monaural, stereo and spatialised. In monaural mode, the alarm would sound with equal volume in each ear; in stereo mode the alarm would sound at full volume in the left or right ear (and silent in the other ear) indicating the side of the simulator that the alarm originated from and in spatialised mode the directionality of the alarm would indicate which of the eight directions that the alarm originated from.

 

Figure 6‑1: Artificial Horizon Simulator

 

For the monaural simulation, the volume of the auditory cue is the same in each ear, see Figure 6‑2. Therefore, the spatial location is the same for each auditory alarm which gives no extra clues to the listener to identify which alarm needs turning off.

 

Figure 6‑2: How alarms are played for the monaural simulations.

 

For the stereo simulation, the auditory alarm plays at full volume in one ear and is silent in the other depending on the side of the screen that the alarm originated, see Figure 6‑3. For example, if the alarm is one of the four on the left then the alarm would play at full volume in the left ear and would be silent in the right ear.

Figure 6‑3: How alarms are played for the stereo simulations.

 

For the spatialised experiment the locations of the auditory alarms were arranged in a circle around the listener. With eight alarms, the angle between adjacent directions is forty-five degrees. As shown in Figure 6‑4, if the alarm is situated on the left hand side of the screen the direction of the alarm will also be more from the left, and vice versa. However, the spatial location of the auditory alarms has been extended to give more of a hint than just which side of the simulator the alarm originated from. For example, if the alarm sounding is situated in the top left hand corner of the screen, then the spatial location of the alarm will be in front and to the left of the listener. Similarly, if the alarm sounding is situated in the bottom right hand corner of the screen then the spatial location of the alarm will be behind and to the right of the listener. Therefore, each alarm has a unique direction which can be used to identify which alarm is sounding.
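The spacing of the eight directions can be illustrated with a short calculation. The sketch below is an assumption-laden example rather than the layout used in the experiment: angles are measured clockwise from straight ahead, and the starting offset is left as a parameter because the exact direction assigned to each alarm is given by Figure 6‑4. The class and method names are hypothetical.

```java
/** Illustrative calculation of eight equally spaced alarm directions. */
final class AlarmDirections {
    static final int ALARM_COUNT = 8;

    /**
     * Direction of an alarm in degrees, measured clockwise from straight ahead.
     * alarmIndex runs from 0 to 7; offsetDegrees rotates the whole circle.
     */
    static double azimuthDegrees(int alarmIndex, double offsetDegrees) {
        double step = 360.0 / ALARM_COUNT;   // 45 degrees between neighbours
        double azimuth = (offsetDegrees + alarmIndex * step) % 360.0;
        return azimuth < 0 ? azimuth + 360.0 : azimuth;
    }

    /** Directions between 180 and 360 degrees lie on the listener's left. */
    static boolean isOnLeft(double azimuthDegrees) {
        return azimuthDegrees > 180.0 && azimuthDegrees < 360.0;
    }
}
```

With an offset of 22.5 degrees, for example, four directions fall on each side of the listener, so the left and right distinction used in stereo mode is preserved while front and back are added.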

Figure 6‑4: How the alarms are played for the spatialised simulations.

 

6.3      Design

 

The experiment was designed as a single-factor, within groups experiment.

 

6.3.1     Independent variable (IV)

 

6.3.2     Dependent variables (DV)

 

To deal with the learning effect and with fatigue and boredom biases, the order in which the subjects carried out the tasks of the experiment was counterbalanced. One group performed the stereo simulation and then the spatialised simulation, whereas the second group performed them in the reverse order. This way the effects of the biases in each group would cancel each other out. Both groups practised with the monaural condition first.

 

6.3.3     Latin Square

 

 

        | Learning                                                                     | Condition 1 | Condition 2
Group 1 | Train and perform the experiment in mono, stereo, spatialised and then mono. | Stereo      | Spatialised
Group 2 | Train and perform the experiment in mono, spatialised, stereo and then mono. | Spatialised | Stereo

 

As stated in the Latin square, each subject participated in the simulation six times (they ran the simulation in each mode twice). The first four simulations were purely to remove the learning effect, because users perform considerably better once they have had a chance to learn the alarms. Only the results for the last two simulations were recorded for analysis. The monaural simulation sounded twenty alarms and the stereo and spatialised simulations each sounded ten. In all modes the order in which the alarms were presented was randomised. Since the experiment was designed to compare stereo versus spatialised alarms, the number of alarms in the monaural simulation was increased so that the user could become familiar with controlling the simulator and with the alarms. The mono simulation lasted approximately 5 minutes and the stereo and spatialised simulations approximately 2.5 minutes each. The user had approximately 30 seconds of rest between simulations. Including the time for filling in the questionnaire and describing the experiment, each session totalled approximately 30 minutes per subject.

 

6.3.4     Prediction

 

When the experiment is run in the spatialised mode as opposed to the stereo mode, the dependent variables will be affected by a statistically significant amount. The response times for spatialised will be lower than for stereo, the performance (amount of time the circle was over the crosshair) higher, the error rate lower and the learnability higher. The six NASA TLX measures will reflect the lowest overall workload for spatialised, then stereo and finally monaural.

 

6.3.5     Null Hypothesis

 

Whether the experiment is run in stereo or spatialised mode will have no statistically significant effect on the dependent variables. Additionally, there will be no significant difference between the NASA TLX ratings for the mono, stereo and spatialised simulations.

 

6.4      Apparatus

 

All subjects performed the experiment on a desktop personal computer (450 MHz Pentium III CPU, 256 MB RAM, Windows NT) using 17 inch monitors (resolution of 1024 by 868) and using a standard Logitech mouse to control the simulation. The alarms were heard using headphones that supported stereo sound (Sony MDR-V150) with the volume control in Windows set to maximum. The artificial horizon applet was displayed in a window of size 720 x 540 pixels.

 


6.5      Procedure

 

Subjects participated in the experiment one at a time. Before commencing, they received a brief description of what to expect from the person running the experiment. They then ran the six simulations in the appropriate order. After each simulation the user was presented with an on-screen message instructing them what to do next. After the fourth, fifth and sixth simulations the on-screen message asked the user to fill in the appropriate section of the NASA TLX form. The details about response times, error rates and performance were unobtrusively recorded by the computer during the experiment. The user was finally asked to fill in the questionnaire.

 

6.6      Results

 

The results of the experiment are divided into six different sections namely, response time, performance, error rate, learnability, NASA TLX measures and questionnaire results.

 

6.6.1     Response Time

 

The response time for each alarm correctly turned off was measured for each simulation (mono, stereo and 3D). Table 6‑1 highlights the minimum and maximum response times, the range, mean and standard deviation.

 

Figure 6‑5 is a bar chart highlighting the average response times for each of the twenty subjects and an average of all the subjects put together. The y-axis gives the response time in milliseconds and the standard deviation for each mode (mono, stereo and 3D) is displayed on each bar.


 


       | Minimum (ms) (with subject) | Maximum (ms) (with subject) | Range (ms) | Mean (ms) | Standard Deviation (ms)
Mono   | 2520 (B)                    | 5184 (S)                    | 2664       | 3973      | 641
Stereo | 2189 (A)                    | 4712 (N)                    | 2523       | 3360      | 663
3D     | 2489 (K)                    | 4782 (M)                    | 2293       | 3479      | 735

Table 6‑1: Table highlighting min, max, range, mean and standard deviation of the response times.

 

Although the response times for stereo alarms are faster than those for 3D alarms (the mean for stereo is 3360 ms and for 3D it is 3479 ms), it was necessary to analyse the results to see whether this could have been caused by chance. Figure 6‑6 shows the results of a within groups, paired two sample for means t-test comparing the stereo and 3D response times. A confidence level of 95% was selected, so with twenty subjects (19 degrees of freedom) a t-value of at least 2.09 was necessary to conclude that the independent variable was responsible for the difference in response time. Since the t-value obtained was only 0.95, there was no significant difference, and the faster stereo response times may well have been caused by chance.
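For readers unfamiliar with the test, the paired t-statistic is the mean of the per-subject differences divided by its standard error. The sketch below shows the calculation in Java; the class name is hypothetical and the numbers in the usage note are made up for illustration, they are not the experimental data.

```java
/** Paired two-sample t-statistic: mean difference divided by its standard error. */
final class PairedTTest {
    static double tStatistic(double[] first, double[] second) {
        if (first.length != second.length || first.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples of size >= 2");
        }
        int n = first.length;

        // Mean of the per-subject differences.
        double mean = 0.0;
        for (int i = 0; i < n; i++) {
            mean += (first[i] - second[i]) / n;
        }

        // Sample standard deviation of the differences (n - 1 in the denominator).
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double deviation = (first[i] - second[i]) - mean;
            sumSquares += deviation * deviation;
        }
        double standardDeviation = Math.sqrt(sumSquares / (n - 1));

        return mean / (standardDeviation / Math.sqrt(n));
    }
}
```

For the made-up samples {1, 2, 3, 4} and {0, 1, 1, 2} the differences are 1, 1, 2 and 2, which gives a t-value of about 5.2. The result is compared against the critical value for n − 1 degrees of freedom, which is 2.09 for twenty subjects at the 95% level.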


 

 

 

 


 

Figure 6‑5: Bar chart representing average response time for each subject and overall average


 

 

 
 

 

 

 

 

 

 

 

 

 

 

 


Figure 6‑6: t-Test results for Stereo versus 3D response time.

 


 


6.6.2     Performance

 

The performance was recorded for each user and represented as a percentage of how much time the circle was overlapping with the crosshair. Again, this was recorded for the simulation running in each of the different sound modes. Figure 6‑7 and Table 6‑2 highlight the minimum and maximum performance scores, the range, mean and standard deviation.

 

 

 

Figure 6‑7 is a bar chart highlighting the performance scores for each of the twenty subjects and an average of all subjects put together. The y-axis gives the percentage of time that the circle was overlapping with the crosshair and the standard deviation for each mode (mono, stereo and 3D) is displayed on each bar.

 

 

       | Minimum (%) (with subject) | Maximum (%) (with subject) | Range (%) | Mean (%) | Standard Deviation (%)
Mono   | 56.2 (G)                   | 99.6 (I)                   | 43.4      | 88.4     | 9.8
Stereo | 72.7 (G)                   | 100 (I)                    | 27.3      | 94.5     | 6.3
3D     | 72.8 (G)                   | 99.4 (D)                   | 26.6      | 94.1     | 6.1

Table 6‑2: Table highlighting min, max, range, mean and standard deviation of performance scores.


 


 

 


Figure 6‑7: Bar chart representing performance score for each subject and the overall average.

 

Although the average performance score was slightly better for stereo than for 3D (the mean for stereo is 94.5% and for 3D is 94.1%), it was necessary to analyse the results to see whether this could have been caused by chance. Figure 6‑8 shows the results of a within groups, paired two sample for means t-test comparing the stereo and 3D performance scores. A confidence level of 95% was selected, so a t-value of at least 2.09 was necessary to conclude that the independent variable was responsible for the difference in performance. Since the t-value obtained was only 0.68, there was no significant difference, and the slightly better stereo performance may well have been caused by chance.

 


 


Figure 6‑8: t-Test results for Stereo versus 3D performance scores.

 

6.6.3     Error Rate

 

Unfortunately the participants made very few errors when running the experiment, which provides little data for analysing which condition is the best. There were the same number of errors in both the stereo and 3D simulations, a mean of 0.2 errors per subject. The subjects did make more errors when running the monaural simulations, but this was expected because every subject had less practice when running the mono simulation than the stereo and 3D simulations. The t-value of zero for the within groups, paired two sample for means t-test in Figure 6‑9 indicates that the independent variable had no measurable effect on the error rate.


 


Figure 6‑9: t-Test results for Stereo versus 3D error rates.

 

6.6.4     Learnability

 

It is worth mentioning that the analysis of the learnability of each type of alarm is purely speculative and cannot be justified with Student’s t-tests. The theory behind measuring learnability is that as users become more familiar with the alarms, their response times will get progressively faster, e.g. the tenth alarm they turn off should theoretically be turned off faster than the first.

 

By averaging the response times of the twenty subjects for the first alarm turned off, then the second alarm, the third alarm and so on until the last alarm, it will be easier to see if the learnability is any different for stereo versus 3D. Figure 6‑10 is a line graph which contains two lines representing this. Both the stereo and 3D simulations required the users to turn off ten alarms which are numbered one to ten on the x-axis. The response times of the twenty subjects, for each alarm number are then averaged and plotted on the line graph. Additionally, a trend line has been added for the stereo average response times and the 3D average response times which may give a better idea of which response times decrease quicker.

 

As can be seen from the trend lines, stereo has a steeper downward gradient than 3D. This indicates that the subjects learnt the stereo alarms quicker than the 3D alarms and hence the stereo alarms may be more learnable. However, it is unknown whether the difference was caused by chance or by the independent variable.


Figure 6‑10: Graph showing learnability of the stereo and 3D simulations.

 


6.6.5     NASA TLX

 

NASA TLX is a technique for determining the subjective workload for a specific task. Each user was asked to fill in a NASA TLX questionnaire to give ratings to the six factors that affect the overall workload.

 

To establish if the ratings given could have been given by chance, a within groups, paired two sample for means t-test was carried out for each category. A confidence level of 95% was always selected, which means that a t-value of 2.09 was necessary to conclude that the independent variable was responsible for affecting the category rating. Three t-tests were carried out for each category to compare mono versus stereo, mono versus 3D and stereo versus 3D.
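The test described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the tool used for the analysis: the function name `paired_t` and the ratings are hypothetical, and the threshold of 2.09 is the value quoted in the text, which corresponds to a two-tailed test with 19 degrees of freedom (twenty paired observations).

```python
import math
import statistics

# Critical t-value at the 95% level quoted in the text; it applies to
# twenty paired observations (19 degrees of freedom, two-tailed).
T_CRITICAL = 2.09


def paired_t(first, second):
    """Return the paired two sample for means t-value for two conditions."""
    if len(first) != len(second):
        raise ValueError("each subject needs a rating in both conditions")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))


# Hypothetical ratings for twenty subjects (NOT the experimental data).
mono = [60, 70, 55, 65, 80, 50, 60, 75, 65, 55,
        70, 60, 50, 65, 75, 60, 55, 70, 65, 60]
stereo = [45, 50, 40, 55, 60, 35, 50, 55, 45, 40,
          50, 45, 35, 40, 55, 50, 30, 55, 45, 40]

t_value = paired_t(mono, stereo)
significant = abs(t_value) > T_CRITICAL
print(f"t = {t_value:.2f}, significant at 95%: {significant}")
```

Because every subject rated every mode, the test works on the per-subject differences rather than on the two sets of ratings independently.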

 

Each of these six factors and the overall workload will be dealt with separately.

 

6.6.5.1      Mental Demand

 

As highlighted by Franklin [19], mental demand refers to: “How much mental and perceptual activity was required (e.g. thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?” which has endpoints low and high.

 

Each subject selected a point somewhere between low and high to rate the amount of mental demand they felt they were under for the monaural, stereo and 3D simulations, see Figure 6‑11. Each subject’s rating and an overall average are displayed in a bar chart, see Figure 6‑12.

 

Figure 6‑11: The form used to rate the mental demand experienced during each simulation.


Figure 6‑12: Chart showing Mental Demand ratings for each subject and an overall average.

 



The majority of the students rated stereo as requiring the least amount of mental demand, then 3D and then monaural. Additionally, the average rating given by the subjects is 62 for mono, 51 for 3D and 42.5 for stereo.

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑13. For each, the t-value was greater than 2.09, indicating that the stereo simulation required the least amount of mental demand, then the 3D simulation and then the monaural simulation. However, the subjects participated in the mono experiment first and with more alarms than the others, so the outcome of comparing mono with stereo and 3D was expected. It is more than likely that the effect would be reduced if another experiment was carried out to compare mono with stereo and 3D fairly.



 

 


Figure 6‑13: t-Test results for Mono, Stereo and 3D Mental Demand ratings.


 

6.6.5.2      Physical Demand

 

Physical demand refers to: “How much physical activity was required (e.g. pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?” and has endpoints low and high [19]. Each subject selected a point somewhere between low and high to rate the amount of physical demand they felt they were under for the monaural, stereo and 3D simulations, see Figure 6‑14. Each subject’s rating and an overall average are displayed in a bar chart, see Figure 6‑15.

 

Figure 6‑14: The form used to rate the physical demand experienced during each simulation.


 

Figure 6‑15: Chart showing Physical Demand ratings for each subject and an overall average.

 

 


The average rating given by the subjects was 55 for mono, 43.5 for 3D and 41.5 for stereo. This points towards mono requiring the highest amount of physical demand, then 3D and then stereo, although stereo and 3D have very close scores.

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑16. The two t-tests comparing mono against stereo and 3D conclude that mono required the most physical demand (t-values of 3.78 and 3.61 respectively, and they had to be greater than 2.09). However, the t-test comparing stereo to 3D enables us to conclude that the difference could have been caused by chance (the t-value was only 0.62 and had to be greater than 2.09). This is what was expected, because the monaural simulation lasted twice as long as the stereo or spatialised simulations and so would have required more physical demand. The stereo and spatialised simulations lasted the same amount of time and required the same amount of mouse movement, which suggests the physical demand should not be different, as the results reflect.

 


 

 


Figure 6‑16: t-Test results for Mono, Stereo and 3D Physical Demand ratings.

 


6.6.5.3      Temporal Demand

 

As highlighted by Franklin [19], temporal demand refers to: “How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?” and has endpoints low and high.

 

Each subject selected a point somewhere between low and high to rate the amount of temporal demand they felt they were under for the monaural, stereo and 3D simulations. Each subject’s rating and an overall average are displayed in a bar chart, see Figure 6‑17.



 


Figure 6‑17: Chart showing Temporal Demand ratings for each subject and an overall average.


 

The average rating given by the subjects was 46.5 for mono, 41.5 for 3D and 36.5 for stereo. This points towards mono requiring the highest amount of temporal demand, then 3D and then stereo.

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑18. The two t-tests comparing mono against stereo and 3D conclude that mono required the most temporal demand (t-values of 3.16 and 2.52 respectively, and they had to be greater than 2.09). However, the t-test comparing stereo to 3D enables us to conclude that the difference could have been caused by chance (the t-value was 1.81 and had to be greater than 2.09). Again the mono simulation was first, so the subjects had less practice, and the mono simulation required twice as many alarms to be turned off, so the temporal demand would obviously be greater. If the experiment was repeated to compare mono fairly then the effect would be reduced. However, the stereo and 3D simulations were compared fairly and the effect on temporal demand was not statistically significant, showing that the subjects didn’t feel any more comfortable with the pace at which the tasks occurred for either mode.

 


 

 


Figure 6‑18: t-Test results for Mono, Stereo and 3D Temporal Demand ratings.

 

6.6.5.4      Effort

 

Effort refers to: “How hard did you have to work (mentally and physically) to accomplish your level of performance?” and has endpoints low and high [19].

 

Each subject selected a point somewhere between low and high to rate the amount of effort they felt they put into the monaural, stereo and 3D simulations. Each subject’s rating and an overall average are displayed in a bar chart, see Figure 6‑19.



 


Figure 6‑19: Chart showing Effort ratings for each subject and an overall average.

 


 

The average rating given by the subjects was 62 for mono, 44.5 for 3D and 46.5 for stereo. This points towards mono requiring the highest amount of effort, with 3D and stereo receiving lower and very similar ratings.

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑20. The two t-tests comparing mono against stereo and 3D conclude that mono required the most effort (t-values of 4.59 and 4.15 respectively, and they had to be greater than 2.09). However, the t-test comparing stereo to 3D enables us to conclude that the difference could have been caused by chance (the t-value was 0.72 and had to be greater than 2.09). Again the mono simulation was first, so the subjects had less practice, and the mono simulation required twice as many alarms to be turned off, so the effort would obviously be greater. If the experiment was repeated to compare mono fairly then the effect would be reduced. However, the stereo and 3D simulations were compared fairly and the effect on effort was not statistically significant, showing that the subjects didn’t find either of the simulations any easier.

 

 

 


Figure 6‑20: t-Test results for Mono, Stereo and 3D Effort ratings.

 


6.6.5.5      Performance

 

Performance refers to: “How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?” and has endpoints poor and good [19].

 

Each subject selected a point somewhere between poor and good to rate how successful they felt they were in accomplishing their goals for the monaural, stereo and 3D simulations. Each subject’s rating and an overall average are displayed in a bar chart, see Figure 6‑21.



 


Figure 6‑21: Chart showing Performance ratings for each subject and an overall average.

 


 

The average rating given by the subjects was 49.5 for mono, 56.5 for 3D and 64 for stereo. This points towards the subjects being most satisfied with their performance in the stereo simulation, then 3D and then mono.

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑22. The two t-tests comparing stereo against mono and 3D conclude that the subjects thought they were the most successful with stereo (t-values of 3.22 and 3.13 respectively, and they had to be greater than 2.09). However, the t-test comparing mono to 3D enables us to conclude that the difference could have been caused by chance (the t-value was 1.63 and had to be greater than 2.09). Since the subjects had less practice with the monaural simulation and had to turn off more alarms, this was expected. It is likely that the effect would be reduced if this wasn’t the case.

 


 


Figure 6‑22: t-Test results for Mono, Stereo and 3D Performance ratings.

 

6.6.5.6      Frustration

 

Frustration refers to: “How insecure, discouraged, irritated, stressed and annoyed versus secure, gratified, content, relaxed and complacent did you feel during the task?” and has endpoints low and high [19].

 

Each subject selected a point somewhere between low and high to rate the amount of frustration they felt during the monaural, stereo and 3D simulations. Each subject’s rating and an overall average are displayed in a bar chart, see Figure 6‑23.

 

 



 


Figure 6‑23: Chart showing Frustration ratings for each subject and an overall average.


 

As can be seen in the chart, Figure 6‑23, subjects A and T were not frustrated at all in any of the simulations, which is why they gave ratings of zero for all three modes. The average rating given by the subjects was 45 for mono, 37 for 3D and 32.5 for stereo. This points towards the subjects being the most frustrated with mono, then 3D and then stereo.

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑24. The two t-tests comparing mono against stereo and 3D conclude that the subjects were the most frustrated with the mono experiment (t-values of 2.63 and 2.10 respectively, and they had to be greater than 2.09). However, the t-test comparing stereo to 3D enables us to conclude that the difference could have been caused by chance (the t-value was 1.21 and had to be greater than 2.09). Since the subjects had to turn off twice as many alarms and had less practice, it makes sense that they were more frustrated with the mono simulation. If this wasn’t the case it is likely that the mono simulation would be less frustrating.

 


 

 


Figure 6‑24: t-Test results for Mono, Stereo and 3D Frustration ratings.

 

6.6.5.7      Overall Workload

 

The overall workload is calculated from the six factors that have just been covered. The overall workload calculated for each subject and an overall average is displayed in a bar chart, see Figure 6‑25.



 


Figure 6‑25: Chart showing overall workload for each subject and an overall average.

 

 

The average overall workload was 53⅓ for mono, 46 for 3D and 43.6 for stereo. This indicates the highest workload for mono, then 3D and then stereo.
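As an illustration of how an overall workload can be derived from the six factors, the sketch below takes the unweighted mean of the six ratings. The unweighted mean is an assumption, since the exact calculation is not stated here (NASA TLX also has a weighted form), and the function name `overall_workload` is hypothetical. The input values are the average mono ratings reported in the six preceding subsections.

```python
from statistics import mean

# Average mono ratings reported in sections 6.6.5.1 to 6.6.5.6.
mono_ratings = {
    "mental demand": 62,
    "physical demand": 55,
    "temporal demand": 46.5,
    "effort": 62,
    "performance": 49.5,
    "frustration": 45,
}


def overall_workload(ratings):
    """Unweighted mean of the six factor ratings (an assumed formula)."""
    if len(ratings) != 6:
        raise ValueError("expected a rating for each of the six factors")
    return mean(ratings.values())


print(f"overall mono workload = {overall_workload(mono_ratings):.2f}")
```

Under this assumption the mono averages give a value of about 53.3, in line with the mono figure reported above. The original calculation may have differed in detail, for example by being computed per subject before averaging.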

 

The t-tests carried out to show whether the effect was caused by chance are shown in Figure 6‑26. The two t-tests comparing mono against stereo and 3D conclude that the subjects had the highest overall workload with the mono experiment (t-values of 5.57 and 5.12 respectively, and they had to be greater than 2.09). However, the t-test comparing stereo to 3D enables us to conclude that the difference could have been caused by chance (the t-value was 1.39 and had to be greater than 2.09). As has been mentioned in each of the NASA TLX categories, the mono simulation will have been given particularly unfair ratings because the subjects had to turn off twice as many alarms and had less practice with mono than the other modes. However, the substantial differences in results with the stereo and 3D simulations made it worth including them in this report.

 


 


Figure 6‑26: t-Test results for the overall workload for Mono, Stereo and 3D.

 


6.6.6     Questionnaire

 

6.6.6.1      Subject being studied:

 

Subject                  GROUP A   GROUP B
Computing Science        7         6
Software Engineering     1         2
Computing & Economics    1         0
Sports Medicine          0         1
Sports Science           1         0
Sports Nutrition         0         1

 

6.6.6.2      Instruments played by subjects:

 

Number of Instruments    GROUP A   GROUP B
Zero                     7         7
One                      0         1
Two                      2         1
More than Two            0         1

 

6.6.6.3      Amount of game playing done by subjects:

 

Frequency                GROUP A   GROUP B
Never                    0         2
Yearly                   3         2
Monthly                  5         3
Weekly                   2         1
Daily                    0         2

 


6.6.6.4      Negative aspects of the experiment stated by subjects:

 

Frequency   Negative Aspect
4           Mouse control is not very good - joystick would have been better.
3           You can react faster to some auditory alarms than others due to the position of important words in the speech samples.
3           The spatial sounds weren't too good.
3           Trial run is too long.
3           Hard to concentrate on flying the plane while trying to listen to the alarms.
2           Possibility of losing the focus from the experiment window.
2           Tired hand.
1           Slight echo in spatial sound.
1           Two alarms were almost identical.
1           There was no link between the buttons on the screen and the layout on the keyboard.
1           The corresponding function keys to each alarm were hard to establish.
1           Feedback text area was distracting.
1           Mouse was too sensitive.
1           Mouse wasn’t sensitive enough.
1           The window was too small – should have been maximisable.
1           The buttons showing the available alarms were off-putting.
1           Still feel there was an element of learning involved.

 

6.6.6.5      Positive aspects of the experiment stated by subjects:

 

Frequency   Positive Aspect
5           Stereo sound was helpful.
3           Realistic alarms.
2           3D sound was helpful.
2           Sky and ground colours are good.
1           Good fun.
1           The recording of data is hidden.
1           The four buttons on each side match the four-button grouping on the keyboard.
1           No evaluator influences.
1           Good foreground activity, must concentrate a lot on foreground whilst trying to listen to commands.
1           Great experiment.
1           Response times being displayed is good.
1           The effect of learning was greatly reduced.
1           The mouse was responsive to mouse movements.

 

6.7      Discussion

 

6.7.1     Summary of results

 

Although the experiment was designed to compare stereo versus spatialised auditory alarms, the results for the mono simulations were also included. It was expected that mono would do the worst because the subjects had less practice with mono than with stereo and 3D. The NASA TLX scores were also biased in favour of the stereo and spatialised simulations because the mono simulation required twice as many alarms to be turned off. However, the TLX scores given by the subjects and the opinions given in the questionnaire results suggest that the mono mode is less effective than its alternatives and shouldn’t be used when the stereo and 3D modes are available.

 

The conditions for comparing the stereo and spatialised modes were controlled and fair. The data collected from running the experiment on the twenty subjects was analysed using within groups, paired two sample for means t-tests, and the majority of the results were not statistically significant. The stereo mode outperformed the spatialised mode in every respect, but only two measures were statistically significant, namely the NASA TLX workload categories of mental demand and performance.

 

6.7.2     What caused these results?

 

To sum up, despite the fact that the auditory alarms were quite different, i.e. stereo versus spatialised, there was very little impact on the results. In particular, the response times, performance scores and error rates, which aren’t subject to biases such as the classroom effect (the subject’s performance being affected by what is expected of them), did not have statistically significant results. One possible explanation is that the technology for spatialising sounds is not sophisticated and realistic enough to have an impact. Another explanation could be that even if the spatialised alarms are realistic, they won’t have a significant effect anyway: even though the spatial location of a sound is perhaps the most significant factor in determining the similarity of multiple sounds [13], Bregman states that it does not necessarily ‘overpower other bases for grouping when in conflict with them’ [9], which is why humans can separate different voices speaking over a monaural speaker. Either way, the experimental results reveal that the impact of current technology is minimal.

 

6.7.3     Further research

 

Although it has been shown that the present technological capabilities for supporting 3D sound don’t improve response time, performance or error rate, there may be different advantages from using 3D sound. For example, the Federal Aviation Administration’s literature review of visual and auditory symbols states that ‘a good symbol is simple, unitary, identifiable, readily associated with the thing it represents, and makes appropriate use of metaphors’ [20], so there is more to a ‘good’ auditory symbol than just response time, performance and error rate.

 

An example of binaural alarms that make use of metaphors is auditory alarms that sound on the side of the person that corresponds to the side of the plane with the problem, see Figure 6‑27. This way it may be less likely that the pilot tries to ‘fix’ the side of the plane which is actually intact. Additionally, rather than just sounding, an alarm could carry some information as to how serious the problem is. For example, if the altitude drops to an unsafe level, the alarm could sound as if it is coming from behind or above the listener. As the altitude drops to a very dangerous level, the sound could ‘move’ closer until it is sounding very close to the pilot.

 

Morris and Leung stated that research should be carried out in the area of determining priorities when confronted with multiple warnings [4]. When more than one alarm is going off, spatialised alarms would provide a way of distinguishing which alarmed condition needs attention first. For example, if the fuel is just low enough to sound an alarm but the oil pressure is getting dangerously high, it would be convenient to highlight the severity of each with the position of the sound.


 


Figure 6‑27: Alarms that carry an extra piece of information.

 

6.7.4     Traffic Alert and Collision Avoidance System (TCAS II)

 

As the name would suggest, the TCAS system is responsible for assisting the pilot to avoid collisions with surrounding aircraft. Currently the TCAS system uses both auditory and visual displays to convey the infor