The Use of 3D Audio
to Improve Auditory Cues in Aircraft
Supervisor: Prof. Chris Johnson
Second Reader: Dr John Patterson
Class: CS4H
Session: 1999/2000
Department of Computing Science,
University of Glasgow,
Lilybank Gardens,
Glasgow, G12 8QQ
Abstract
Auditory alarms are used in many safety-critical environments such as hospitals, nuclear power stations and aircraft. At present, these alarms rarely exploit the fact that sound can be processed so that it appears to come from more than one direction. In aviation in particular, pilots commonly wear headphones that support stereo sound, so taking advantage of this aspect of audio is certainly feasible. Additionally, a 3D virtual acoustic display was proposed by Wenzel [1] at a time when the technology was insufficient to test her assertions. With the release of DirectX 5.0, it is now practical to design and prototype auditory alarms that make use of 3D audio. This project investigates the impact of using spatialised alarms versus stereo and mono alarms. The effectiveness of the three types of alarm is analysed in terms of reaction time, error rate, learnability, performance of the primary task and workload measures. The results from this experiment indicate that the technology for supporting 3D audio is not sufficient to yield an advantage over its alternatives.
Acknowledgements
I would like to thank:
Prof. Chris Johnson for being my supervisor and guiding the course of my project.
Dr John Patterson for being my second reader and supplying helpful feedback.
Dr Ashley Walker for giving advice and access to spatialising equipment.
Dr Stephen Brewster for advice on experimental practice and statistical analysis.
Mark Rodgers for highlighting relevant studies.
Paul Krois for supplying FAA documents.
Adrian Ng Han Boon for spatialising the auditory cues.
The twenty subjects who participated in the experiment.
TABLE OF CONTENTS
2 Different ways of delivering audio
2.5.1 Cues that aid in human sound localisation
2.5.1.1 Interaural Time Difference (ITD)
2.5.1.2 Interaural Intensity Difference (IID)
2.5.1.7 Early echo response/reverberation
2.5.2 Head-Related Transfer Function (HRTF)
3.1 Advantages Of Using Auditory Warnings
3.2 Disadvantages of Using Auditory Warnings
3.3 Attention In The Auditory Modality
3.5 Auditory Warnings In Aircraft
3.5.3 Speech And Nonspeech Combinations
4 DirectX 5.0 - DirectSound (Spatialising Sounds)
4.1 3D Coordinates and Distance
4.2 Point Sources and Sound Cones
5.5.1 Artificial Horizon System
6 Spatialised Auditory Warnings Experiment
6.3.1 Independent variable (IV)
6.3.2 Dependent variables (DV)
6.6.6.1 Subject being studied:
6.6.6.2 Instruments played by subjects:
6.6.6.3 Amount of game playing done by subjects:
6.6.6.4 Negative aspects of the experiment stated by subjects:
6.6.6.5 Positive aspects of the experiment stated by subjects:
6.7.2 What caused these results?
6.7.4 Traffic Alert and Collision Avoidance System (TCAS II)
Auditory warnings have been under development for a long time, and a rich body of knowledge now exists in the area. Research has been carried out to investigate the effects that various aspects of auditory warnings have on the user, e.g. [2], [3] and [4]. The aspects of sound that can be altered when designing auditory warnings include pitch, volume, timbre, attack, release, decay, amplitude, frequency and reverberation. Each of these may or may not have a significant effect on the effectiveness of auditory warnings.
One example of such research is the work of Haas and Edworthy, who performed an experiment to test the effects of the pitch, speed and loudness of auditory alarms on perceived urgency and response time [2]. Thirty participants had to turn off alarms (with the independent variables being varied) and rate the perceived urgency. The results allowed Haas and Edworthy to conclude that perceived urgency increases and response time decreases for auditory alarms that have a high frequency, a fast speed and a high level of loudness.
Additionally, Melara and Marks [3] found that listeners respond faster to a loud sound if the pitch is high rather than low (despite instructions to ignore the pitch). Similarly, listeners respond more quickly to soft sounds that are low in pitch rather than high in pitch.
Further research includes that done by Morris and Leung [4], who showed that voice warnings yield better reaction times and learning times than earcons and hybrid warnings. They also concluded that hybrid warnings have better error rates than voice and earcon warnings.
In spite of the large amount of research carried out in this area, auditory alarms are still not satisfactory. In fact, as Stanton and Edworthy [5] highlighted, there is considerable discontent with the way that auditory alarms are designed. Morris and Leung [4] feel that investigating auditory alarms is important because the development and use of auditory alarms have outpaced the research into how they should be designed. The aim of this project is to investigate the use of spatialised sound in designing auditory alarms, which may improve their reaction time, error rate and learnability. If a helpful contribution can be made to the current research, perhaps better alarms will be designed in the future.
In 1991, Wenzel proposed the use of a three-dimensional virtual acoustic display but only addressed the human perceptual requirements and stated that technological capabilities and constraints should be addressed later [1]. Her proposed auditory display had to be capable of ‘presenting information in three spatial dimensions’ and ‘representing multiple sources which can be either static or moving’. With the release of DirectX 5.0 (Section 4), which supports this, it is now possible to validate some of her assertions about the utility of these techniques.
Given our understanding of how humans recognise the directionality of sound, and the recent development of technology that can realistically reproduce these effects with headphones or speakers, it is important to investigate the spatialisation of auditory alarms.
At present, the majority of auditory alarms are monaural, which means that from the listener’s point of view all the alarms come from the same location. This is the case even though pilots often wear headphones, which give auditory alarms another way to convey information, i.e. through the channel that is used. This can be taken a step further than just allowing an alarm to come from the left or right by sounding alarms from many more directions; the technical details are discussed in Chapter 4. Theoretically, if more information is conveyed in the auditory alarms, less pressure will be placed on the listener to handle both the primary task and the alarms going off. This is the hypothesis tested in this experiment.
The main aim of this report is to investigate the use of mono, stereo and spatialised auditory alarms, particularly comparing stereo to spatialised alarms. Factors that will be measured to establish the advantages of each sound mode include reaction time, error rate and learnability.
There are many different ways that audio can be played to the listener. My experiment used mono, stereo and positional 3D audio. I will discuss the characteristics of the various sound modes below: mono, stereo, quadraphonic sound, surround sound and positional 3D audio.
With the stereo, quadraphonic and surround-sound modes, the position of the listener’s head has important implications (except when headphones are used for stereo). If the listener moves closer to one speaker than the rest, the apparent origin of the sound will move accordingly. Because they are sensitive to head movement, these modes can only be used in certain situations.
Positional 3D audio can use head-tracking technology to address this problem. Head tracking requires detecting the location and direction of the head of the listener, who must be wearing headphones. The sounds are then transformed dynamically so that the perceived location remains constant as the listener moves their head.
Figure 2‑1 shows how mono sound is reproduced [6]. The sound will originate from a single location and the direction will remain fixed. Mono sound is compatible with headphones by playing the sound at equal volume in both ears.

Figure 2‑1. Mono sound.
Figure 2‑2 shows how stereo sound is reproduced [6]. The apparent direction of the sound being played can be varied to originate anywhere on the line. This is achieved by altering the volume of the sound in each speaker. Stereo sound is also compatible with headphones by altering the volume of the sound in each ear.

Figure 2‑2. Stereo sound.
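As a rough illustration of placing a sound on the line between two speakers by volume alone, the sketch below uses a constant-power pan law. The function names `stereo_gains` and `pan_mono`, and the pan range of -1 (far left) to +1 (far right), are assumptions made for this example; the report does not state which pan law was used.

```python
import math


def stereo_gains(pan):
    """Return (left_gain, right_gain) for a pan position in [-1.0, 1.0].

    -1.0 is fully left, 0.0 is centre and +1.0 is fully right.
    A constant-power law keeps left**2 + right**2 equal to 1.
    """
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must lie between -1.0 and 1.0")
    angle = (pan + 1.0) * math.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


def pan_mono(samples, pan):
    """Turn a list of mono samples into a list of (left, right) pairs."""
    left_gain, right_gain = stereo_gains(pan)
    return [(s * left_gain, s * right_gain) for s in samples]
```

Calling `pan_mono(samples, 0.0)` plays the sound equally in both channels, which is how a mono sound would be presented over headphones.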
Quadraphonic sound works on the same principle as stereo sound, but there are four speakers instead of two so that the directionality of the sound can be extended, see Figure 2‑3. The apparent direction of the sound being played can be varied to originate anywhere on the bounding box around the listener. Again, this is achieved by altering the volume of the sound in each speaker. Because it is necessary to have more than two speakers, this mode is not compatible with conventional headphones.

Figure 2‑3. Quadraphonic sound.
A quadraphonic set-up with an additional speaker positioned close to the listener can extend the range of possible origins for sounds to anywhere inside the bounding box, see Figure 2‑4. Thus, the sound can originate from any direction, on one plane, around the listener. This principle can be extended by adding more speakers so that the origin of a sound can also be below or above the listener. Again, because it is necessary to have more than two speakers, this mode is not compatible with conventional headphones.

Figure 2‑4. Surround sound.
Computers can be used to simulate what happens to a sound as it travels from the source to the listener’s ears by analysing what happens in the real world. Before covering how a computer performs this operation it is worth analysing those cues that help us to localise sounds.
Humans take for granted their ability to locate the origin of a sound source with high precision. There has been a lot of research into which cues enable us to do this. Nine cues have been widely accepted, most of which are highlighted by Tonnesen and Steinmetz [7]. These are interaural time difference, interaural intensity difference, head shadow, pinna response, shoulder echo, head motion, early echo response, reverberation and vision.
As shown in Figure 2‑5, the distance that the sound has to travel can be different for each ear. The length of b is greater than the length of a, which means there will be a slight delay in the sound reaching the right ear. The ITD for sounds that originate directly in front of or behind the listener will be zero, whereas sounds positioned to the far left or right will have an ITD of about 0.63 milliseconds [7]. This is an important cue for the brain in interpreting the lateral position of a sound.

Figure 2‑5. The distance for the sound to travel to each ear is different.
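The path-length argument above can be turned into a small calculation. The sketch below is only a straight-line model: it ignores the way sound bends around the head, and the ear separation of 0.216 m is an assumed value chosen so that the largest delay comes out near the 0.63 ms quoted from [7]. The function name and the coordinate convention (listener at the origin facing +z, ears on the x axis) are likewise assumptions for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second in air at roughly 20 degrees C
EAR_SEPARATION = 0.216  # metres; assumed so the maximum ITD is about 0.63 ms


def interaural_time_difference(source_x, source_z, ear_separation=EAR_SEPARATION):
    """Return the ITD in seconds for a source on the horizontal plane.

    The head is centred on the origin with the left ear at -x and the
    right ear at +x. A positive result means the sound reaches the left
    ear first, i.e. the right ear hears it later.
    """
    half = ear_separation / 2.0
    to_left = math.hypot(source_x + half, source_z)
    to_right = math.hypot(source_x - half, source_z)
    return (to_right - to_left) / SPEED_OF_SOUND
```

For a source straight ahead the two paths are equal and the function returns zero; for a source far to the left the path difference approaches the ear separation.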
Not only is there a delay because of ear position, but there is also a difference in the volume of the sound reaching each ear, called the Interaural Intensity Difference (IID), which is a primary localisation cue, see Figure 2‑6 [8]. The main reasons for this are that the sound has less distance to travel to one ear than the other and that the head gets in the way (the head shadow cue).

Figure 2‑6. Illustration of Interaural Intensity Difference
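The distance part of the IID can be estimated with the inverse-square law. The sketch below covers only that part: it deliberately ignores head shadow, so it should be read as a lower bound on the real difference rather than a model of it. The function name is an assumption for this example.

```python
import math


def level_difference_db(dist_near, dist_far):
    """Level difference in decibels between the two ears due to path length alone.

    dist_near and dist_far are the distances in metres from the source to
    the nearer and the farther ear. Head shadow is not modelled.
    """
    if dist_near <= 0 or dist_far <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(dist_far / dist_near)
```

Doubling the path length gives a difference of about 6 dB, which shows why this component only matters for sources very close to the head, where the two path lengths differ by a large ratio.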
Head shadow refers to what happens when sound has to travel through or around the head to reach the other ear. If the source of the sound is located to the left or right of the listener then the head will get in the way more than if the source is in front or behind. With the head in the way, the sound will have a lower amplitude. Additionally, the head acts as a filter which alters the original wave, making it difficult to detect the direction and distance [7].
The pinna, see Figure 2‑7 (from [6]), has a significant effect on sounds before they reach the eardrum. The pinna acts as a filter on the high frequencies of sound, which assists with the perception of lateral position and elevation. Tonnesen and Steinmetz state that the effect the pinna filter has ‘is highly dependent on the overall direction of the sound source’ [7].

Figure 2‑7. The Outer Ear.
The effect that the pinna or outer ear structure has on sound is illustrated in Figure 2‑8 [8]. As shown in the diagram, the medium to high frequency end of the spectrum is affected greatly by the pinna. This filtering depends on the angle at which the sound wave reaches the pinna. Each ear will filter the sound differently, giving the brain a hint as to the location of the source.

Figure 2‑8. Spectrum differences between the original sound and the sound after the pinna has taken effect.
As pointed out by Tonnesen and Steinmetz [7], sounds that have a frequency in the range 1–3 kHz and come from particular directions are reflected by the upper torso of the human body, see Figure 2‑9. The elevation of the source affects the length of the time delay caused by the reflection off the shoulders. This provides information about the direction of the sound but it is not a primary auditory cue.

Figure 2‑9. Sound being reflected by the upper torso.
A primary cue in locating the source of a sound is the motion of the listener’s head. It is natural for human beings to move the head to track where a sound is coming from. As the frequency of the sound increases, so does the amount of head movement, to compensate for the fact that high-frequency sounds do not bend around objects very easily and are harder to localise [7].
The path that a sound wave travels from the source to the listener is not simply a direct line, see Figure 2‑10. Instead, the waves will be reflected off the surfaces in the real world and arrive at the listener’s ears at different times, causing reverberation.

Figure 2‑10. Effect of sound being reflected by surfaces.
Early echo responses occur in the first 50–100 ms of a sound’s life. These early echo responses and reverberation give clues to the listener about the direction and distance of the sound source. For example, if the source of a sound is close to the listener there will be fewer echoes than if the source of the sound is far away.
Vision is often a major factor in identifying the location of a sound. For example, when you hear a barking noise and you can see a dog in the rough direction it came from, it is obvious where the source of the sound is.
Previous sections have described how the human perceptual system uses various cues to localise sounds in the real world, but how does a computer duplicate these cues to fool our brain? The answer lies in the Head-Related Transfer Function (HRTF). An HRTF performs a mathematical transformation of the spectrum of a sound to simulate the effects of the cues mentioned earlier. To spatialise a sound, a computer uses the relative angle (and distance) of the sound from the listener to select an HRTF that simulates that effect. There is one HRTF for each direction and distance from the listener, e.g. a source ten metres directly in front of the listener corresponds to one HRTF. A system typically uses about one thousand HRTFs to cover a sufficient number of directions and distances around the listener’s head to simulate a full 3D space.
Each of these HRTFs is still built using the same principle as that used by W. Bartlett in 1927 [6]. Microphones are placed in the auditory canals of a dummy and sounds that are played from a fixed location are analysed to determine the effect on each of the frequencies. This data is used to build a filter that mimics these effects, which can be applied to a monaural sound whenever you want it to appear as if it were coming from that location.
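The select-then-filter idea can be sketched in a few lines. The example below is a toy under stated assumptions: the table holds made-up impulse responses rather than measured HRTF data, it is indexed by azimuth only (no elevation or distance), and the names `convolve`, `nearest_direction`, `spatialise` and `TOY_TABLE` are hypothetical. It is not how DirectSound or the project software is implemented.

```python
def convolve(signal, impulse_response):
    """Full linear convolution, i.e. applying an FIR filter to the signal."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def nearest_direction(table, azimuth_deg):
    """Pick the stored azimuth closest to the requested one, wrapping at 360."""
    def gap(stored):
        d = abs(stored - azimuth_deg) % 360.0
        return min(d, 360.0 - d)
    return min(table, key=gap)


def spatialise(mono, table, azimuth_deg):
    """Filter a mono signal with the left and right responses for a direction."""
    left_ir, right_ir = table[nearest_direction(table, azimuth_deg)]
    return convolve(mono, left_ir), convolve(mono, right_ir)


# Toy table keyed by azimuth in degrees (0 = ahead, 90 = right, 270 = left).
# The impulse responses are invented for illustration, NOT measured data.
TOY_TABLE = {
    0.0: ([1.0], [1.0]),
    90.0: ([0.0, 0.0, 0.5], [1.0]),   # left ear hears it later and quieter
    270.0: ([1.0], [0.0, 0.0, 0.5]),  # right ear hears it later and quieter
}
```

With the toy table, `spatialise([1.0, 2.0], TOY_TABLE, 80.0)` selects the 90 degree entry, so the left channel is delayed by two samples and halved while the right channel is unchanged. A real system would hold far more directions and much longer measured responses.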
The different modes for delivering audio have been highlighted, from mono, which aircraft cockpits currently use, through modes that require an arrangement of speakers, such as quadraphonic sound or surround sound, to the more recent approach of positional 3D audio. The cues that an HRTF (Head-Related Transfer Function) uses to try to fool the listener, such as interaural time difference, pinna response or reverberation, have been described. Since this project investigates whether there are advantages to using positional 3D audio for auditory cues in aircraft, this chapter has given the necessary background information about delivering audio.
A greatly simplified description of perception has been given by many philosophers and psychologists since the time of Aristotle as ‘the process of using the information provided by our senses to form mental representations of the world around us’ [9]. Similarly, a dictionary may describe it as the ‘recognition and interpretation of sensory stimuli based chiefly on memory’. Hearing is a critical sense in many situations and it has many advantages over other senses, see 3.1, but it also has its share of disadvantages, see 3.2. Auditory warnings have been exploited for a long time and their use has varied, from the whistle on a train to speech warnings on a modern aeroplane. Speech, nonspeech and combined warnings will be covered in section 3.5.
The auditory modality differs from the visual modality in two main ways. Firstly, hearing is omnidirectional, which means it can take input from any direction, unlike vision, which requires you to be focusing on a specific area. Secondly, auditory information is often transient, i.e. you hear a word or tone and then it ends, whereas visual input doesn't normally disappear. The fact that the short-term auditory store is longer than the short-term visual store helps to address this problem.
When a person is not focusing on a stream of auditory input, the sound remains in the preattentive short-term auditory memory for three to six seconds. If you are listening to someone and your attention wanders, it is possible to switch your attention back and 'hear' the last few words you weren't actually listening to. If you don't switch your attention back then the auditory information is lost. However, if the 'background' sound is sufficiently pertinent, your adaptive mechanisms will bring this material to the focus of attention, which is the theory behind auditory warnings.
In the same way that you can pay attention to both the words and melody of a song, you can capitalise on parallel processing in the use of auditory warnings. In the auditory modality, the ability to focus attention on one channel is disrupted when there is increased similarity with competing messages that are to be ignored.
Spatial location or directionality is perhaps one of the most important dimensions of similarity [13]. As two auditory messages become closer in space, our ability to process them declines. The following bullet points highlight areas that can affect this similarity.
Having reviewed perceptual aspects of auditory warnings, this section goes on to investigate their use in aviation. The amount of instrumentation required in modern aircraft has reached a point where the panel space is very cluttered, see Figure 3‑1 for the cockpit of a Beechcraft King Air A90. Since there is a limit to the amount of information that a pilot can scan at one time, this is an important issue.

Figure 3‑1: The cockpit of a Beechcraft King Air A90 with cluttered panel space.
Visual cluttering has been a problem in aircraft. For this reason there are great advantages in moving some tasks to other sensory channels such as hearing. However, traditional visual instrumentation has been replaced with bells, beepers and electronic tones to the point where auditory clutter is almost as bad as visual clutter.
At present aircraft use many nonspeech auditory displays such as bells, whistles, horns, buzzers, clackers and various electronic tones, which can vary in intensity, pitch and duration. It is possible for pilots to learn at least ten nonspeech auditory signals, although they forget their meanings over time [14]. Forgetting the meanings may be partly due to the fact that there is very little standardisation between aircraft, e.g. the fighter aircraft F-4D, F-15 and F-16 and the transport aircraft C-5 and C-141 all use nonspeech signals but all with different standards [15]. Doll and Folds [15] have pointed out that confusion is being caused by nonspeech sounds which are too similar. For example, the fighter aircraft F-16 sounds one alarm for the ground proximity warning and another for the angle-of-attack warning. One warning indicates a need to raise the nose and the other to lower the nose, yet they both use an 800 Hz tone. Because of this confusion it is important to focus on signal distinctiveness and 'masking resistance'.
Speech displays reduce the recognition problems of nonspeech audio and are better suited to certain applications than nonspeech displays.
The main applications for speech displays are [16]:
Further possibilities include [16]:
· Commands
Despite these many possible applications it can be argued that speech displays should just be used for warnings. One point is that pilots consider speech displays to be noisy, strident and intrusive [16]. However, a more important point is that if speech displays are used for both advisories and warnings, the latter may be treated less urgently. Wickens points out that pilots preferred speech messages for conveying warnings but preferred visual displays for other information [16].
Wickens points out that pilots respond to voice warnings faster than warnings presented visually [16]. Speech technology has been used in aircraft warning displays such as the following:
Designers can exploit many variables when developing speech displays:
· Speech generation
· Contextual factors
· Linguistic factors
There is debate about what is reasonable for each of these variables if speech displays are to be used to their full potential. For example, it is argued that speech displays should not be similar to human speech so that pilots can distinguish the messages from those given by people at air traffic control [16]. Additionally, the speech rate should not be as slow as 123 wpm, which is irritating and time-consuming, but not as high as 178 wpm [16].
As Wickens highlights, including an alerting tone before a voice warning message reduces response times [16]. This is because when an alarm goes off the pilot's attention isn't shifted quickly enough to apprehend the beginning of the message. Although this combination proved to be better than a speech warning alone, it has been shown that starting voice messages with the word ‘Attention’, ‘Danger’ or ‘Advisory’ had the same effect [16].
As would be expected, there are both advantages and disadvantages to using auditory warnings. For example, on one hand, the omnidirectional nature of hearing allows auditory alarms to be detected no matter where the pilot is looking, but on the other hand, if more than one alarm sounds at the same time the pilot may not be able to identify either. Additionally, it is necessary to consider the different approaches to using auditory alarms, such as speech displays, nonspeech displays or a combination, and the impact that each would have if used. However, it is clear from this chapter that if attention is paid to the wealth of research in this area there can be many benefits to using auditory alarms.
The sounds that I used in my experiment were originally monaural, so it was necessary for me to spatialise them, see 6.2. In order to do this I needed a PC with a suitable sound card and the appropriate software. The hardware included a standard PC and a Sound Blaster Live sound card. The software for spatialising sounds was written in C++ by Colin Paterson of the Computing Science department and uses DirectX 7.0.
With the release of DirectX 5.0, which has added features in the DirectSound API (Application Programming Interface), it is possible to add spatialised sound (or 3D sound) to your applications. DirectSound attempts to duplicate, as realistically as possible, the effects of the cues which allow humans to locate sounds by using an HRTF (see 2.5.2). To sum it up very briefly, the brain can estimate the spatial location of a sound from various cues such as nuances in the way that sound reaches one ear before the other and ‘slight echoes and reverberations in the surrounding environment’ [7].
The DirectSound API is based around the concept of a virtual 3D space where sounds can be positioned with x, y and z coordinates. Obviously the listener is also placed in this 3D space but with an orientation to indicate the direction that they are facing. Once the programmer has placed the sound sources and the listener, DirectSound has the basic information that is necessary to spatialise the sounds.
DirectSound uses a left-handed coordinate system, see Figure 4‑1, [17].

Figure 4‑1: Left-handed Coordinate System
The unit of measurement is the metre (although it can be changed) which means that if you place a listener at the origin and a sound at (0, 0, 10), DirectSound will try to simulate the sound as being ten metres away in the real world.
Since all sampled sounds have much the same volume, a problem arises when you move different sound sources away from the listener. To illustrate, the unadjusted volume of a sampled sneeze will be close to the volume of a sampled explosion [17], which means that if you move them both to 100 metres from the listener they will both sound just as loud. Obviously this isn’t realistic because you would not hear the sneeze but you would certainly hear the explosion. To address this, DirectSound allows the programmer to set the distances at which each sound is at its maximum and minimum volume, which allows the muting effect to vary for different sounds. For example, the sneeze sound could be set to have a range of half a metre (maximum volume) to fifty metres (minimum volume) and the explosion, 100 metres (maximum) to 10,000 metres (minimum). DirectSound will use this range to scale the volume in a linear fashion depending on the distance from the listener.
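The sneeze and explosion example can be worked through with a small function. The sketch below follows the linear scaling described in the text; it does not call DirectSound and is not claimed to reproduce DirectSound's own rolloff calculation, which may differ. Treating 'minimum volume' as silence (a gain of 0.0) and the function name are assumptions for illustration.

```python
def linear_distance_gain(distance, min_distance, max_distance):
    """Return a gain between 0.0 and 1.0 for a source at the given distance.

    The gain is 1.0 at or inside min_distance (maximum volume), 0.0 at or
    beyond max_distance (assumed to mean silence), and falls linearly in
    between. All distances are in metres.
    """
    if not 0 <= min_distance < max_distance:
        raise ValueError("need 0 <= min_distance < max_distance")
    if distance <= min_distance:
        return 1.0
    if distance >= max_distance:
        return 0.0
    return (max_distance - distance) / (max_distance - min_distance)
```

Using the ranges from the text, a sneeze with a range of 0.5 m to 50 m has a gain of 0.0 at 100 m, whereas an explosion with a range of 100 m to 10,000 m is still at full volume at the same distance.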
DirectSound has ‘built in’ support for two different types of sound. These are point source sounds which emit waves in all directions and sound cones which only emit waves in directions which are constrained by a cone (like a loudspeaker). This is illustrated in the following diagram [17].

Figure 4‑2: Point Sources and Sound Cones
An orientation, along with two angles, is specified for each sound source to give the areas A, B and C as shown in Figure 4‑2. A listener in area A would hear nothing; a listener in B would hear the sound at full volume (scaled according to the distance from the sound source). A listener in C would hear the sound scaled in accordance with the distance from the inner cone and, of course, the distance from the sound source.
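The three areas can be expressed as a gain that depends on how far off the cone's axis the listener is. The sketch below assumes that B is the inner cone, A is the region outside the outer cone and C is the transition between them, with a linear fade across C. The function name, the use of full cone widths in degrees and the `outside_gain` parameter are assumptions for this example, not the DirectSound API.

```python
def cone_gain(off_axis_deg, inner_angle_deg, outer_angle_deg, outside_gain=0.0):
    """Return the directional gain for a listener relative to a sound cone.

    off_axis_deg is the angle between the source's orientation and the
    direction from the source to the listener. inner_angle_deg and
    outer_angle_deg are the full widths of the inner and outer cones.
    """
    if not 0 <= inner_angle_deg <= outer_angle_deg <= 360:
        raise ValueError("need 0 <= inner <= outer <= 360 degrees")
    half_inner = inner_angle_deg / 2.0
    half_outer = outer_angle_deg / 2.0
    off = abs(off_axis_deg)
    if off <= half_inner:   # area B: full volume
        return 1.0
    if off >= half_outer:   # area A: outside the outer cone
        return outside_gain
    # area C: fade linearly from full volume to the outside gain
    t = (off - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outside_gain - 1.0)
```

The result would then be multiplied by the distance-based gain, since both factors scale what the listener hears.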
As mentioned earlier, it is the listener’s position and orientation relative to the sound sources that affects what the environment will sound like. Setting the location of the listener obviously requires one set of coordinates. However, to set the orientation requires two vectors which are at right angles to each other and intersect at the centre of the virtual head, see Figure 4‑3 [17].

Figure 4‑3: Listener Orientation
As can be seen, the top vector points up from the virtual head and the front vector points forwards. This provides enough information for DirectSound to set the orientation of the listener’s ears.
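To show why two perpendicular vectors are enough, the sketch below derives a third 'right' vector from them and uses it to work out whether a source lies to the listener's left or right. It assumes the left-handed axes described earlier (x to the right, y up, z forward) and unit-length front and top vectors; the function names are hypothetical and nothing here calls DirectSound.

```python
import math


def dot(a, b):
    """Dot product of two 3D vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3D vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def is_valid_orientation(front, top, tol=1e-9):
    """Check that both vectors are non-zero and at right angles."""
    return (dot(front, front) > tol and dot(top, top) > tol
            and abs(dot(front, top)) <= tol)


def source_azimuth_deg(listener_pos, front, top, source_pos):
    """Angle of the source from straight ahead: positive right, negative left."""
    right = cross(top, front)  # gives +x for the default front and top
    rel = tuple(s - l for s, l in zip(source_pos, listener_pos))
    return math.degrees(math.atan2(dot(rel, right), dot(rel, front)))
```

With the listener at the origin, front (0, 0, 1) and top (0, 1, 0), a source at (10, 0, 0) is 90 degrees to the right and a source at (0, 0, 10) is straight ahead.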
A further nuance that affects how a sound should be played to the listener arises from three possible modes [17]:
A well-known feature of DirectSound is that of simulating the Doppler effect. As mentioned earlier, each sound source can have a location and a direction but a velocity can also be specified. With this information DirectSound is able to simulate the Doppler effect by adjusting the pitch of the sound. A common example of the Doppler effect is when an emergency vehicle drives past with a siren blaring. The pitch increases as the siren moves towards the listener and decreases as the vehicle moves further away.
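The siren example corresponds to the standard Doppler formula for a moving source and a stationary listener. The sketch below uses that textbook formula with an assumed speed of sound of 343 m/s; it is not a description of how DirectSound computes the pitch shift, and the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # metres per second, an assumed value for air


def doppler_frequency(source_freq_hz, radial_speed, speed_of_sound=SPEED_OF_SOUND):
    """Perceived frequency for a stationary listener and a moving source.

    radial_speed is the speed of the source towards the listener in
    metres per second; use a negative value when it is moving away.
    """
    if radial_speed >= speed_of_sound:
        raise ValueError("source must be slower than the speed of sound")
    return source_freq_hz * speed_of_sound / (speed_of_sound - radial_speed)
```

A 700 Hz siren approaching at 30 m/s is heard above 700 Hz, and the same siren receding at 30 m/s is heard below it.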
If the programmer specifies the hardware attached to the system, then DirectX can take advantage of it by using the hardware to perform operations rather than having DirectSound emulate them in software. Additionally, the hardware attached affects how the application being developed should be scaled. For example, when there is no hardware acceleration available the developer should be cautious about how many 3D sounds are incorporated. By contrast, the Lake Digital Sound Processor (DSP) can render multiple sounds with Doppler effects, directivity and support for room acoustics in real time [18].
DirectX 5.0 provides a host of features which give a software developer the necessary tools to produce powerful multimedia applications. As highlighted, the part that gives support for sound is called DirectSound, which is vastly superior to its previous version. The extensive support for 3D sound allows another dimension to be added to multimedia applications. The software used in this project only touched upon the basic capabilities of DirectSound. However, it is clear that DirectX 5.0 is a flexible platform which can be utilised in the design and prototyping of 3D auditory alarms.
The construction of my experiment required implementing a primary task for the user to concentrate on, while a secondary task of turning off alarms interrupts them. As mentioned in section 6.3, it is the performance of the user with the primary task, the response time to the alarms and the error rate which are of interest. It would have been possible to use an existing flight simulator (or other task which requires continuous interaction) as the primary task and a custom-built secondary task, which sounds alarms and records response times and error rates. However, it would have been difficult to compare the performance of the user in the primary task under different alarm conditions. For this reason I chose to implement a primary task that requires continuous interaction with the user and that records the performance. It is an artificial horizon simulator which requires the user to keep the horizon as level as possible in the face of occasional disturbances. The performance of a user is expressed as a percentage which reflects the amount of time that the horizon was level enough to have the circle overlapping with the crosshair.
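The performance measure can be illustrated as a simple percentage over sampled positions. The sketch below assumes that the simulator samples the offset between the circle and the crosshair at equal time intervals and that 'overlapping' means the offset is within a fixed tolerance; the function name, the units and the tolerance are hypothetical and are not taken from the experiment software.

```python
def level_percentage(circle_offsets, tolerance):
    """Percentage of samples in which the circle overlaps the crosshair.

    circle_offsets holds one offset per equally spaced sample, measured
    from the crosshair to the circle's centre. tolerance is the largest
    offset that still counts as overlapping.
    """
    if not circle_offsets:
        raise ValueError("at least one sample is required")
    on_target = sum(1 for offset in circle_offsets if abs(offset) <= tolerance)
    return 100.0 * on_target / len(circle_offsets)
```

Because the samples are equally spaced, the proportion of samples on target approximates the proportion of time that the horizon was level enough.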
I chose to implement my experiment in Java using applets so the experiment could be run over the internet. This way it would be easier to get subjects to participate in the experiment and it is easier for others to replicate my experimental method.
The implementation of the experiment was divided into various subsystems including an alarm handler, performance tracker, flight controller and a turbulence simulator. I have divided these into two categories, 'statistical data' which captures the data necessary to analyse the different sound modes and 'artificial horizon' which is concerned with providing the primary task of keeping a plane level, see Figure 5‑1. The components are described in more detail below:
Figure 5‑1: Hierarchical Design of the Software for the Experiment
I chose to implement my experiment in Java so that it could be run over the internet. However, my experiment records data about the user’s interaction with the system under different circumstances. Because of the security constraints that Java imposes on applets running over the web, this issue was more complicated than at first suspected. Originally I was hoping that the applet could simply e-mail me the results, but the restrictions that Java enforces made this impossible. Therefore the only other option was to record the data in a file which the subject then transfers over.
The default permissions of applets executed over the internet restrict the actions that can be carried out, to prevent malicious attacks. Because of this, the applet for my experiment is unable to write the results to a file in my workspace unless special permissions are granted by the user. Java’s policy tool allows users to grant these essential permissions to my trusted applet.
Each user has a ‘.java.policy’ file which contains a list of policies that the user sets up. Each of these policies contains a list of permissions that have been granted to various class files distributed over the internet. When an applet is being executed, the Java run-time environment checks each restricted operation against this policy file. If the policy file does not permit the operation then an exception is raised by the security manager. A user who wishes to participate in my experiment grants the permissions using the policy tool, see Figure 5‑2.

Figure 5‑2: Java Policy
Tool for granting permissions to applets over the Web.
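The behaviour described above can be illustrated with a minimal sketch, assuming a helper that tries to append one line of results and reports failure rather than crashing. The class name 'ResultsWriter' and the method 'tryAppend' are hypothetical and are not taken from the original applet.

```java
import java.io.FileWriter;
import java.io.IOException;

/** Hypothetical helper: appends one line of results to a file. */
class ResultsWriter {
    /** Returns true if the line was written, false if I/O failed or permission was denied. */
    static boolean tryAppend(String path, String line) {
        try (FileWriter out = new FileWriter(path, true)) { // true = append mode
            out.write(line + System.lineSeparator());
            return true;
        } catch (SecurityException | IOException e) {
            // A SecurityException is what a security manager raises when the
            // policy in force does not grant write access to this location.
            return false;
        }
    }
}
```

Returning a flag lets the caller tell the subject that the permissions still need to be granted, instead of silently losing the results.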
The Policy Tool
automatically loads the user’s default policy file (if one exists) which can be
modified using this tool. The user can add a set of permissions to applets that
run over the web by using the ‘Add Policy Entry’ feature, see Figure 5‑3.

Figure 5‑3: Add Policy
Entry screen.
The user then
has to add individual permissions, such as granting file writing permissions to
any applets in my experiment web page, using the ‘Add Permission’ screen, see Figure 5‑4.

Figure 5‑4: Use this to
add individual permissions to applets.
If the user grants write permissions to my applet, then my experiment will be able to record the data necessary for analysis in a file in my workspace. However, not every subject was likely to go through the many screens needed to give permissions to my applet. Therefore, I designed an alternative method of granting permissions: the subject runs a batch program which is available from the experiment web page.
If the user already has a ‘.java.policy’ file in their profile folder then the batch program will overwrite it, in which case they may prefer to use the policy tool to grant the permissions themselves. They will then have to run a different batch file on the experiment web page which doesn’t copy this file. Since it would be very surprising if any students participating in the experiment already had a ‘.java.policy’ file in their profile folder, this solution proved to be very effective.
This section presents a summary of the artificial horizon system and its four subsystems: the Alarm Handler, Performance Tracker, Turbulence Simulator and Flight Controller.
Along with coordinating the efforts of its subsystems, the artificial horizon system is responsible for repainting the window contents of the simulation.
The ‘paint’ method continuously repaints the contents of the applet window, which is how the simulation produces the animated effect of flying a plane. The circle, which the user is trying to keep over the crosshair, is provided simply as an aid to help the user keep the plane level. The direction that the plane is heading determines the appearance of the sky and ground, and the circle helps to explain this relationship.
The vertical direction of the plane can be summarised as follows. If the circle is above the crosshair then the plane is heading upwards towards the sky so you would see less ground and more sky, see Figure 5‑5. Similarly if the circle is below the crosshair then the plane is heading downwards towards the ground so you would see more ground and less sky, see Figure 5‑5.

Figure 5‑5: Plane heading towards the sky firstly,
and towards the ground secondly.
The horizontal direction (heading to the left or right) of the plane can be summarised as follows. If the circle is to the left of the crosshair then the plane is heading to the right, see Figure 5‑6. Similarly, if the circle is to the right of the crosshair then the plane is heading to the left, see Figure 5‑6.

Figure 5‑6: Plane heading to right firstly,
and heading to the left secondly.
As Figure 5‑7 shows, the effects of vertical and horizontal movement can be combined to provide a fairly realistic model of what happens when flying a plane.

Figure 5‑7: Combined effects of vertical and horizontal movement.
This relationship between the circle and the crosshair is mapped mathematically in the ‘paint’ method to draw two filled polygons for the ground and sky.
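One possible form of this mapping is sketched below. It is an illustration under stated assumptions, not the original ‘paint’ code: the vertical offset of the circle from the crosshair is assumed to shift the horizon line, the horizontal offset is assumed to tilt it (the tilt direction is a guess), and the class 'HorizonGeometry' and its method names are hypothetical.

```java
import java.awt.Polygon;

/** Hypothetical geometry helper; offsets are circle minus crosshair, in screen pixels. */
class HorizonGeometry {
    /** Horizon height at the screen centre: a circle above the crosshair (dy < 0) lowers the horizon. */
    static int horizonY(int height, int dy) {
        return height / 2 - dy;
    }

    /** Filled region below the horizon line (the ground). */
    static Polygon ground(int width, int height, int dx, int dy) {
        int mid = horizonY(height, dy);
        int leftY = mid + dx;   // assumed tilt direction
        int rightY = mid - dx;
        return new Polygon(new int[] {0, width, width, 0},
                           new int[] {leftY, rightY, height, height}, 4);
    }

    /** Filled region above the horizon line (the sky). */
    static Polygon sky(int width, int height, int dx, int dy) {
        int mid = horizonY(height, dy);
        int leftY = mid + dx;
        int rightY = mid - dx;
        return new Polygon(new int[] {0, width, width, 0},
                           new int[] {0, 0, rightY, leftY}, 4);
    }
}
```

In a ‘paint’ method the two polygons would be passed to 'Graphics.fillPolygon' with the ground and sky colours before the circle and crosshair are drawn on top.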
As described in 6.2, the user has to fly the plane as level as possible while disabling the alarms that go off. An alarm is disabled by pressing the function key (F1 to F8) that is associated with it. Additionally, as stated in 1.1, the response time for each alarm and the error rate for each simulation must be recorded for data analysis. As shown in 5.3, it is the alarm handler that is responsible for these tasks. The class ‘Alarm_Handler’ was implemented to encapsulate these responsibilities. Figure 5‑8 shows the ‘sound_alarm’ method which is called by the Artificial Horizon System when an alarm is to be sounded. It records which alarm was set off and at what time it started to sound. When the user later tries to turn off an alarm (or ignores it), the handler can then tell whether the user made an error and how long the response took.

Figure 5‑8: Alarm_Handler method for sounding alarms.
Figure 5‑10
shows the ‘turnoff_alarm’ method which handles the user requests to
turn off alarms. Please note that all feedback added to the text area in the
simulation is also written to the results file in the ‘paint’
method, which is why there are no File IO operations in this method. When the
user presses a key, a request is made to the alarm handler to turn off the
corresponding alarm, see
Figure 5‑9.

Figure 5‑9: Excerpt of code for handling key presses that turn off alarms.
When the alarm handler receives this request the time is recorded (‘off_time = System.currentTimeMillis();’); subtracting the time at which the alarm was sounded then gives the response time. Additionally, each alarm that the user turns off is compared with the alarm that was actually sounded to detect any errors; this is also recorded in the results file. One last responsibility given to this class is to suspend the simulation when the last alarm is turned off.

Figure 5‑10: Alarm_Handler method for handling user requests to turn off alarms.
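The bookkeeping described for the alarm handler can be sketched as follows. This is a minimal illustration and not the original ‘Alarm_Handler’ code: the class and method names are hypothetical, the clock is passed in as a parameter (where the applet calls ‘System.currentTimeMillis()’) so that the logic can be checked, and a key press while no alarm is sounding is assumed to be ignored rather than counted as an error.

```java
import java.awt.event.KeyEvent;

/** Hypothetical stand-in for the alarm handler's timing and error bookkeeping. */
class AlarmHandlerSketch {
    private int activeAlarm = 0; // 1..8 while an alarm is sounding, 0 otherwise
    private long onTime;         // time the active alarm started, in ms
    private int errors = 0;

    /** Maps the function keys F1..F8 to alarm numbers 1..8; any other key gives -1. */
    static int alarmForKey(int keyCode) {
        if (keyCode >= KeyEvent.VK_F1 && keyCode <= KeyEvent.VK_F8) {
            return keyCode - KeyEvent.VK_F1 + 1;
        }
        return -1;
    }

    /** Records which alarm started sounding and when. */
    void soundAlarm(int alarm, long nowMillis) {
        activeAlarm = alarm;
        onTime = nowMillis;
    }

    /** Returns the response time in ms for a correct choice, or -1 otherwise. */
    long turnOffAlarm(int alarm, long nowMillis) {
        if (activeAlarm == 0) {
            return -1;           // assumption: nothing sounding, so the press is ignored
        }
        if (alarm != activeAlarm) {
            errors++;            // wrong alarm chosen; the alarm keeps sounding
            return -1;
        }
        activeAlarm = 0;
        return nowMillis - onTime;
    }

    int errorCount() {
        return errors;
    }
}
```

A key listener would call 'alarmForKey' on the pressed key code and pass any result from 1 to 8 on to 'turnOffAlarm'.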
It was necessary to measure the performance of the user operating the simulator under the different conditions, i.e. monaural, stereo and spatialised alarms, to analyse which is the best. Since the aim of the user operating the simulation is to keep the plane as level as possible, it was decided to measure the performance of the user by monitoring how much of the time the circle is overlapping with the crosshair. Figure 5‑11 gives an example of the simulation running with the circle overlapping the crosshair, so the plane is relatively level; Figure 5‑12 gives an example of the simulation running with the circle away from the crosshair, so the plane is not very level.

Figure 5‑11: Simulation
running with the circle overlapping with the crosshair.

Figure 5‑12: Simulation
running with the circle NOT overlapping with the crosshair.
A thread called ‘measurePerformance’ was implemented, which is scheduled to run every 50 milliseconds to check whether the circle is overlapping with the crosshair, see Figure 5‑13. Although the ‘measurePerformance’ thread is scheduled to run every 50 milliseconds, it may be delayed for slightly longer; however, this has a negligible impact on the result. By sampling regularly and keeping track of the number of times the circle was overlapping with the crosshair and the number of times it wasn’t, the performance of the user can be judged accurately. The code in Figure 5‑13 does exactly this, with ‘in’ representing the number of times the plane was relatively level and ‘out’ representing the number of times the plane was not very level.

Figure 5‑13: Excerpt of code for measuring the
performance of the user.
To present the performance of the user in a meaningful way it is expressed as a percentage. Figure 5‑14 shows a method which does exactly this, with the value returned representing the percentage of time that the plane was relatively level.

Figure 5‑14: Excerpt of code for returning the
percentage score of the user.
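The sampling and scoring described above can be sketched as follows. This is a minimal illustration, not the original code: the class name is hypothetical, the counters are called 'in' and 'out' after the text, and the overlap rule (the crosshair centre lies within the circle) is an assumption.

```java
/** Hypothetical counterpart of the performance tracker. */
class PerformanceTrackerSketch {
    private long in = 0;  // samples with the circle over the crosshair
    private long out = 0; // samples with the circle away from the crosshair

    /** Assumed overlap rule: the crosshair centre lies inside the circle. */
    static boolean overlaps(int circleX, int circleY, int crossX, int crossY, int radius) {
        long dx = circleX - crossX;
        long dy = circleY - crossY;
        return dx * dx + dy * dy <= (long) radius * radius;
    }

    /** Called once per sampling tick (every 50 ms in the text). */
    void sample(boolean overlapping) {
        if (overlapping) {
            in++;
        } else {
            out++;
        }
    }

    /** Percentage of samples in which the plane was relatively level. */
    double percentLevel() {
        long total = in + out;
        return total == 0 ? 0.0 : 100.0 * in / total;
    }
}
```

Because the score is a ratio of sample counts rather than of elapsed time, a tick that is delayed slightly beyond 50 milliseconds changes the result very little, which matches the remark above.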
The responsibility of this subsystem is to provide the effect of turbulence to the user trying to control the plane. The user is trying to keep the horizon level, but disturbances are constantly being produced by the turbulence simulator so that the horizon will rarely remain level. The user has to counteract these movements by dragging the mouse.
To simulate this effect, two threads were implemented to adjust the direction of the plane; one thread adjusts the horizontal direction the plane is heading and the other adjusts the vertical direction. Each thread wakes every 50 milliseconds and makes a slight adjustment to the direction. The sleep time was set to 50 milliseconds so that the simulation was smooth without placing too much demand on the processor (twenty wake-ups a second by each thread place little strain on the CPU).
The amount of change made to the direction of the plane varies, to give the appearance that the turbulence is random. Each time one of these threads runs, it adjusts the direction of the plane by a value in the range of -10 to +10 pixels, determined by looking up the next value in an array. Each thread had an array of approximately 1,000 integers in this range, chosen to give smooth changes in direction; i.e. rather than altering the horizontal direction with a jump from -10 (10 pixels to the left) to +10 (10 pixels to the right), the transition would be -10, -9, …, 9, 10.
Because the adjustments made to the horizontal and vertical direction are predetermined, the turbulence is exactly the same on each run of the simulation. Making the turbulence completely random (by using a random number generator) was considered, but because of the user’s interaction with the flight path the simulation effectively becomes random anyway. Additionally, if the turbulence were generated randomly, a subject might be lucky on one trial and get a relatively stable flight, and unlucky on another and get one that is hard to control, which would skew the results. Furthermore, if both threads had to generate random numbers then the simulation would be slowed down.
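A table-driven turbulence source of this kind can be sketched as follows. The contents of the original arrays are not given, so the triangle-wave table here is only an assumed stand-in; wrapping back to the start of the table when it is exhausted is also an assumption, and the class name is hypothetical.

```java
/** Hypothetical turbulence source: a fixed, smooth table replayed identically on every run. */
class TurbulenceTableSketch {
    private final int[] table;
    private int next = 0;

    /** Fills the table with a wave that moves between -10 and +10 one pixel at a time. */
    TurbulenceTableSketch(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        table = new int[length];
        int value = 0;
        int step = 1;
        for (int i = 0; i < length; i++) {
            table[i] = value;
            if (value == 10) {
                step = -1;       // turn round at the upper limit
            } else if (value == -10) {
                step = 1;        // turn round at the lower limit
            }
            value += step;
        }
    }

    /** Next adjustment in pixels; wraps to the start of the table (assumption). */
    int nextAdjustment() {
        int v = table[next];
        next = (next + 1) % table.length;
        return v;
    }
}
```

Each turbulence thread would call 'nextAdjustment' once per 50 millisecond tick. Because no random number generator is involved, every subject meets the same disturbances.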
The purpose of the flight controller subsystem is to translate the user’s mouse movements to control the flight of the plane. It was decided to implement the controls of the plane the same as any other flight simulator as follows:
In Figure 5‑15 the dashed lines overlaid on the simulator screen show the effect on the circle of dragging the mouse (with the left mouse button pressed) in a south-easterly direction. The horizon becomes more level as the circle is moved closer to the crosshair.

Figure 5‑15: Simulator screen with arrows indicating the effect of dragging.
A mouse listener and a mouse motion listener were needed to implement this part of the simulator. The mouse listener keeps note of whether the left mouse button is being held down and, if it is, records the coordinates of where the mouse was clicked. Subsequently, the mouse motion listener redraws the circle whenever the user drags the mouse, the horizontal and vertical distance being determined by the coordinates of the mouse pointer in relation to the previous location of the circle when the screen was repainted. Figure 5‑16 shows the code for listening to the mouse being dragged and Figure 5‑17 shows the code that updates the coordinates of the circle to be drawn (‘new_x’ and ‘new_y’). The call to ‘repaint’ calls the ‘paint’ method, which uses the coordinates ‘new_x’ and ‘new_y’ to draw the new circle, ground and sky.

Figure 5‑16: Mouse motion listener to redraw
the circle when the user drags the mouse.

Figure 5‑17: Method to update the coordinates of the new circle
to be drawn.
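The drag handling described above can be sketched as plain logic that a mouse listener and a mouse motion listener would forward their events to. This is an illustration under assumptions, not the original code: the class name is hypothetical, 'newX' and 'newY' stand in for ‘new_x’ and ‘new_y’, and the circle is assumed to move by the distance dragged since the previous event.

```java
/** Hypothetical drag logic for moving the circle with the mouse. */
class DragControllerSketch {
    private boolean leftDown = false;
    private int lastX;
    private int lastY;
    int newX; // circle centre used by the next repaint
    int newY;

    DragControllerSketch(int startX, int startY) {
        newX = startX;
        newY = startY;
    }

    /** Left button pressed: remember where the drag started. */
    void mousePressed(int x, int y) {
        leftDown = true;
        lastX = x;
        lastY = y;
    }

    void mouseReleased() {
        leftDown = false;
    }

    /** Moves the circle by the distance dragged; returns true if a repaint is needed. */
    boolean mouseDragged(int x, int y) {
        if (!leftDown) {
            return false;
        }
        newX += x - lastX;
        newY += y - lastY;
        lastX = x;
        lastY = y;
        return true;
    }
}
```

In an applet, the listener's 'mouseDragged' callback would call this method with the event coordinates and then call ‘repaint’ whenever it returns true.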
Each subsystem was individually tested to ensure that it met its requirements. When it was clear that each worked correctly and robustly, the subsystems were progressively combined and tested. When the complete system was formed, the artificial horizon simulator was again thoroughly tested for correctness and robustness. Before running the experiment, various users operated the system to check that no errors remained in the final version.
Each individual subsystem was implemented and these were successfully combined to produce the artificial horizon simulation. The turbulence simulator provided the desired effect of disturbances; the alarm handler correctly set off alarms, accepted the user’s requests to turn off alarms, recorded response times and any errors; the performance tracker correctly recorded the performance of the user operating the system and the flight controller provided a responsive way for the user to navigate the horizon with mouse movements. Additionally, the system had three modes of alarm, mono, stereo and spatialised.
Twenty subjects were used in this experiment, all of whom were students from Glasgow University. Subjects completed a brief questionnaire before beginning the experiment, which showed that they had varying amounts of computer game experience that could affect their ability to control the simulator (the average being monthly). Additionally, a large portion of the participants studied either Computer Science or Software Engineering (85%), which would indicate they had an extensive amount of experience with computer-based tasks. The majority of the subjects didn’t play instruments (70%), but it may be true that subjects who play instruments have an increased ability to distinguish alarms. The subjects that studied computing-related subjects or played instruments were fairly distributed between the two groups, see 6.3.3.
The purpose of the experiment is to investigate the use of spatialised sounds for auditory warnings. Users had a primary task they had to concentrate on while concurrently having to deal with any alarms that go off. The primary task for the experiment was controlling an artificial horizon simulator, see Figure 6‑1. The secondary task was turning off the alarms as they sounded by pressing the corresponding function key (F1 – F8).
The user had to ensure that the circle was over the cross hair to stabilise the horizon. The position of the plane and horizon changed periodically so the user had to use the mouse to control the flight as well as possible. The direction of the “plane” would constantly drift off in various directions requiring constant interaction by the user to correct the flight path.
The alarms that go off in the simulation were obtained from the Federal Aviation Administration’s (FAA) web site. They are actual air traffic control messages that had been recorded. The eight alarms that are used in the simulation are as follows:
The user had to wear headphones to hear the alarms because there are three different modes for the simulator: monaural, stereo and spatialised. In monaural mode, the alarm would sound with equal volume in each ear. In stereo mode, the alarm would sound at full volume in the left or right ear (and be silent in the other ear), indicating the side of the simulator that the alarm originated from. In spatialised mode, the directionality of the alarm would indicate which of the eight directions the alarm originated from.

Figure 6‑1: Artificial Horizon Simulator
For the monaural simulation, the volume of the auditory cue is the same in each ear, see Figure 6‑2. Therefore, the spatial location is the same for each auditory alarm which gives no extra clues to the listener to identify which alarm needs turning off.

Figure 6‑2: How alarms are played for the monaural simulations.
For the stereo simulation, the auditory alarm plays at full volume in one ear and is silent in the other depending on the side of the screen that the alarm originated, see Figure 6‑3. For example, if the alarm is one of the four on the left then the alarm would play at full volume in the left ear and would be silent in the right ear.

Figure 6‑3: How alarms are played for the stereo simulations.
For the spatialised experiment the locations of the auditory alarms were arranged in a circle around the listener. With eight alarms the angle between each direction is forty-five degrees. As shown in Figure 6‑4, if the alarm is situated on the left hand side of the screen the direction of the alarm will also be more from the left and vice versa. However, the spatial location of the auditory alarms has been extended to give more of a hint than just which side of the simulator the alarm originated from. For example, if the alarm sounding is situated in the top left hand corner of the screen, then the spatial location of the alarm will be in front and to the left of the listener. Similarly, if the alarm sounding is situated in the bottom right hand corner of the screen then the spatial location of the alarm will be behind and to the right of the listener. Therefore, each alarm has a unique direction which can be used to identify which alarm is sounding.

Figure 6‑4: How the alarms are played for the spatialised simulations.
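The three presentation modes can be summarised in a short sketch. The exact layout is not stated numerically in the text, so the mapping used here is an assumption: alarm indices 0 to 7 are placed 45 degrees apart, starting 22.5 degrees clockwise from straight ahead, which puts four alarms on each side of the listener. The class and method names are hypothetical.

```java
/** Hypothetical mapping from alarm index (0..7) to playback parameters. */
class AlarmPlacementSketch {
    /** Azimuth in degrees, clockwise from straight ahead (assumed layout). */
    static double azimuthDegrees(int index) {
        return 22.5 + 45.0 * index; // 0..3 fall on the right, 4..7 on the left
    }

    /** True if the alarm lies on the listener's left-hand side. */
    static boolean isLeft(int index) {
        return azimuthDegrees(index) > 180.0;
    }

    /** Stereo mode: {left gain, right gain}, full volume on one side only. */
    static double[] stereoGains(int index) {
        return isLeft(index) ? new double[] {1.0, 0.0} : new double[] {0.0, 1.0};
    }

    /** Monaural mode: equal volume in each ear, whatever the alarm. */
    static double[] monoGains() {
        return new double[] {1.0, 1.0};
    }
}
```

Under this layout index 7 sits at 337.5 degrees, in front and to the left, and index 3 sits at 157.5 degrees, behind and to the right, matching the two examples given above.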
The experiment was designed as a single-factor, within-groups experiment.
To deal with the learning effect, fatigue and boredom biases, the order in which the subjects carried out the tasks of the experiment was counterbalanced. One group would perform the stereo simulation and then the spatialised simulation, whereas the second group would perform them in the reverse order. This way the biases in each group would cancel each other out. Both groups practised with the monaural condition first.
| | Learning | Condition 1 | Condition 2 |
|---|---|---|---|
| Group 1 | Train and perform the experiment in mono, stereo, spatialised and then mono. | Stereo | Spatialised |
| Group 2 | Train and perform the experiment in mono, spatialised, stereo and then mono. | Spatialised | Stereo |
As set out in the Latin square above, each subject participated in the simulation six times (they ran the simulation in each mode twice). The first four simulations were purely to remove the learning effect, because the user performs considerably better once they have had a chance to learn the alarms. Only the results for the last two simulations were recorded for analysis. The monaural mode simulation sounded twenty alarms and the stereo and spatialised simulations each sounded ten. In all modes the order in which the alarms were presented was randomised. Since the experiment was designed to compare stereo versus spatialised alarms, the number of alarms in monaural was increased so that the user could become familiar with controlling the simulator and the alarms. The mono simulation lasted approximately 5 minutes and the stereo and spatialised simulations approximately 2.5 minutes each. The user would have approximately 30 seconds rest between each simulation. Including time for describing the experiment and filling in the questionnaire, the session totalled approximately 30 minutes for each subject.
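The counterbalanced run order can be written down compactly. The sketch below encodes the two orders from the Latin square; the rule for allocating subjects to groups (alternating) is an assumption, since the text does not say how the groups were formed, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical encoding of the counterbalanced run order. */
class CounterbalanceSketch {
    enum Mode { MONO, STEREO, SPATIALISED }

    /** Six runs per subject: four learning runs, then the two recorded conditions. */
    static List<Mode> runOrder(int group) {
        if (group == 1) {
            return Arrays.asList(Mode.MONO, Mode.STEREO, Mode.SPATIALISED, Mode.MONO,
                                 Mode.STEREO, Mode.SPATIALISED);
        }
        return Arrays.asList(Mode.MONO, Mode.SPATIALISED, Mode.STEREO, Mode.MONO,
                             Mode.SPATIALISED, Mode.STEREO);
    }

    /** Assumed allocation rule: alternate subjects between the two groups. */
    static int groupFor(int subjectIndex) {
        return subjectIndex % 2 == 0 ? 1 : 2;
    }
}
```

With twenty subjects this allocation gives ten per group, so each ordering of the two recorded conditions is used equally often.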
When the experiment is run in the spatialised mode as opposed to the stereo mode, the dependent variables will be affected by a statistically significant amount. The response times for spatialised versus stereo will be lower; the performance higher (amount of time the circle was over the crosshair); the error rate lower and the learnability for spatialised higher. The six NASA TLX measures will reflect the lowest overall workload for spatialised, then stereo and then finally monaural.
Whether the experiment is run in stereo or spatialised will have no statistically significant effect on the dependent variables. Additionally, there will be no significant difference between the NASA TLX ratings for the mono, stereo and spatialised simulations.
All subjects performed the experiment on a desktop personal computer (450 MHz Pentium III CPU, 256 MB RAM, Windows NT) using 17 inch monitors (resolution of 1024 by 868) and using a standard Logitech mouse to control the simulation. The alarms were heard using headphones that supported stereo sound (Sony MDR-V150) with the volume control in Windows set to maximum. The artificial horizon applet was displayed in a window of size 720 x 540 pixels.
Subjects participated in the experiment one at a time. Before commencing, they received a brief description of what to expect from the person running the experiment. They then ran the six simulations in the appropriate order. After each simulation the user was presented with an on-screen message instructing them what to do next. After the fourth, fifth and sixth simulations the on-screen message asked the user to fill in the appropriate section of the NASA TLX form. The details about response times, error rates and performance were unobtrusively recorded by the computer during the experiment. The user was finally asked to fill in the questionnaire.
The results of the experiment are divided into six different sections namely, response time, performance, error rate, learnability, NASA TLX measures and questionnaire results.
The response time for each alarm correctly turned off was measured for each simulation (mono, stereo and 3D). Table 6‑1 highlights the minimum and maximum response times, the range, mean and standard deviation.
Figure 6‑5 is a bar chart highlighting the average response times for each of the twenty subjects and an average of all the subjects put together. The y-axis gives the response time in milliseconds and the standard deviation for each mode (mono, stereo and 3D) is displayed on each bar.
| | Minimum (ms) (with subject) | Maximum (ms) (with subject) | Range (ms) | Mean (ms) | Standard Deviation (ms) |
|---|---|---|---|---|---|
| Mono | 2520 (B) | 5184 (S) | 2664 | 3973 | 641 |
| Stereo | 2189 (A) | 4712 (N) | 2523 | 3360 | 663 |
| 3D | 2489 (K) | 4782 (M) | 2293 | 3479 | 735 |
Table 6‑1: Table highlighting min, max, range, mean and standard deviation of the response times.
Although the response times for stereo alarms are faster than 3D alarms (the mean for stereo is 3360 ms and for 3D it is 3479 ms) it was necessary to analyse the results to see if this could have been caused by chance. Figure 6‑6 shows the t-test results for comparing the stereo and 3D results. A within groups, paired two sample for means t-test highlighted that there was not a significant difference in the results. A confidence level of 95% was selected, and a t-value of 2.09 was necessary to conclude that the independent variable was responsible for reducing the response time, see Figure 6‑6. Since the t-value given was only 0.95 it is likely that the improved response time was caused by chance.
Figure 6‑5: Bar chart representing average response time for each subject and overall average

Figure 6‑6: t-Test results for Stereo versus 3D response time.
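The paired t-test used here can be sketched as follows: the statistic is the mean of the per-subject differences divided by its standard error, and its absolute value is compared with the critical value (2.09 for twenty subjects at the 95% level, as quoted in the text). The class and method names are hypothetical, and the sketch is not the tool that produced Figure 6‑6.

```java
/** Hypothetical paired-samples t statistic. */
class PairedTTestSketch {
    /** t = mean difference / (sample standard deviation of differences / sqrt(n)). */
    static double tStatistic(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples of at least 2");
        }
        int n = a.length;
        double meanDiff = 0.0;
        for (int i = 0; i < n; i++) {
            meanDiff += a[i] - b[i];
        }
        meanDiff /= n;
        double sumSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double d = (a[i] - b[i]) - meanDiff;
            sumSquares += d * d;
        }
        double sd = Math.sqrt(sumSquares / (n - 1)); // sample standard deviation
        return meanDiff / (sd / Math.sqrt(n));
    }

    /** Two-tailed decision against a critical value supplied by the caller. */
    static boolean significant(double t, double critical) {
        return Math.abs(t) > critical;
    }
}
```

For example, 'significant(0.95, 2.09)' is false, which corresponds to the conclusion drawn above for stereo versus 3D response times.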
The performance was recorded for each user and represented as the percentage of time that the circle was overlapping with the crosshair. Again this was recorded for the simulation running in the different sound modes, and Table 6‑2 highlights the minimum and maximum performance scores, the range, mean and standard deviation.
Figure 6‑7 is a bar chart highlighting the performance scores for each of the twenty subjects and an average of all subjects put together. The y-axis gives the percentage of time that the circle was overlapping with the crosshair and the standard deviation for each mode (mono, stereo and 3D) is displayed on each bar.
| | Minimum (%) (with subject) | Maximum (%) (with subject) | Range (%) | Mean (%) | Standard Deviation (%) |
|---|---|---|---|---|---|
| Mono | 56.2% (G) | 99.6% (I) | 43.4% | 88.4% | 9.8% |
| Stereo | 72.7% (G) | 100% (I) | 27.3% | 94.5% | 6.3% |
| 3D | 72.8% (G) | 99.4% (D) | 26.6% | 94.1% | 6.1% |
Table 6‑2: Table highlighting min, max, range, mean and standard deviation of performance scores.

Figure 6‑7: Bar chart representing performance score for each subject and the overall average.
Although the average performance score was slightly better for stereo than 3D (the mean for stereo is 94.5% and for 3D is 94.1%) it was necessary to analyse the results to see if this could have been caused by chance. Figure 6‑8 shows the t-test results for comparing the stereo versus the 3D performance scores. A within groups, paired two sample for means t-test highlighted that there was not a significant difference in the results. A confidence level of 95% was selected, and a t-value of 2.09 was necessary to conclude that the independent variable was responsible for improving the performance, see Figure 6‑8. Since the t-value given was only 0.68, it is likely that the improved performance was caused by chance.

Figure 6‑8: t-Test results for Stereo versus
3D performance scores.
Unfortunately the participants made very few errors when running the experiment, which provides little data for analysing which condition is the best. There were the same number of errors in both the stereo and 3D simulations, a mean of 0.2 errors per subject. The subjects did make more errors when running the monaural simulations, but this was expected because every subject had less practice when running the mono simulation than the stereo and 3D simulations. The t-value of zero for the within groups, paired two sample for means t-test in Figure 6‑9 confirms that the independent variable had no effect on the error rate.

Figure 6‑9: t-Test results for Stereo versus 3D error rates.
It is worth mentioning that the analysis for measuring the learnability of each type of alarm is purely speculative and cannot be justified with Student’s t-tests. The theory behind measuring learnability is that as the user becomes more familiar with the alarms, their response times will get progressively faster, e.g., the tenth alarm they turn off should theoretically be turned off faster than the first.
By averaging the response times of the twenty subjects for the first alarm turned off, then the second alarm, the third alarm and so on until the last alarm, it will be easier to see if the learnability is any different for stereo versus 3D. Figure 6‑10 is a line graph which contains two lines representing this. Both the stereo and 3D simulations required the users to turn off ten alarms which are numbered one to ten on the x-axis. The response times of the twenty subjects, for each alarm number are then averaged and plotted on the line graph. Additionally, a trend line has been added for the stereo average response times and the 3D average response times which may give a better idea of which response times decrease quicker.
As can be seen from the trend lines, stereo has a steeper downward gradient than 3D. This indicates that the subjects learnt the stereo alarms quicker than the 3D alarms and hence the stereo alarms may be more learnable. However, it is unknown whether the difference was caused by chance or by the independent variable.

Figure 6‑10: Graph showing learnability of the stereo and 3D simulations.
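The gradient comparison behind the trend lines can be sketched as a least-squares fit of mean response time against alarm number. The text does not say how the trend lines were fitted, so an ordinary linear least-squares fit is assumed here; the class and method names are hypothetical.

```java
/** Hypothetical least-squares trend line over alarm numbers 1..n. */
class TrendLineSketch {
    /** Slope of the best-fit line of y against x = 1, 2, ..., n. */
    static double slope(double[] y) {
        int n = y.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two points");
        }
        double meanX = (n + 1) / 2.0;
        double meanY = 0.0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = (i + 1) - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        return sxy / sxx;
    }
}
```

Given the ten averaged response times for each mode, the mode with the more negative slope is the one whose response times fell faster over the ten alarms.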
NASA TLX is a technique for determining the subjective workload for a specific task. Each user was asked to fill in a NASA TLX questionnaire to give ratings to the six factors that affect the overall workload.
To establish if the ratings given could have been given by chance, a within groups, paired two sample for means t-test was carried out for each category. A confidence level of 95% was always selected, which means that a t-value of 2.09 was necessary to conclude that the independent variable was responsible for affecting the category rating. Three t-tests were carried out for each category to compare mono versus stereo, mono versus 3D and stereo versus 3D.
Each of these six factors and the overall workload will be dealt with separately.
As highlighted by Franklin [19], mental demand refers to: “How much mental and perceptual activity was required (e.g. thinking, deciding, calculating, remembering, looking, searching, etc.)? Was the task easy or demanding, simple or complex, exacting or forgiving?” which has endpoints low and high.
Each subject selected a point somewhere between low and high to rate the amount of mental demand they felt they were under for the monaural, stereo and 3D simulations, see Figure 6‑11. Each subject’s rating and an overall average is displayed in a bar chart, see Figure 6‑12.

Figure 6‑11: The form used to rate the mental demand experienced during each simulation.

Figure 6‑12: Chart showing Mental Demand ratings for each subject and an overall average.
The majority of the students rated stereo as requiring the least amount of mental demand, then 3D and then monaural. Additionally, the average rating given by the subjects is 62 for mono, 51 for 3D and 42.5 for stereo.
The t-tests carried out to show if the effect was caused by chance are shown in Figure 6‑13. For each the t-value was greater than 2.09, confirming that the stereo simulation required the least amount of mental demand, then the 3D simulation and then the monaural simulation. However, the subjects participated in the mono experiment first and with more alarms than the others, so the result of comparing mono with stereo and 3D was expected. It is more than likely that the effect would be reduced if another experiment was carried out to compare mono with stereo and 3D fairly.

Figure 6‑13: t-Test results for Mono, Stereo and 3D Mental Demand ratings.
Physical demand refers to: “How much physical activity was required (e.g. pushing, pulling, turning, controlling, activating, etc.)? Was the task easy or demanding, slow or brisk, slack or strenuous, restful or laborious?” and has endpoints low and high [19]. Each subject selected a point somewhere between low and high to rate the amount of physical demand they felt they were under for the monaural, stereo and 3D simulations, see Figure 6‑14. Each subject’s rating and an overall average is displayed in a bar chart, see Figure 6‑15.

Figure 6‑14: The form used to rate the physical demand experienced during each simulation.
Figure 6‑15: Chart showing Physical Demand ratings for each subject and an overall average.
The average rating given by the subjects was 55 for mono, 43.5 for 3D and 41.5 for stereo. This points towards mono requiring the highest amount of physical demand, then 3D and then stereo, although stereo and 3D have very close scores.
The t-tests carried out to show if the effect was caused by chance are shown in Figure 6‑16. The two t-tests comparing mono against stereo and 3D conclude that mono required the most physical demand (t-values of 3.78 and 3.61 respectively, and they had to be greater than 2.09). However, the t-test comparing stereo to 3D enables us to conclude that the difference could have been caused by chance (the t-value was only 0.62 and had to be greater than 2.09). This is what was expected, because the monaural simulation lasted twice as long as the stereo or spatialised simulations and so would have required more physical demand. The stereo and spatialised simulations lasted the same amount of time and required the same amount of mouse movement, which suggests the physical demand should not be different, as the results reflect.

Figure 6‑16: t-Test results for Mono, Stereo and 3D Physical Demand ratings.
As highlighted by Franklin [19], temporal demand refers to: “How much time pressure did you feel due to the rate or pace at which the tasks or task elements occurred? Was the pace slow and leisurely or rapid and frantic?” and has endpoints low and high.
Each subject selected a point somewhere between low and high to rate the amount of temporal demand they felt they were under for the monaural, stereo and 3D simulations. Each subject’s rating and the overall average are displayed in a bar chart (see Figure 6‑17).

Figure 6‑17: Chart showing Temporal Demand ratings for each subject and an overall average.
The average rating given by the subjects was 46.5 for mono, 41.5 for 3D and 36.5 for stereo. This points towards mono requiring the highest amount of temporal demand, then 3D and then stereo.
The t-tests carried out to check whether the effect could have been caused by chance are shown in Figure 6‑18. The two t-tests comparing mono against stereo and 3D conclude that mono required the most temporal demand (t-values of 3.16 and 2.52 respectively, both above the critical value of 2.09). However, the t-test comparing stereo to 3D shows that the difference could have been caused by chance (the t-value was 1.81, below the critical value of 2.09). Again, the mono simulation was first, so the subjects had less practice, and it required twice as many alarms to be turned off, so the temporal demand would be expected to be greater. If the experiment were repeated to compare mono fairly, the effect would probably be reduced. However, the stereo and 3D simulations were compared fairly and the effect on temporal demand was not statistically significant, showing that the subjects did not feel any more comfortable with the pace at which the tasks occurred in either mode.

Figure 6‑18: t-Test results for Mono, Stereo and 3D Temporal Demand ratings.
Effort refers to: “How hard did you have to work (mentally and physically) to accomplish your level of performance?” and has endpoints low and high [19].
Each subject selected a point somewhere between low and high to rate the amount of effort they felt they put into the monaural, stereo and 3D simulations. Each subject’s rating and the overall average are displayed in a bar chart (see Figure 6‑19).

Figure 6‑19: Chart showing Effort ratings for each subject and an overall average.
The average rating given by the subjects was 62 for mono, 44.5 for 3D and 46.5 for stereo. This points towards mono requiring the highest amount of effort, with 3D and stereo scoring considerably lower and very close to each other.
The t-tests carried out to check whether the effect could have been caused by chance are shown in Figure 6‑20. The two t-tests comparing mono against stereo and 3D conclude that mono required the most effort (t-values of 4.59 and 4.15 respectively, both above the critical value of 2.09). However, the t-test comparing stereo to 3D shows that the difference could have been caused by chance (the t-value was 0.72, below the critical value of 2.09). Again, the mono simulation was first, so the subjects had less practice, and it required twice as many alarms to be turned off, so the effort would be expected to be greater. If the experiment were repeated to compare mono fairly, the effect would probably be reduced. However, the stereo and 3D simulations were compared fairly and the effect on effort was not statistically significant, showing that the subjects did not find either of these simulations any easier than the other.

Figure 6‑20: t-Test results for Mono, Stereo and 3D Effort ratings.
Performance refers to: “How successful do you think you were in accomplishing the goals of the task set by the experimenter (or yourself)? How satisfied were you with your performance in accomplishing these goals?” and has endpoints poor and good [19].
Each subject selected a point somewhere between poor and good to rate how successful they felt they were in accomplishing their goals for the monaural, stereo and 3D simulations. Each subject’s rating and the overall average are displayed in a bar chart (see Figure 6‑21).

Figure 6‑21: Chart showing Performance ratings for each subject and an overall average.
The average rating given by the subjects was 49.5 for mono, 56.5 for 3D and 64 for stereo. This points towards the subjects being more satisfied with their performance in stereo than in 3D or mono.
The t-tests carried out to check whether the effect could have been caused by chance are shown in Figure 6‑22. The two t-tests comparing stereo against mono and 3D conclude that the subjects thought they were the most successful with stereo (t-values of 3.22 and 3.13 respectively, both above the critical value of 2.09). However, the t-test comparing mono to 3D shows that the difference could have been caused by chance (the t-value was 1.63, below the critical value of 2.09). Since the subjects had less practice with the monaural simulation and had to turn off more alarms, the lower rating for mono was expected. It is likely that the effect would be reduced if this were not the case.

Figure 6‑22: t-Test results for Mono, Stereo and 3D Performance ratings.
Frustration refers to: “How insecure, discouraged, irritated, stressed and annoyed versus secure, gratified, content, relaxed and complacent did you feel during the task?” and has endpoints low and high [19].
Each subject selected a point somewhere between low and high to rate the amount of frustration they felt during the monaural, stereo and 3D simulations. Each subject’s rating and the overall average are displayed in a bar chart (see Figure 6‑23).

Figure 6‑23: Chart showing Frustration ratings for each subject and an overall average.
As can be seen in Figure 6‑23, subjects A and T gave ratings of zero for all three modes, indicating that they were not frustrated at all during any of the simulations. The average rating given by the subjects was 45 for mono, 37 for 3D and 32.5 for stereo. This points towards the subjects being the most frustrated with mono, then 3D and then stereo.
The t-tests carried out to check whether the effect could have been caused by chance are shown in Figure 6‑24. The two t-tests comparing mono against stereo and 3D conclude that the subjects were the most frustrated with the mono experiment (t-values of 2.63 and 2.10 respectively, both above the critical value of 2.09). However, the t-test comparing stereo to 3D shows that the difference could have been caused by chance (the t-value was 1.21, below the critical value of 2.09). Since the subjects had to turn off twice as many alarms and had less practice, it makes sense that they were more frustrated with the mono simulation. If this were not the case, it is likely that the mono simulation would be less frustrating.

Figure 6‑24: t-Test results for Mono, Stereo and 3D Frustration ratings.
The overall workload is calculated from the six factors that have just been covered. The overall workload calculated for each subject and an overall average is displayed in a bar chart, see Figure 6‑25.

Figure 6‑25: Chart showing overall workload for each subject and an overall average.
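To make the calculation concrete, the following sketch combines six subscale ratings into one overall score. It is an illustration under stated assumptions rather than the procedure used here: this section does not state whether the ratings were weighted, so both the unweighted mean (often called raw TLX) and the standard NASA TLX weighting, in which each factor's weight is the number of times it was chosen across the 15 pairwise comparisons, are shown. The function names and the example ratings and weights are hypothetical.

```python
FACTORS = ("mental", "physical", "temporal",
           "effort", "performance", "frustration")


def raw_workload(ratings):
    """Unweighted mean of the six subscale ratings (each on a 0-100 scale)."""
    return sum(ratings[f] for f in FACTORS) / len(FACTORS)


def weighted_workload(ratings, weights):
    """Weighted mean; weights are pairwise-comparison tallies that sum to 15."""
    if sum(weights[f] for f in FACTORS) != 15:
        raise ValueError("weights must sum to 15")
    return sum(ratings[f] * weights[f] for f in FACTORS) / 15


# Hypothetical ratings and weights for one subject (NOT data from the experiment).
ratings = {"mental": 60, "physical": 40, "temporal": 50,
           "effort": 55, "performance": 30, "frustration": 35}
weights = {"mental": 5, "physical": 0, "temporal": 3,
           "effort": 4, "performance": 2, "frustration": 1}

print(raw_workload(ratings))
print(weighted_workload(ratings, weights))
```

The sketch takes every rating as given. Because the performance scale in this experiment runs from poor to good, whether it needs to be reversed before being combined with the other five factors depends on how the form was scored, which is left as an assumption here.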
The average overall workload was 53⅓ for mono, 46 for 3D and 43.6 for stereo. This indicates the highest workload for mono, then 3D and then stereo.
The t-tests carried out to check whether the effect could have been caused by chance are shown in Figure 6‑26. The two t-tests comparing mono against stereo and 3D conclude that the subjects had the highest overall workload with the mono experiment (t-values of 5.57 and 5.12 respectively, both above the critical value of 2.09). However, the t-test comparing stereo to 3D shows that the difference could have been caused by chance (the t-value was 1.39, below the critical value of 2.09). As has been mentioned for each of the NASA TLX categories, the mono simulation will have been given particularly unfair ratings because the subjects had to turn off twice as many alarms and had less practice with mono than with the other modes. However, the substantial differences between the mono results and those of the stereo and 3D simulations made them worth including in this report.

Figure 6‑26: t-Test results for the overall workload for Mono, Stereo and 3D.
Subject | GROUP A | GROUP B
Computing Science | 7 | 6
Software Engineering | 1 | 2
Computing & Economics | 1 | 0
Sports Medicine | 0 | 1
Sports Science | 1 | 0
Sports Nutrition | 0 | 1

Num of Instruments | GROUP A | GROUP B
Zero | 7 | 7
One | 0 | 1
Two | 2 | 1
More than Two | 0 | 1

Frequency | GROUP A | GROUP B
Never | 0 | 2
Yearly | 3 | 2
Monthly | 5 | 3
Weekly | 2 | 1
Daily | 0 | 2

Negative Aspect | Frequency
Mouse control is not very good - joystick would have been better. | 4
You can react faster to some auditory alarms than others due to the position of important words in the speech samples | 3
The spatial sounds weren't too good | 3
Trial run is too long | 3
Hard to concentrate on flying the plane while trying to listen to the alarms. | 3
Possibility of losing the focus from the experiment window. | 2
Tired Hand | 2
Slight echo in spatial sound. | 1
Two alarms were almost identical. | 1
There was no link between the buttons on the screen and the layout on the keyboard. | 1
The corresponding function keys to each alarm was hard to establish. | 1
Feedback text area was distracting. | 1
Mouse was too sensitive. | 1
Mouse wasn’t sensitive enough. | 1
The window was too small – should have been maximisable. | 1
The buttons showing the available alarms were off putting. | 1
Still feel there was an element of learning involved. | 1

Positive Aspect | Frequency
Stereo sound was helpful. | 5
Realistic alarms. | 3
3D sound was helpful. | 2
Sky and ground colours are good. | 2
Good fun | 1
The recording of data is hidden. | 1
The four buttons on each side match the four-button grouping on the keyboard. | 1
No evaluator influences. | 1
Good foreground activity, must concentrate a lot on foreground whilst trying to listen to commands. | 1
Great experiment. | 1
Response times being displayed is good. | 1
The effect of learning was greatly reduced. | 1
The mouse was responsive to mouse movements. | 1
Although the experiment was designed to compare stereo versus spatialised auditory alarms, the results for the mono simulations were also included. Mono was always expected to do the worst because the subjects had less practice with mono than with stereo and 3D. The NASA TLX scores were also biased in favour of the stereo and spatialised simulations because the mono simulation required twice as many alarms to be turned off. However, the TLX scores given by the subjects and the opinions given in the questionnaire results permit concluding that the mono mode is less effective than its alternatives and should not be used when the stereo and 3D modes are available.
The conditions for comparing the stereo and spatialised modes were controlled and fair. The data collected from running the experiment on the twenty subjects were analysed using within-groups t-tests (paired two sample for means), and the majority of the results were not statistically significant. The stereo mode outperformed the spatialised mode in every respect but only two measures were statistically significant, namely the NASA TLX workload categories of mental demand and performance.
To sum up, despite the fact that the auditory alarms were quite different, i.e. stereo versus spatialised, there was very little impact on the results. In particular, the response times, performance scores and error rates, which are not subject to biases such as the classroom effect (the subject’s performance being affected by what is expected of them), did not show statistically significant differences. One possible explanation is that the technology for spatialising sounds is not sophisticated and realistic enough to have an impact. Another is that, even if the spatialised alarms are realistic, they would not have a significant effect anyway: even though the spatial location of a sound is perhaps the most significant factor in determining the similarity of multiple sounds [13], Bregman states that it does not necessarily ‘overpower other bases for grouping when in conflict with them’ [9], which is why humans can separate different voices speaking over a monaural speaker. Either way, the experimental results reveal that the impact of current technology is minimal.
Although it has been shown that the present technological capabilities for supporting 3D sound do not improve response time, performance or error rate, there may be other advantages to using 3D sound. For example, the Federal Aviation Administration’s literature review of visual and auditory symbols states that ‘a good symbol is simple, unitary, identifiable, readily associated with the thing it represents, and makes appropriate use of metaphors’ [20], so there is more to a ‘good’ auditory symbol than just response time, performance and error rate.
An example of a binaural alarm that makes use of metaphors is an auditory alarm that sounds on the side of the listener corresponding to the side of the plane with the problem, see Figure 6‑27. This way it may be less likely that the pilot tries to ‘fix’ the side of the plane which is actually intact. Additionally, rather than just having an alarm sounding, it could carry some information as to how serious the problem is. For example, if the altitude drops to an unsafe level, the alarm could sound as if it is coming from behind or above the listener. As the altitude drops to a very dangerous level, the sound could ‘move’ closer until it is sounding very close to the pilot.
Morris and Leung stated that research should be carried out in the area of determining priorities when confronted with multiple warnings [4]. When more than one alarm is going off, spatialised alarms would provide a way of distinguishing which alarmed condition needs attention first. For example, if the fuel is just low enough to sound an alarm but the oil pressure is getting dangerously high, it would be convenient to highlight the severity of each with the position of the sound.

Figure 6‑27: Alarms that carry an extra piece of information.
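The idea of encoding both the location of a fault and its severity in the position of the sound can be sketched as follows. This is an illustrative sketch only and not part of the prototype built for this project: the function names, the coordinate convention (x to the listener's right, y upwards, z in front), the distances in metres and the thresholds in the example are all assumptions made for the illustration.

```python
def severity(value, warn_level, danger_level):
    """Map a reading to 0.0 (at the warning level) .. 1.0 (at the danger level).

    Works whether the danger level lies below the warning level
    (e.g. altitude) or above it (e.g. oil pressure).
    """
    if warn_level == danger_level:
        raise ValueError("warn_level and danger_level must differ")
    fraction = (value - warn_level) / (danger_level - warn_level)
    return max(0.0, min(1.0, fraction))  # clamp to the 0..1 range


def alarm_position(direction, sev, far_m=5.0, near_m=0.5):
    """Return an (x, y, z) source position that moves closer as severity rises."""
    distance = far_m - sev * (far_m - near_m)
    offsets = {
        "left": (-distance, 0.0, 0.0),
        "right": (distance, 0.0, 0.0),
        "behind": (0.0, 0.0, -distance),
        "above": (0.0, distance, 0.0),
    }
    return offsets[direction]


# Hypothetical thresholds: the altitude alarm starts at 1000 ft
# and is treated as very dangerous at 200 ft.
sev = severity(600, warn_level=1000, danger_level=200)
print(alarm_position("behind", sev))
```

The same two functions cover the multiple-warning case: with each active alarm given its own severity, the more urgent condition is simply rendered closer to the listener than the less urgent one. The resulting coordinates would then be handed to whichever 3D audio interface is in use.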
As the name would suggest, the TCAS system is responsible for assisting the pilot to avoid collisions with surrounding aircraft. Currently the TCAS system uses both auditory and visual displays to convey the infor