Beyond Belief: Representing Knowledge Requirements For The Operation Of Safety-Critical Interfaces

Chris Johnson

Glasgow Accident Analysis Group, Department of Computing Science,
University of Glasgow,
Email: johnson@dcs.gla.ac.uk

Abstract

Human intervention has played a critical role in the causes of many major accidents. The staff of the London Underground continued to allow trains to deposit passengers at King's Cross after the fire had started. The crew of the Herald of Free Enterprise set to sea with their bow doors open. The staff at the Bhopal chemical plant pumped methyl isocyanate into a leaking tank. Many of these accidents occurred because users did not understand the operating rules, or competency criteria, that were intended to preserve the safety of their application. This point has been reiterated in the accident reports that document these failures; staff and management lacked the necessary training to ensure the safety of their application. Unfortunately, there are few techniques that designers can use to reason about competency criteria for complex, interactive systems. The following pages address this problem and argue that epistemic logics can be recruited to represent knowledge requirements for safety-critical interfaces. These requirements form an important component of the training that is intended to determine whether users are competent to operate safety-critical systems. The application of a formal notation is appropriate because a range of organisations, including the UK Ministry of Defence and NASA, are advocating these languages for large-scale development projects.

1. INTRODUCTION

Recent accidents in the transportation, power generation and chemical industries have illustrated the importance of ensuring that operators understand the rules and procedures that govern the operation of their applications. For example, the Piper Alpha enquiry found that:

"The lack of an exact format or content for the induction training; the brevity of time devoted to it; the almost cursory assessment of whether an individual required to attend the training...all point to the failure to ensure that all were properly informed on matters critical to their safety in an emergency. " (Cullen, 1990)

Such findings are surprising because national and international legislation emphasises the importance of adequate training. For example, the Australian Petroleum (Submerged Lands) Act states that:

"...an employer contravenes (the act) if the employer fails to take all reasonable steps to provide to employees in appropriate language, the information, instruction, training and supervision necessary to enable them to perform their work in a way that is safe and without risk to their health".

If companies are to obey this legislation then they must provide the regulatory authorities with evidence which shows that they deliver appropriate training and supervision. Unfortunately, many companies document their training in terms of high-level objectives, such as "the operator shall be aware of the general operating procedures for this system" (Fennell, 1988). This makes it difficult for companies to defend their practices when failures do occur. Accident reports into Bhopal (Morehouse and Subramaniam, 1986), Kegworth (AAIB, 1990) and Three Mile Island (USNRC, 1980) all criticise the training practices of the companies involved.

This paper argues that formal notations can be used to represent training objectives for safety-critical interfaces. This is appropriate because commercial and regulatory bodies have recommended the use of these notations in the development of safety-critical systems (MOD, 1991). Formal methods have also been used to support the design of human-computer interfaces (Harrison and Thimbleby, 1989). For instance, Dix (1991) has exploited a mathematical notation to investigate timing problems in multi-user systems. Elsewhere, we have shown that probabilistic logics can be used to analyse the stochastic processes that characterise interaction with safety-critical systems (Johnson, 1993, 1995). An important benefit of these techniques is that they provide a bridge between current practice in both human-computer interaction, through task analysis (P. Johnson, 1992), and software engineering (Fidge, Kearney and Staples, 1993). Unfortunately, formal notations have not been used to represent training requirements. This makes it difficult for designers to reason about the rules and regulations that are intended to guide an operator's interaction with a safety-critical system.

Competency requirements for an off-shore oil production facility are used to illustrate the argument in the remainder of this paper. These systems have posed a significant challenge for both systems designers and human factors specialists (Wardell, 1989). Oil production facilities are complex applications. This is illustrated by the plan of the Piper Alpha control room, shown in Figure 1.


Figure 1: Plan of the Piper Alpha Control Room

Operators must monitor the extraction of oil from geological structures under the sea-bed. They must also control the extraction and purification of any gas that is recovered with the oil. They must monitor repair activities and maintenance schedules. They must also watch for problems that threaten the safety of the rig. For instance, gas leaks pose a considerable risk of fire. If gas is detected then control room personnel must investigate the cause and identify potential solutions.

It is important to emphasise that this paper focuses upon the operating rules and procedures that form a critical component of any operator's training. Brevity prevents a detailed discussion of the physical training that also plays an important role during induction courses and competency tests. This topic is addressed in a companion paper (Johnson, 1996).

2. Representing Training Requirements

Companies that operate oil production facilities are faced with a difficult problem. On the one hand, regulatory authorities, such as the Health and Safety Executive, are urging them to document their training procedures. On the other hand, there are few techniques that can be used to develop such documentation for safety-critical interfaces. Natural language reports are unwieldy and quickly become intractable if they are used to represent many different training requirements for many different operators (Johnson, McCarthy and Wright, 1995). An additional problem is that training reports frequently describe the procedures that are used to instruct operators. They do not describe the objectives for that training. This makes it difficult to identify the criteria that are used to gauge the success or failure of that training. These problems can be avoided by exploiting user models to represent the cognitive objectives for an operator's training (Green and Benyon, 1995). For example, Figure 2 illustrates Barnard's Interacting Cognitive Subsystems (ICS).


Figure 2: High-Level View of the ICS Model

Information from the visual and acoustic sensory subsystems is interpreted into object and morphonolexical representations. These are, in turn, transformed into propositional and implicational encodings of the interaction. For example, Duke, Barnard, Duce and May (1995) use the following clause to describe what should happen when an operator is trained to delete warnings from a safety-critical application:

 per(delete_message) =>
	mesg@prop ^
	<prop-obj::obj-lim::lim-hand> in lim config       (1)
This states that if the user can delete a message then they form a representation of the message at the propositional level, mesg@prop, and they form a goal to remove the object, prop-obj; this is translated through processes that activate their musculature, obj-lim::lim-hand, to complete the necessary action. Practical problems prevent designers from using this approach to represent training requirements. The level of detail in (1) would place heavy burdens upon any company that had to document training requirements for many different operators. The developers of the ICS model are aware of these issues and are currently developing tools to support the application of their user models (May, 1995).

3. Epistemics for Training Requirements

Until current research in user modelling delivers such tools, it is important that designers have some alternative means of representing training requirements. Epistemology, or the study of knowledge, has a history stretching back to the Ancient Greeks (Fagin, Halpern, Moses and Vardi, 1995). This work has led to a number of epistemic logics that can be used to reason about the consistency and completeness of knowledge in areas such as Artificial Intelligence, Philosophy and Distributed Systems (Barwise and Perry, 1983). These logics offer a number of benefits for the analysis of training requirements. In particular, they provide a convenient bridge between human factors constraints and the formal specifications of systems engineering. For example, the Piper Alpha accident was probably caused by a condensate injection pump being operated without a pressure safety valve. Systems engineers might, therefore, require that a pump is only started if it is protected by a safety valve and there is no permit to perform maintenance on the pump. This latter requirement is important because operators must not start a pump that is being repaired by their colleagues:
safe_start(injection_pump)  <= 
        working(ps_valve, injection_pump) ^
        not work_permit(injection_pump).        (2)
It is safe to start an injection pump if its pressure safety valve is working and there is not a permit to perform maintenance work on that component. This systems engineering constraint provides a focus for the development of training requirements. For example, the following clause uses the epistemic Kn operator (read as 'knows') to state that the user should only attempt to start a pump if they know that the safety valve is working and that there is no permit to work on that pump. This illustrates the close relationship between interface development and operator training. The human-machine interface must enable its users to start up the injection pumping equipment, input(user, start_injection_pump). Training must be provided so that operators understand the conditions which must be satisfied if they are to issue this input:
input(user, start_injection_pump)  <=> 
	Kn(user, working(ps_valve, injection_pump)) ^
	Kn(user, not work_permit(injection_pump))  (3)
A user provides input to start an injection pump if and only if they know that the safety valve for that pump is in working order and they know that there is no permit to work for that component. Such clauses provide a precise and concise means of representing the competency requirements that must be satisfied by the users of complex systems. They represent the minimum standards that operators must satisfy in order to be certified for particular applications. For example, the previous requirement might be used in an examination by posing questions such as:

What checks should be made before you issue input to start up the condensate injection pumps?
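
To make the role of the Kn operator concrete, the following Python sketch models an operator's knowledge as a simple set of facts and tests the precondition of clause (3). All of the names are hypothetical and are introduced purely for illustration; the sketch is not part of the formal notation used in this paper.

def kn(knowledge, fact):
    # Kn(user, fact) is modelled as membership of the set of facts
    # that the operator currently knows.
    return fact in knowledge

def may_start_injection_pump(knowledge):
    # Clause (3): the operator should only issue the input if they know
    # that the pressure safety valve is working and they know that there
    # is no permit to work on the pump.
    return (kn(knowledge, "working(ps_valve, injection_pump)") and
            kn(knowledge, "not work_permit(injection_pump)"))

# An operator who has checked the valve but not the permit log.
print(may_start_injection_pump({"working(ps_valve, injection_pump)"}))  # False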

An important benefit of logic clauses, such as (3), is that they strip away the clutter of implementation details that frequently obscure critical properties of an interactive system. Designers and operators can identify critical training objectives without being forced to consider particular polling mechanisms and low level device handling. The ability to abstract away from such underlying details is significant because the operating procedures of safety-critical applications can be extremely complex. For instance, permits to work can be revoked if maintenance has to be suspended for a short time. Under such circumstances, operators may start the equipment if they know that a suspension permit has been issued:

input(user, start_injection_pump)  <=> 
	Kn(user, working(ps_valve, injection_pump)) ^
	(Kn(user, not work_permit(injection_pump)) v
	Kn(user, suspension(injection_pump)))         (4)
The user provides input to start an injection pump if and only if they know that the safety valve for that pump is in working order and they know either that there is no permit to work for that component or that there is a suspension permit on the pump. This illustrates how designers and analysts can gradually introduce additional clauses to document the many different items of information that operators must be trained to consider during interaction with safety-critical systems. Unfortunately, it does not demonstrate that epistemic logic can be used to identify training requirements for particular operators performing particular tasks.
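
As a rough sketch of how this incremental refinement might be mirrored in a prototype, the check from the previous sketch only needs its permit condition widened to accept the suspension permit of clause (4); all names remain hypothetical:

def may_start_injection_pump(knowledge):
    # Clause (4): the work-permit condition can now be satisfied either by
    # knowing that no permit exists or by knowing that a suspension permit
    # has been issued for the pump.
    valve_ok = "working(ps_valve, injection_pump)" in knowledge
    no_permit = "not work_permit(injection_pump)" in knowledge
    suspended = "suspension(injection_pump)" in knowledge
    return valve_ok and (no_permit or suspended)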

4. Training For Particular Operator Tasks

Many oil production operators will never need to start up the injection pumping equipment. Toolpushers, divers and rig electricians are not expected to perform such tasks during their routine duties. Fortunately, the allocation of training requirements can be represented by changing the user marker in previous clauses to describe particular roles in the operation of the system. For example, the lead production operator, lpo, is responsible for starting up the condensate pumping equipment:
input(lpo, start_injection_pump)  <=> 
	Kn(lpo, working(ps_valve, injection_pump)) ^
	Kn(lpo, not work_permit(injection_pump))   (5)
The lead production operator provides input to start an injection pump if and only if they know that the safety valve for that pump is in working order and they know that there is no permit to work for that component. Additional clauses can be introduced to represent the training requirements that relate to other users. For example, it might be specified that the dive superintendent, dsi, should only issue orders to start diving operations if they know that the diesel fire pumps have been switched from automatic to manual. This is a critical training requirement because fire-fighting pumps are fed with sea-water. If the pumps are activated while divers are working in the water around the rig then they may be sucked towards the pump inlets. As before, it is important that designers have some means of explicitly representing such requirements. Competent dive superintendents must check that the pumps are on manual before they allow divers to enter the water:
input(dsi, start_diving)  <=> 
          Kn(dsi, manual(diesel_fire_pumps)).    (6)
The diving superintendent issues input to start diving operations if and only if they know that the diesel fire pumping systems have been switched from automatic to manual. The previous clause can only be satisfied if operators can access the information that they have been trained to consider. For example, it would be of little use training the dive superintendent to check on the state of the diesel_fire_pumps if they could not find out about the state of these components from the displays presented by the on-board information systems.
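
One way of documenting such role-specific requirements during design is as a table from roles and actions to the facts that an operator must know before issuing the corresponding input. The following Python sketch, using hypothetical names, records clauses (5) and (6) in this form:

# Knowledge requirements indexed by (role, action); hypothetical names.
REQUIRED_KNOWLEDGE = {
    ("lpo", "start_injection_pump"): {
        "working(ps_valve, injection_pump)",
        "not work_permit(injection_pump)",
    },
    ("dsi", "start_diving"): {
        "manual(diesel_fire_pumps)",
    },
}

def competent_to_act(role, action, knowledge):
    # The operator may issue the input only if every required fact is in
    # their current knowledge base (clauses (5) and (6)).
    return REQUIRED_KNOWLEDGE.get((role, action), set()) <= knowledge

# A dive superintendent who has confirmed that the fire pumps are on manual.
print(competent_to_act("dsi", "start_diving", {"manual(diesel_fire_pumps)"}))  # True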

5. Training and Systems Engineering

It is important to emphasise that epistemic clauses, such as (5) and (6), do not guarantee the safety of an application. For instance, it is entirely possible for the diving superintendent to believe that the diesel pumps are on manual when they are, in fact, on automatic. Similarly, the lead production operator might believe that the safety valve is operational when it is broken:
 potential_accident  <= 
	 Kn(lpo, working(ps_valve, injection_pump)) ^
	 not working(ps_valve, injection_pump)       (7)
There is the potential for an accident if the lead production operator knows that the safety valve is working when, in fact, the valve is not working. This clause represents a situation in which the lead production operator holds an erroneous belief about the state of their application. Systems engineers and interface designers must co-ordinate their activities to ensure that the operator believes that the pressure safety valve is operating if and only if that component is working:
Kn(lpo, working(ps_valve, injection_pump)) <=>
	     working(ps_valve, injection_pump)          (8)

The lead production operator knows that the safety valve for that pump is in working order if and only if that component is, in fact, working. Clause (8) represents a design objective that must be satisfied through the development of an appropriate user interface. The lead production operator will only know that the safety valve is working if they are presented with information which states that this component is operating. This, in turn, relies upon the underlying systems engineering. Sensors must be deployed to accurately gauge the state of the valve. If the display suggested that the safety valve was working and that valve was broken then the operator might start the condensate pump. In such circumstances, the user's training might actually threaten the safety of the system. The operator simply follows the criteria laid down in clause (3). If systems engineers believed that the monitoring equipment might provide erroneous values then the requirements in (3) might be altered so that operators were trained to perform additional checks. Epistemic logic can represent the reliability requirements that systems and human factors designers must satisfy in order to avoid changing the earlier training criteria:
present(lpo, ps_valve_ok_icon) <=>
	working(ps_valve, injection_pump)        (9)
The lead production operator is presented with an icon to show that the safety valve is in working order if and only if that component is working.

Human factors and systems engineers must have some means of explicitly representing common objectives during the team-based development of interactive systems (MacLean, Young, Bellotti and Moran, 1990). The previous clause demonstrates that epistemic representations of training requirements can help to identify these shared objectives. Systems engineers must allocate sensing devices to detect the state of the safety valve. Human factors engineers must ensure that users can exploit the display to support their control tasks.
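
A minimal sketch of this shared obligation, assuming hypothetical sensing and presentation functions, is the runtime consistency check that clause (9) can be read as demanding: the ps_valve_ok_icon should be on the display exactly when the valve really is working.

def display_is_consistent(valve_is_working, icon_is_presented):
    # Clause (9): the icon is presented to the lead production operator
    # if and only if the pressure safety valve is actually working.
    return valve_is_working == icon_is_presented

# A failed sensor that leaves the icon on the screen violates the requirement.
print(display_is_consistent(valve_is_working=False, icon_is_presented=True))  # False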

6. Training and Interface Design

Norman (1990) argues that operator training must be supported by the development of 'appropriate' displays. The converse is also true. An interface is of little benefit if operators cannot be trained to exploit the information that it presents. The following clause demonstrates that epistemic logic can be used to represent the link between display objects and the information that operators have been trained to obtain before starting the injection pumps:
Kn(lpo, working(ps_valve, injection_pump)) <=>    
        present(lpo, ps_valve_ok_icon)         (10)
The lead production operator knows that the safety valve is in working order if and only if there is an icon to confirm this fact. Clause (10) provides an extremely abstract description of an interactive system. This is an advantage because it does not force designers to consider the low level details of particular presentation devices. Unfortunately, the logic provides little idea of the images that might be used to represent the valve icon. Logic can, however, be used to represent the structure and content of complex displays. For example, the following clauses describe the way in which the condensate_display might include the image of the ps_valve_ok_icon. This in turn includes the image of a feed adjuster. Finally, the feed adjuster is composed of the graphical primitives that are presented to the user:
part(condensate_display, ps_valve_ok_icon)	(11)
part(ps_valve_ok_icon, feed_adjuster)         	(12)
line(feed_adjuster,  0.1, 0.2, 0.6, 0.2)      	(13)
The first clause states that the ps_valve_ok_icon is part of the condensate display. The second clause states that a feed flow adjuster is, in turn, part of the ps_valve_ok_icon. Finally, the third clause states that the image of the feed adjuster contains a line from Cartesian co-ordinates (0.1, 0.2) to (0.6, 0.2). Figure 3 presents the structure of the image that is described by the previous clauses. The resulting graphical structures can be directly executed using the Prelog prototyping system. Brevity prevents a detailed discussion of this approach; the interested reader is directed to (Johnson, 1995a).
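
The part-of structure in clauses (11) to (13) can also be held in an ordinary data structure during prototyping. The following Python sketch, with hypothetical names that are not intended to describe Prelog's internal representation, gathers the drawing primitives that make up a given display:

# Part-of hierarchy for the condensate display (clauses (11) and (12))
# and the drawing primitives attached to its components (clause (13)).
PARTS = {
    "condensate_display": ["ps_valve_ok_icon"],
    "ps_valve_ok_icon": ["feed_adjuster"],
}
PRIMITIVES = {
    "feed_adjuster": [("line", 0.1, 0.2, 0.6, 0.2)],
}

def primitives_of(component):
    # Collect the primitives of a component and, recursively, of its parts.
    found = list(PRIMITIVES.get(component, []))
    for part in PARTS.get(component, []):
        found.extend(primitives_of(part))
    return found

print(primitives_of("condensate_display"))  # [('line', 0.1, 0.2, 0.6, 0.2)]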


Figure 3: Image for the Condensate Display

The significance of formulae (10) to (13) is that they explicitly represent the information which designers intend to communicate through particular images. The ps_valve_ok_icon tells the user that the pressure safety valve on the injection pump is working, Kn(lpo, working(ps_valve, injection_pump)). These clauses provide a focus for evaluation. Lead production operators can be asked to describe the information that is represented by the valve icon. If the users fail to recognise that the valve is working then the image must be redesigned or operators must be trained to recognise the meaning of the existing image.

7. Representing Training Violations

Previous sections have argued that epistemic logic can represent knowledge requirements for safety-critical interfaces. The use of a logic notation does not, however, guarantee that operators will remember the information that was stressed during their training. For example, the lead production operator might attempt to start the pump with insufficient information about current maintenance activities. Under such circumstances, they would not know whether or not a permit to work had been issued:
training_failure <=
	input(lpo, start_injection_pump)  ^ 
	Kn(lpo, working(ps_valve, injection_pump)) ^
	not Kn(lpo, work_permit(injection_pump)) (14)
A training failure occurs if the lead production operator provides input to start an injection pump and they know that the safety valve for that pump is in working order but they do not know whether there is a permit to work for that component. It is important to emphasise the difference between clause (3) and clause (14). In the former case the lead production operator definitely knows that there is no permit to work on the injection pump, Kn(lpo, not work_permit(injection_pump)). In clause (14), the operator does not know whether there is a permit to work on that component, not Kn(lpo, work_permit(injection_pump)). These two different cases can, in turn, be compared to the situation in which the user deliberately starts the pump even though they know that maintenance is being performed on that component:
deliberate_training_violation <=
	input(lpo, start_injection_pump)  ^ 
	Kn(lpo, working(ps_valve, injection_pump)) ^
	Kn(lpo, work_permit(injection_pump))      (15)
A lead production operator deliberately violates their training if they start an injection pump and they know that the safety valve for that pump is in working order and they know that there is a permit to work on that component. Clauses (14) and (15) represent violations of the training requirements expressed in (3). Regulatory authorities would expect a formal review if the lead production operator attempted to start an injection pump that was currently being worked on. If an enquiry could demonstrate that the operator was simply negligent, as represented in (14), then the sanctions would be less severe than those for the deliberate violation described in (15). The use of a precise, mathematical notation, therefore, provides a firm basis upon which to draft operating conditions for safety-critical applications. Without such precision and without better techniques for documenting knowledge requirements, future accident reports will continue to criticise the training that is provided for complex, safety-critical interfaces.
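
As an illustration of how the distinction between clauses (3), (14) and (15) might be operationalised, for example when reviewing a simulator exercise or reconstructing an incident, the following Python sketch classifies a pump start against the operator's recorded knowledge. The names are hypothetical and the knowledge base would, in practice, have to be reconstructed from logs or interviews:

def classify_pump_start(knowledge):
    # Classify an input(lpo, start_injection_pump) event against
    # clauses (3), (14) and (15); hypothetical encoding of the facts.
    valve_ok = "working(ps_valve, injection_pump)" in knowledge
    knows_permit = "work_permit(injection_pump)" in knowledge
    knows_no_permit = "not work_permit(injection_pump)" in knowledge

    if valve_ok and knows_permit:
        return "deliberate_training_violation"   # clause (15)
    if valve_ok and knows_no_permit:
        return "compliant"                       # clause (3)
    return "training_failure"                    # clause (14), or valve state never established

# An operator who checked the valve but never consulted the permit log.
print(classify_pump_start({"working(ps_valve, injection_pump)"}))  # training_failure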

8. Conclusion And Further Work

This paper has demonstrated that epistemic logic can be used to represent training requirements in terms of the knowledge that operators must possess under particular working conditions. This approach provides a more precise and concise means of representing competency criteria than the high-level objectives set by legislation, such as the Petroleum (Submerged Lands) Act. An important benefit of our approach is that it can be directly integrated with current techniques for software engineering. This is vital if operator requirements are to be explicitly considered during the detailed development of safety-critical systems.

Our approach is limited in the sense that it focuses on knowledge requirements for safety-critical interfaces. We have not attempted to represent the physical skills that form an important component of many training programmes. Elsewhere we have described how logic can be used to reason about the physical layout of complex working environments (Johnson, 1996). Future work intends to build on this to represent the ways in which operators must integrate computer-based tasks with more general, physical activities in safety-critical systems.

Future work will also build closer links with task analysis and user modelling. For instance, epistemic logics can be used to express the propositional and implicational information that is represented in the ICS model. Alternatively, the logic notation can be used to formalise the structured notations in Green and Benyon's (1995) ERMIA user modelling techniques.

Finally, we are concerned to improve the usability of the approach described in this paper (Gray and Johnson, 1995). Mathematical requirements provide non-formalists with a poor impression of what it would be like to interact with a safety-critical system. We are addressing this problem by extending the Prelog prototyping system. This enables designers to directly execute the logic clauses that are presented in this paper. At any point during interaction, designers can ask the user to justify their reasons for a particular action. These can then be compared to the epistemic, Kn, clauses to determine whether the operator has taken into account all of the information requirements that were identified during their training.

Acknowledgements

Thanks go to the members of the Glasgow Interactive Systems Group (GIST) and to the Formal Methods and Theory Group in Glasgow University. This work is supported by the UK Engineering and Physical Sciences Research Council, grants GR/JO7686, GR/K69148 and GR/K55040.

References