Identification and Analysis of Incidents in Complex, Medical Environments
Daniela K. Busse and Chris W. Johnson
Department of Computing Science, University of Glasgow, UK.
Medical risk management is often seen as lagging behind other safety-critical industries, where there has been considerable research into safety and accident causation models. Accident analysis models used in, for instance, aviation and process control recognise the importance of formalised root cause analysis and the multi-level nature of incident causation. Latent factors, such as management and organisational issues, are stressed as underlying structural precursors of incident occurrence. The constraints of the human cognitive system and their relationship to task performance are also taken into account. These considerations are reflected in the use of incident reporting schemes, the analysis of the collected incident data, and the generation of remedial action recommendations. In this paper, we illustrate how these concepts can be applied in a clinical setting, using an incident reporting scheme implemented at an Edinburgh Intensive Care Unit as a case study.
Clinical Adverse Events
In 1990, the Harvard Medical Practice Study investigated the occurrence of patient injury caused by treatment, so-called adverse events. It found that nearly 4% of patients suffered an injury that prolonged their hospital stay or resulted in measurable disability. Leape pointed out that, if these rates are typical of the US, then 180,000 people die each year partly as a result of iatrogenic (‘doctor-caused’) injury. Since most of the precursors to iatrogenic injuries are perceived to be ‘Human Error’, the possibility of negligence causes great concern. This is mirrored in the litigious climate in the US, and has led to a considerable increase in interest in the causes of adverse events. The cost of adverse events is high: not only in human suffering, but also in compensation claims and the need for prolonged treatment of affected patients.
Learning from Adverse Events
In the UK, the drive towards clinical effectiveness has led investigators to concentrate on increasing the quality of care while lowering the costs associated with our current health care system. In the course of the clinical effectiveness program, the idea of clinical audit has gained in strength: a "professionally led initiative which seeks to improve the outcome of patient care as a result of clinicians examining their practice and modifying it appropriately". However, the data collected for audits concern only factors peripheral to adverse events, with cost effectiveness being the focus of the investigation (see e.g. ). Formal analysis of the causes of adverse events does not take place. Rather, a variety of committees and commissions are typically set up locally, meeting regularly to review any cases of iatrogenic injury that have occurred and been brought to their attention by the staff involved. Both the reporting and the analysis of these events, however, are subject to local convention, and thus reflect the self-regulating policy of the health care community.
In some cases, incident reporting schemes are in place. These concern "near miss" adverse events, i.e. cases in which iatrogenic injury was likely to have occurred, but the hazardous situation was successfully recovered. However, it has been noted that even under clinical reporting schemes, in-depth analysis and the search for root causes of adverse events do not take place.
Human Error Analysis in Aviation and Process Control
In safety-critical domains other than health care, accidents (i.e., like clinical adverse events, unintended events which lead to negative outcomes and loss) have received a great deal of attention, and research into their causes and prevention has made considerable advances. Accident causes formerly described solely as ‘Human Error’ have come under close scrutiny, notably in the work of Rasmussen, Reason, Hollnagel (1991), and Hale, who have systematically analysed the cognitive mechanisms underlying the various phenotypes of human error. Latent contributing factors, concerning organisational as well as managerial influences on the course of the accident, are also taken into account. Thus, there was a shift from ‘blaming the human’ (as in the oft-cited ‘pilot error’) to the insight that error invariably occurs in complex systems. The aim now is to create error-tolerant systems that absorb errors through ‘system defences’ and provide redundancy and possibilities for error recovery. The move away from the blame culture also made possible the introduction of institutionalised, anonymous, and non-punitive incident reporting schemes. Subsequent detailed comparison and analysis of the identified root causes is carried out by organisations such as the US National Transportation Safety Board (NTSB), or the US Federal Aviation Administration (FAA) and NASA. In the clinical domain, too, the argument has been put forward for using incident reporting to complement post-hoc accident investigation, with its inherent problems of scant information, altered perception, and outcome bias. Furthermore, analysis of the reported events should extend beyond the investigation of proximal causes to include latent system failures.
Structure of this Paper
This paper looks at incident management practices in a clinical setting, as compared to the approaches in other safety-critical domains. The main issues we investigate concern the problems associated with incident reporting, categorisation, and subsequent analysis. We illustrate existing risk management in medicine by means of an incident reporting scheme employed in an Edinburgh Intensive Care Unit (ICU). In Section 2, we introduce current applications of the Critical Incident Technique in medicine and outline the Edinburgh implementation. Sections 3, 4, and 5 look at incident reporting, categorisation, and analysis in turn, mirroring the stages of the incident investigation process. Section 6 investigates the generation of action recommendations from the prior analyses. In each section, we compare and contrast theoretical and methodological ‘lessons’ from safety-critical domains such as aviation with the implementation of the Edinburgh incident reporting scheme. Conclusions are given in Section 7.
Incident Reporting Schemes in Medicine
Incident reporting schemes and error analysis methods are two of the lessons that medicine can draw from research in domains such as aviation and process control. The US Federal Aviation Administration (FAA) established a confidential reporting system for safety infractions (the Aviation Safety Reporting System) as early as 1975. The program's success was found to depend on its non-punitive, confidential approach. It also supported feedback on the data analyses and on the remedial measures implemented. The analysis of the reports and the subsequent actions appear as a regular feature in several pilots’ magazines to ensure a closed feedback loop.
A successful incident reporting scheme is not deemed possible under a punitive ‘perfectibility model’ – i.e. the belief that if staff are properly trained and motivated, there will be no mistakes. This, however, is seen to be a predominant cultural factor in the medical domain (op.cit.). The focus on motivation in assigning blame to individuals has come under heavy attack in other domains (e.g. ). Punishment of individuals as a means to prevent future adverse events is considered a dead-end approach to error management.
Thus, there is a need in medicine to recognise the inevitability of error and adverse events. A safety culture that takes this into account in clinical system design is still lacking. There have been some notable exceptions in the recent past, where incident reporting schemes were implemented and the identified incidents analysed. Runciman et al. studied anaesthesia incidents in Australian hospitals, and this study was subsequently extended to investigate ICU incidents as well. Also, an incident reporting and analysis scheme from process control has been extended to fit medical systems. These are some examples of large-scale, systematic efforts to transfer insights and methods from other safety-critical domains to medicine. Thus, incident reporting schemes and theories of accident causation are so far the two main contributions to safety applications in the medical domain.
The Edinburgh ICU Incident Reporting System
The Edinburgh incident reporting scheme was set up in an adult intensive care unit in 1989. It has been maintained by Dr David Wright, an anaesthetist and one of the ICU consultants. The unit has 8 beds, and there are roughly 3 medical staff, one consultant, and up to 8 nurses per shift on the ward. Equipment in an ICU ranges from monitors displaying vital sign data, such as heart rate and intra-cranial pressure (ICP), to drug administration equipment, automatic breathing machines, and oxygen humidifier masks. Patient management involves tracking and transcribing monitored vital data, laying and maintaining lines such as endotracheal tubes and chest drains, and handling equipment such as three-way taps for drug administration, ventilators, and defibrillators.
Incidents reported over ten years (see ) fell mainly into four task domains: ventilation, vascular lines, drug administration, and a miscellaneous group.
The incident scheme employed reporting forms that encouraged staff to describe the event in narrative form, as well as noting contributing factors, detection factors, grade of staff involved in the event and that of the reporting staff.
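The items captured by the reporting form can be pictured as one record per completed form. The following is a minimal Python sketch under that assumption; the class name `IncidentReport`, its field names, and the helper `is_multi_factor` are hypothetical and do not describe the scheme's actual paper form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentReport:
    """Hypothetical record mirroring the items noted on the reporting form."""
    narrative: str                                   # free-text account of the event
    contributing_factors: List[str] = field(default_factory=list)
    detection_factors: List[str] = field(default_factory=list)
    grade_involved: str = "unknown"                  # grade of staff involved in the event
    grade_reporter: str = "unknown"                  # grade of the staff member reporting

    def is_multi_factor(self) -> bool:
        """True if more than one distinct contributing factor was noted."""
        return len(set(self.contributing_factors)) > 1


# Invented example, not taken from the Edinburgh data.
report = IncidentReport(
    narrative="Line disconnected while turning the patient; noticed on hourly round.",
    contributing_factors=["Unit Busy", "Night Time"],
    detection_factors=["Regular Checking"],
    grade_involved="staff nurse",
    grade_reporter="senior staff nurse",
)
```

Keeping contributing factors as a list rather than a single field matters for later analysis, because the Edinburgh coding allows several factors per incident.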
One crucial factor in the implementation of the scheme was its anonymity. Dr Wright, in his role as scheme manager, was the only person who had access to the completed forms. The collected data were coded into categories (see Section 4), summarised, and then collated into frequency tables. Information that might identify staff was removed, and action recommendations were proposed. Dr Wright and the Senior Nurse of the unit then reviewed the results of this initial analysis together, inspecting the data again and making final revisions to the action recommendations. The final results were disseminated regularly among the staff of the unit, creating an effective feedback loop.
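The anonymisation and tallying steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: reports are dictionaries that have already been coded by the scheme manager, and the identifying field names (`reporter_name`, `staff_names`) and both helper functions are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical field names assumed to identify staff; the real form may differ.
IDENTIFYING_FIELDS = {"reporter_name", "staff_names"}


def anonymise(report: Dict) -> Dict:
    """Return a copy of the report without fields that could identify staff."""
    return {key: value for key, value in report.items() if key not in IDENTIFYING_FIELDS}


def frequency_table(reports: Iterable[Dict], key: str) -> Counter:
    """Tally coded categories under `key`; one report may contribute several categories."""
    counts: Counter = Counter()
    for report in reports:
        counts.update(report.get(key, []))
    return counts


# Invented example reports, already coded into categories.
raw: List[Dict] = [
    {"reporter_name": "A", "contributing_factors": ["Poor Communication", "Unit Busy"]},
    {"reporter_name": "B", "staff_names": ["C"], "contributing_factors": ["Poor Communication"]},
]
clean = [anonymise(r) for r in raw]
table = frequency_table(clean, "contributing_factors")
```

Here identifying fields are stripped before tallying, so that nothing passed beyond the scheme manager carries them; that ordering is a design choice of this sketch rather than a description of the Edinburgh procedure.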
For a more detailed description of the implementation and findings of the incident reporting scheme, see .
There are several definitions of what constitutes an ‘incident’. Incidents might be considered adverse events only, near miss events only, or both. In the Edinburgh study, staff are asked to report ‘critical incidents’, which are defined as any occurrence that might have led (if not discovered in time) or did lead, to an undesirable outcome. In consequence, each recorded incident:
Also, complications which occur despite normal management are not considered critical incidents.
Notable points in this definition are the preventability of the event and the term ‘normal management’. Also, by this definition, human involvement in the occurrence must have been characterised by an ‘error’ on the part of staff. All three concepts can be argued to be open to interpretation, and thus subject to reporting biases.
What constitutes an ‘error’ has been the subject of intense debate in the accident analysis community. The Australian Intensive Care Unit Incident Monitoring Study (AIMS-ICU), for instance, settles for the description of an incident as "any unintended event or outcome […]" (our emphasis). However, Reason has often noted the important role played by so-called violations, rather than errors, in accident causation. For example, work procedures that act as safety barriers against infrequent adverse events, but are time-consuming or awkward to carry out, may frequently be omitted to meet current, seemingly more urgent, task demands. Such violations may be commonplace and not be considered an ‘error’ as such, since they were ‘intended’. Management attitudes, safety culture, and factors such as time pressure and fatigue influence the perspective taken on such violations.
Also, cases often go unreported for reasons such as fear of punishment, or not being convinced of the usefulness of the scheme. But incidents might also be perceived by staff not as ‘accidents waiting to happen’, but as mere task characteristics, with ‘error’ detection and recovery taken for granted. It has also been suggested that reporters are sometimes not aware of ‘upstream precursors’, such as underlying system faults. They also might not appreciate the significance of local workplace factors. For instance, if they have become accustomed to working with substandard equipment, they may not report this as a contributing factor; if they habitually perform a task that should be supervised but is not, they may not recognise the lack of supervision as a problem. The implicit nature of such tacit task and problem knowledge has been widely recognised in work practice research, and is being addressed, for instance, by knowledge acquisition methods and direct observation methodologies (see for instance ).
These are biases in what is not being reported. There are also biases in what is being reported. As frequently noted, the absolute incidence of incidents cannot be inferred from incident reporting data. There are many factors confounding the representativeness of the data sample obtained. The people who volunteer to report are not representative of the population at hand. In the Edinburgh study, reporters were also often not involved in the incident themselves. In the most recent documented reporting period (May-November 1998), only one third of the reporters had been involved in the incident (10 out of 27 reports). Also, there were hardly any medical staff represented among the reporters (2 out of 27).
The Importance of Incident Detection Factors
Noting staff grade can provide interesting information about the distribution of staff involved in the incident and of those reporting it. Wright, in an initial presentation of the scheme's results, pointed out that it is usually experienced staff who correct the incident, and thus prevent an adverse event from occurring. Establishing the factors that contribute to an incident's detection is a highly important aspect, which is often neglected in incident reporting schemes. It has been suggested that provisions for incident detection and recovery provide more effective safety measures than an approach solely targeting accident prevention or avoidance. This mirrors the above-mentioned insight into the universality of human error. The Edinburgh study encourages staff to detail factors that contributed to the detection of the incident. Given the high proportion of reporters who detected the incident rather than being involved in it, this is especially appropriate.
However, even where detection factors are noted, they are typically not analysed in depth. The analysis should include system factors as well as cognitive aspects of the task and work environment. We briefly illustrate this issue in Section 4.3.
The relationship between accidents (adverse events) and (near miss) incidents is often represented by an ‘iceberg metaphor’. It is assumed that near miss incidents and accidents involving negative outcomes share the same root causes. In accident causation research, the categorisation and analysis of human error has received considerable attention. Several taxonomies exist, ranging from behavioural classifications to classifications of error according to underlying cognitive mechanisms. Each of these presents a first step towards the analysis of an incident, since choices are made as to the nature and characteristics of the occurrence. However, behaviourist classifications such as Hollnagel's do not aid the understanding of the mechanisms underlying erroneous actions.
Moreover, behavioural classifications quickly become large and unwieldy. Rasmussen also observes that if the analysis considers human errors only in terms of their external manifestation (such as omission, commission, and inappropriate timing), a priori error identification will be hindered by a combinatorial explosion.
The distinction between varieties of human error according to their cognitive origin plays a significant role in accident analysis, because different varieties require different methods of error management and remediation. Rasmussen's Skill, Rule, Knowledge (SRK) framework and Reason's Generic Error Modelling System (GEMS) have been widely applied in safety-critical domains to investigate the nature of accidents.
Incidents, according to Rasmussen, should be considered occurrences of a human-system mismatch, which can only be characterised by a multifaceted description (see Figure 1). Thus, faults and errors cannot be defined objectively by considering the performance of humans or equipment in isolation. This insight needs to be reflected in the error categorisation system.
Figure 1 Taxonomy for Description and Analysis of Events involving Human Malfunction (Rasmussen, 1982)
Reason pointed out that an error is not to be seen as the accident or failure itself, but as preceding the failure. ‘Error’ can lead to ‘active failure’ as well as ‘latent failure’. Reason maintains that active failure is usually associated with the performance of ‘front-line’ operators (such as pilots and control room crews) and has an immediate impact upon the system. Latent failure is most often generated by those at the ‘blunt end’ of the system (designers, high-level decision-makers, managers, etc.) and may lie dormant for a long time. Rasmussen's model (Figure 1) also mirrors this hierarchy of incident causation. The ‘Causes of Human Malfunction’, ‘Situation factors’, ‘Factors affecting Performance’ and ‘Personnel Task’ can be seen in relation to Reason's latent failures. These, together with the constraints of the human cognitive processing system, produce an ‘External Mode of Malfunction’, leading to an accident or incident.
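Rasmussen's multifaceted description can be read as a record in which no single field characterises the event on its own. The sketch below is a hypothetical Python rendering: the facet names follow those mentioned in the paragraph above, the grouping into a "latent side" follows the reading given there, and the example values are invented. It is not a complete transcription of Figure 1.

```python
from dataclasses import dataclass, fields
from typing import Dict, List

# Facets that the text relates to Reason's latent failures.
LATENT_FACETS = (
    "causes_of_human_malfunction",
    "situation_factors",
    "factors_affecting_performance",
    "personnel_task",
)


@dataclass
class EventDescription:
    """Hypothetical multifaceted event record loosely following Figure 1."""
    causes_of_human_malfunction: List[str]
    situation_factors: List[str]
    factors_affecting_performance: List[str]
    personnel_task: List[str]
    cognitive_constraints: List[str]     # limits of human information processing involved
    external_mode_of_malfunction: str    # observable form, e.g. "omission"

    def latent_side(self) -> Dict[str, List[str]]:
        """Return the facets read here as latent-failure context."""
        return {name: getattr(self, name) for name in LATENT_FACETS}

    def missing_facets(self) -> List[str]:
        """List facets left empty; no single facet characterises an event on its own."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Invented example with one facet left blank.
event = EventDescription(
    causes_of_human_malfunction=["distraction"],
    situation_factors=["unit busy"],
    factors_affecting_performance=["fatigue"],
    personnel_task=["drug administration"],
    cognitive_constraints=[],
    external_mode_of_malfunction="omission",
)
```

Flagging missing facets, rather than rejecting incomplete records, reflects the point that a report describing only the external mode of malfunction is a starting point for analysis, not a finished description.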
Thus, the categorisation of ‘causes’, meaning precursors, of the incident needs to reflect the hierarchical nature of the causal chain. This chain involves latent system failures that influence work conditions, which in turn provide the context for the proximal incident cause to occur. This perspective on accident causation is mirrored in most accident causation and analysis models in e.g. the transport domain.
For instance, the NTSB accident/incident database contains sections with factual elements ("who, what, where"), the sequence of events, and a narrative description of the incident. The NTSB uses a multiple-cause classification system that allows for up to seven "occurrences", and several "findings" for each occurrence. Any of the findings may be designated as a cause or a contributory factor. The probable cause statement is usually an integrated text. Each "finding" may, in turn, be divided into "subjects". The subjects come from a coded list and record whether the finding was person-related, and whether it was a proximal cause. The database therefore allows a causal tree to be constructed by ascribing to each finding a level of proximity to the reported occurrence. Thus, it is possible to distinguish between causal factors, conditions facilitating the occurrence, and suggested root causes. However, the available constructs largely identify who was involved in a finding, but often stop short of assessing why the error was made. Even the probable cause statements are largely descriptive in nature, without reference to latent failures or human information processing. This information, however, is important in developing preventive strategies.
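The multiple-cause structure described above can be sketched as nested records. The Python sketch below is a simplified, hypothetical data model and not the actual NTSB database schema; the class names, the subject codes, and the rule that a finding counts as proximal when any of its subjects is coded proximal are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_OCCURRENCES = 7  # limit on occurrences per record, as described above


@dataclass
class Subject:
    code: str              # entry from a coded list (codes here are invented)
    person_related: bool
    proximal: bool


@dataclass
class Finding:
    text: str
    role: str              # "cause" or "contributory factor"
    subjects: List[Subject] = field(default_factory=list)


@dataclass
class Occurrence:
    name: str
    findings: List[Finding] = field(default_factory=list)


@dataclass
class AccidentRecord:
    occurrences: List[Occurrence] = field(default_factory=list)

    def add_occurrence(self, occurrence: Occurrence) -> None:
        if len(self.occurrences) >= MAX_OCCURRENCES:
            raise ValueError("a record allows at most seven occurrences")
        self.occurrences.append(occurrence)

    def causal_tree(self) -> Dict[str, Dict[str, List[str]]]:
        """Group findings under each occurrence by proximity; a finding counts as
        proximal if any of its subjects is coded as proximal."""
        tree: Dict[str, Dict[str, List[str]]] = {}
        for occ in self.occurrences:
            branch = tree.setdefault(occ.name, {"proximal": [], "distal": []})
            for finding in occ.findings:
                level = "proximal" if any(s.proximal for s in finding.subjects) else "distal"
                branch[level].append(finding.text)
        return tree

    def person_related_findings(self) -> List[str]:
        """Findings that identify who was involved; nothing here records why."""
        return [f.text for occ in self.occurrences for f in occ.findings
                if any(s.person_related for s in f.subjects)]
```

The `person_related_findings` helper makes the limitation discussed above concrete: the structure can say who was involved and how close a finding sits to the occurrence, but it has no field for the reason an error was made.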
The Edinburgh Categorisation Scheme
In the Edinburgh study, information drawn from the incident reports was categorised into ‘causes’, ‘contributory factors’, and ‘detection factors’. The categories were arrived at through informal coding of the narrative incident data. This bottom-up approach led to a domain-specific, behavioural categorisation scheme.
‘Causes’ offers the subcategories Human Error and Equipment Failure. Any incident that has some degree of human involvement is considered a Human Error. Furthermore, the human error incidents are classified according to the task and equipment domains to which they refer, such as "vascular lines related", "drugs-administration-related", or "ventilator-related". Thus, the categorisation mainly labels the incidents without providing a step towards causal analysis. Rather, it points to where in the patient management task sequence the incident occurred: ‘cause’ here refers to the task domain of the proximal causal factor. In most cases, the actual proximal ‘cause’ of the incident cannot be inferred from this categorisation per se; a summary of the narrative description of the occurrence is consulted in order to reconstruct it.
However, the categorisation of the contributing factors sheds some light on more distal, and less domain-dependent, ‘causes’. Together, the task/equipment domain and the contributory factors can be seen as pointers to where in the task sequence the underlying problems that led to the incident might be found. Thus, they can focus further enquiry. The initial categories were:
The ‘contributing factors’ categorisation scheme has evolved since its creation, nearly doubling from 12 categories to 23. This was the result of the ongoing iterative coding of the collected incident data, and of experience with the reporting scheme. The added categories are:
In the initial version of the taxonomy, mainly so-called Performance Shaping Factors (PSFs; see also Rasmussen) are listed as contributing causes, such as Fatigue, Unit Busy, and Night Time. Poor Communication can also be considered a performance shaping factor. The one factor notably not a PSF is ‘Thoughtlessness’. The only factor that clearly denotes a latent failure is ‘Poor Equipment Design’.
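The distinctions drawn in this paragraph can be made explicit as tags on each factor, so that an incident's set of factors yields a profile rather than a single label. A minimal Python sketch, assuming the tag names and function names below (all hypothetical). Only the factors named in the paragraph are tagged, and the paragraph is read as implying that every named initial factor other than ‘Thoughtlessness’ is a PSF.

```python
from typing import Iterable, Set

# Tags follow the reading of the initial taxonomy above. The sets are deliberately
# incomplete, since only some of the initial factors are named in the text.
PERFORMANCE_SHAPING = {"Fatigue", "Unit Busy", "Night Time",
                       "Poor Communication", "Poor Equipment Design"}
LATENT_FAILURE = {"Poor Equipment Design"}


def factor_tags(factor: str) -> Set[str]:
    """Return the analytic tags for one contributing factor."""
    tags = set()
    if factor in PERFORMANCE_SHAPING:
        tags.add("PSF")
    if factor in LATENT_FAILURE:
        tags.add("latent failure")
    return tags or {"other"}   # e.g. 'Thoughtlessness', which is not a PSF


def incident_profile(factors: Iterable[str]) -> Set[str]:
    """Union of tags over all contributing factors coded for one incident."""
    profile: Set[str] = set()
    for factor in factors:
        profile |= factor_tags(factor)
    return profile
```

For example, an incident coded with ‘Night Time’ and ‘Thoughtlessness’ yields the profile `{"PSF", "other"}`, signalling a behavioural factor embedded in a work-condition context.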
The initial categorisation scheme focussed on factors that created the situation precipitating the incident. However, the refined categories are increasingly task and domain specific, and do not denote generic Performance Shaping Factors. Instead, a behavioural and task domain dependent taxonomy is introduced, see especially the last seven factors above. These categories can provide the basis for descriptive statistics on relative frequency of occurrences and for denoting trends in the distribution and combination of incidents. However, analysis of the underlying causes of the incident is not facilitated.
‘Cause’ Occurrence '89
‘Vascular line’: 6
‘Non-disp. Equipment’: 2
‘Contributing Factors’ Occurrence '89
Poor Communication: 14
Poor Equip. Design: 11
Inexperience with Equipment: 5
Lack of Suitable Equipment: 4
Night Time: 3
Unit Busy: 2
Failure to Perform Hourly Check: 2
‘Detection’ Occurrence '89
D1 Regular Checking: 11
D2 Alarms: 11
D3 Experienced Staff: 8
D5 Patient Noticed: 1
‘Cause’ Occurrence '98
‘Vascular line’: 4
‘Non-disp. Equipment’: 1
‘Contributing Factors’ Occurrence '98
Poor Communication: 8
Inexperience with Equipment: 4
Night Time: 3
Failure to Check Equipment: 3
Failure to Perform Hourly Check: 2
Endotrach. Tube Not Properly
Poor Equipment Design: 1
Patient Inadequately Sedated: 1
Turning the Patient: 1
‘Detection’ Occurrence '98
D1 Regular Checking: 9
D3 Experienced Staff: 8
D2 Alarms: 2
D4 Unfamiliar Noise: 1
D5 Patient Noticed: 1
D7 Handover Check: 1
Table 1 - Causal Categorisation Sample '89 and '98
The evolution of the categorisation scheme itself provides valuable information. The novel factors were derived from the incident data reported and analysed over the years. Such a bottom-up approach establishes and clarifies problem areas within both the task domain and the handling of the equipment provided, so that insights into task characteristics and performance can be gained. These can be used, for example, to focus training, with strong empirical support. For instance, action recommendations in the period from August 1995 to August 1998 address the recurring problem of dislodged endotracheal tubes. Initially, reminders about this common problem are repeatedly publicised, and reasons for dislodgement are given. Then, a list is devised that summarises "reasons for endotracheal tubes coming out". This is disseminated, and later ‘suggested actions’ recommend revising this list and publicising it further.
Comparing the revised categorisation scheme with the categorisation approaches discussed above, it can be noted that a hierarchical causal chain can still be constructed by dividing the data according to, for instance, Rasmussen's Taxonomy for Description and Analysis of Events involving Human Malfunction (see Figure 1, and Section 5.3 for an example). Contributing factors on the AIMS-ICU form, for example, are divided into ‘system-based factors’, which detail work condition factors as well as latent failures, and ‘human factors’, which note factors that impact on the levels of human cognitive processing. There is a trade-off, however, since the provision of fixed categories lessens the flexibility of the data reported, and might stifle creativity for staff attempting to explain how and why the incident occurred.
In the classification of data into the contributing causes categories, combinations of factors are allowed, and are noted frequently in the data sample. The data analysis might thus also be based on noting the frequency or likelihood of certain factors correlating. Trends could be established, and conclusions drawn that are justified by a richer data set than only noting the occurrence of single factors. To illustrate this, we took two data samples (see Table 1), one sample covering the first categorisation interval, January and February 1989 (sample89), and the other covering a more recent interval from May to November 1998 (sample98). Both samples cover 25 incident reports.
In sample98, the predominant factors are ‘Thoughtlessness’ (10 occurrences), ‘Poor Communication’ (9 occurrences), and ‘Inexperience with Equipment’ (5 occurrences). In contrast, in sample89, factors ‘Poor Communication’ (14 occurrences), ‘Poor Equipment Design’ (11 occurrences), ‘Inexperience with equipment’ (6 occurrences), and ‘Lack of suitable Equipment’ (4 occurrences) were most implicated in incidents.
A closer look at the data reveals that in the 1989 interval, half of all ‘Poor Equipment Design’ incidents and one third of ‘Poor Communication’ incidents are not single-factor categorisations, but are placed in combinations. ‘Poor Equipment Design’ is predominantly (four out of five incidents) paired with ‘Lack of Suitable Equipment’. ‘Poor Communication’ is combined with a variety of factors, such as ‘Fatigue’, ‘Thoughtlessness’, and ‘Unit Busy’. This use of combinatorial categorisation embeds behavioural and person factors (e.g. Failure to Check Equipment, Thoughtlessness) within latent system and work condition factors such as Poor Equipment Design, Fatigue, and Poor Communication.
In sample98, ‘Poor Communication’ is paired with other factors in 6 out of 9 incidents. ‘Thoughtlessness’ appears with other factors (such as Inexperience with Equipment) in 4 out of 10 incidents. ‘Inexperience with Equipment’ is only ever mentioned in combination. In sample89, ‘Inexperience with Equipment’ is the sole contributory factor in only two out of a total of six incidents. Three times it is mentioned in combination with ‘Poor Equipment Design’ and ‘Lack of Suitable Equipment’, respectively. Again, this shows how factors that warrant further explanation can be placed in context by considering the multi-combinatorial categorisation. It also shows, however, that descriptive statistics neglecting this facet of analysis shed a slightly misleading light on the collected incident data.
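The single-versus-combined breakdown used in the two paragraphs above is straightforward to compute once each incident is stored with its full set of contributing factors. The following minimal Python sketch assumes each incident is a list of factor names; the function names and the toy sample are hypothetical, and the sample is invented rather than drawn from the Edinburgh data.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List


def single_vs_combined(incidents: List[List[str]]) -> Dict[str, Dict[str, int]]:
    """For each factor, count incidents where it is the sole contributing factor
    and incidents where it is coded together with at least one other factor."""
    stats: Dict[str, Dict[str, int]] = {}
    for factors in incidents:
        distinct = set(factors)
        for factor in distinct:
            entry = stats.setdefault(factor, {"alone": 0, "combined": 0})
            entry["alone" if len(distinct) == 1 else "combined"] += 1
    return stats


def pair_counts(incidents: List[List[str]]) -> Counter:
    """Count how often each unordered pair of factors is coded in the same incident."""
    pairs: Counter = Counter()
    for factors in incidents:
        for pair in combinations(sorted(set(factors)), 2):
            pairs[pair] += 1
    return pairs


# Invented toy sample, NOT the Edinburgh data.
sample = [
    ["Poor Equipment Design", "Lack of Suitable Equipment"],
    ["Poor Equipment Design"],
    ["Poor Communication", "Fatigue"],
    ["Poor Communication"],
]
```

Reporting `single_vs_combined` alongside plain frequency counts is one way to avoid the misleading picture that single-factor statistics can give; `pair_counts` then shows which combinations recur.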
Figure 2 Reason's Organisational Accident Model (Stanhope, 1997)
The crucial role of detection factors is recognised in the Edinburgh scheme. This is reflected on the incident form, as well as in the conclusions that are drawn from the data (see Table 1). It needs to be observed that, although humans cause incidents, it is also humans who detect them and either remedy the consequences or prevent the incident from proceeding. Staff are specifically encouraged to note which factors are believed to have aided detection. These data can be assumed with considerable confidence to reflect the reporter's own experience, and they are also what can ultimately assist in finding ways of reducing the number of incidents and, consequently, accidents.
The detection factor taxonomy evolved alongside the iterative development of the contributory factors taxonomy. Initially it consisted of the categories ‘Repeated Regular Checking’, ‘Presence of Alarms on Equipment’, ‘Presence of Experienced Staff’, ‘Hearing Unfamiliar Noise’, ‘Patient Noticed’, and ‘Relative Noticed’. A task specific factor ‘Having Lines or Three Way Tap Visible’ was added, as well as the factor ‘Handover Check’. These added factors point to possible system improvements to facilitate detection. They can be actively influenced by system factors such as work procedures, design, training, and staffing levels.
This is pointed out in the initial presentation of the incident scheme. It states that "regular checking by experienced staff is critical in detecting errors, but this may be adversely affected by nurse staffing policies where agency staff are commonly used or where little time is available for handovers".
The iteration over the collected incident data thus clarified two more detection-facilitating conditions. The importance of handover checks to compensate for the contributing causes ‘Failure to Check Equipment’ and ‘Failure to Perform Hourly Check’ is pointed out. Without iterative revision and coding of the data categorisation, these factors might have been neglected. A formalised framework of analysis, such as those used in the aviation domain, can aid the recognition of detection factors and the generation of suggested actions (see Sections 5 and 6).
In many medical incident reporting schemes, in-depth analysis and a search for the root causes of adverse events do not take place. In contrast, the formal investigation of adverse events in industry is an increasingly well-established concept. Studies of accidents in the industrial, transport, and military spheres have led to a much broader understanding of accident causation, with less focus on the individual who makes the error, and more on pre-existing organisational factors.
For instance, Reason's organisational accident causation model (see Figure 2) shows how latent failures can pave the way for an accident to occur. To produce effective recommendations, the information collected must be analysed in a way that reveals the relationships between the human error that occurred and the design and characteristics of the systems.
The benefit of analysing a case using a formalised model is that the analysis takes a structured format based on theories of accident causation and human error. This type of analysis allows analysts to identify not only the active failures, which they are accustomed to doing, but also the potentially more important latent failures which create the conditions in which people make errors. It has been suggested that a more systematic approach, dealing with a smaller number of cases in more depth, is likely to yield greater dividends in understanding incident causation and generating action recommendations than the ‘many’ cases currently analysed quite briefly and hence less effectively.
A seminal paper presents NASA's accident investigation method. Combined with Reason's accident causation model, it can be described as consisting of three steps:
Therefore, this model extends the NTSB model by including an analysis of the "why" of the incident. The analysis proceeds from the identification of ‘active failures’, and of the local working conditions that precipitate them, to the identification of latent system failures. For instance, Reason (1990) has proposed a list of General Failure Types of organisations: incompatible goals; organisational deficiencies; inadequate communications; poor planning; inadequate control and monitoring; design failures; inadequate defences; unsuitable materials; poor procedures; poor training; inadequate maintenance management; and inadequate regulation.
Root Cause Analysis Methods
Formalised root cause analysis methods can be used within incident and accident investigation to guide the process of arriving at the more distal causal factors behind an occurrence. One example is the Management Oversight and Risk Tree (MORT) analysis method, which the US Department of Energy developed for the investigation of, for example, nuclear power station incidents. It defines a ‘direct cause’ of an accident as the immediate events or conditions that caused it. ‘Contributing causes’ are events or conditions that, together with other causes, increase the likelihood of an accident but that individually did not cause it. Contributing causes may be longstanding conditions or a series of prior events that, while unimportant in and of themselves, collectively increased the probability that an accident would occur. ‘Root causes’ are the causal factors that, if corrected, would prevent recurrence of the accident. They are derived from, and generally encompass, several contributing causes; they are higher-order, fundamental causal factors that address classes of deficiencies rather than single problems or faults, and they are identified using root cause analysis.
Root causes can include system deficiencies, management failures, inadequate competencies, accepted risks, performance errors, omissions, non-adherence to procedures, and inadequate organisational communication. In the advanced root cause analysis method MORT, these are listed in a tree-shaped diagram that guides the collection, interpretation, and analysis of the data. MORT thus provides a ‘tool for thought’, a structural framework, and a documentation tool for causal chain analysis. Its final aim is to identify management weaknesses or, in Reason's terminology, latent organisational failures.
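The three MORT cause roles can be made concrete as a small data structure. The following Python sketch is illustrative only: the names `CauseRole`, `Cause` and `looks_like_root_cause` are hypothetical, the example cause labels are invented rather than taken from the Edinburgh data, and reading "generally encompasses several contributing causes" as "at least two" is an assumption of this sketch, not a rule stated by MORT.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CauseRole(Enum):
    DIRECT = "direct"              # immediate event or condition that caused the accident
    CONTRIBUTING = "contributing"  # raises likelihood together with others; not sufficient alone
    ROOT = "root"                  # higher-order factor whose correction prevents recurrence


@dataclass
class Cause:
    label: str
    role: CauseRole
    # For a root cause: the contributing causes it is derived from and encompasses.
    encompasses: list[Cause] = field(default_factory=list)


def looks_like_root_cause(cause: Cause) -> bool:
    """Heuristic check of the definition above: a root cause generally
    encompasses several (assumed here: at least two) contributing causes."""
    return (
        cause.role is CauseRole.ROOT
        and len(cause.encompasses) >= 2
        and all(c.role is CauseRole.CONTRIBUTING for c in cause.encompasses)
    )


# Hypothetical example labels, not taken from the Edinburgh data.
handover = Cause("handover information omitted", CauseRole.CONTRIBUTING)
staffing = Cause("unit short-staffed at night", CauseRole.CONTRIBUTING)
root = Cause("inadequate organisational communication", CauseRole.ROOT, [handover, staffing])
direct = Cause("infusion line disconnected", CauseRole.DIRECT)

print(looks_like_root_cause(root))    # True
print(looks_like_root_cause(direct))  # False
```

Keeping the encompassed contributing causes on the root-cause node means the tree itself documents the causal chain, which is the documentation role the text attributes to MORT.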
The Edinburgh Study: Incident Analysis
The narrative given by the reporting staff on the incident report form provides the first level of interpretation of what happened. Where the person reporting the incident is not the one who ‘caused’ it, the reporter's account is a second-level interpretation of the events.
The narrative, together with the contributory and detection factors mentioned, typically lays out a timeline of the events; where it does not, the timeline is inferred during analysis. One method of establishing a task-related timeline is to embed the erroneous task event in a sequential, high-level task model. This is partly achieved by classifying occurrences according to task aspects, as shown in Section 4.
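Embedding reported events in a sequential task model can be sketched as ordering each event by the position of its task step. In the Python sketch below, the task model, the step names, the narrative events and the names `TASK_MODEL`, `Event` and `build_timeline` are all hypothetical assumptions; they are not the task aspects of Section 4.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical high-level task model for an ICU equipment task. The step names
# are illustrative assumptions, not the task aspects used in the Edinburgh scheme.
TASK_MODEL = [
    "prepare equipment",
    "connect equipment to patient",
    "perform hourly check",
    "hand over to next shift",
]


@dataclass
class Event:
    description: str
    task_step: str  # step of TASK_MODEL in which the event occurred


def build_timeline(events: list[Event]) -> list[tuple[int, str, str]]:
    """Order reported events by the position of their step in the task model."""
    position = {step: i for i, step in enumerate(TASK_MODEL)}
    unknown = [e.task_step for e in events if e.task_step not in position]
    if unknown:
        raise ValueError(f"steps not in task model: {unknown}")
    # sorted() is stable, so events within the same step keep their reported order.
    ordered = sorted(events, key=lambda e: position[e.task_step])
    return [(position[e.task_step], e.task_step, e.description) for e in ordered]


# Hypothetical narrative, reported out of order (detection first, then the error).
report = [
    Event("loose connection noticed by experienced nurse", "hand over to next shift"),
    Event("connection not secured", "connect equipment to patient"),
    Event("hourly check omitted", "perform hourly check"),
]
for pos, step, what in build_timeline(report):
    print(pos, step, "-", what)
```

Rejecting events whose step is not in the model, rather than silently appending them, flags narratives that the task model does not yet cover.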
The classification of events involves an informal (and undocumented) analysis, followed by the categorisation noted above into ‘causes’, contributory factors, and detection factors. This can be seen as an informal root cause analysis process (see below). However, as noted earlier, the categorised data often seems to present behavioural descriptions or proximal ‘causes’ rather than underlying causal factors.
Following Reason's accident causation and analysis model (see Figure 2), Table 2 shows a tentative classification of the reported ‘contributing factors’ categories into latent failure types (distal causal factors), work condition failure types (distal causal factors), and active failures (proximal causal factors). In our case, the active failures are task- and behaviour-oriented categories, rather than the error types based on cognitive theory that Reason suggests. This classification can be compared with Rasmussen's event description scheme (see Figure 1), in which latent factors and work conditions are mirrored in ‘Personnel Task’, and in ‘Causes of Human Malfunction’, ‘Situation Factors’, and ‘Factors Affecting Performance’ respectively. To categorise the proximal failure types into Reason's cognitive error types, more detailed incident data would be required than could be accessed from the samples. There is no one-to-one relationship between contributing factors and their underlying cognitive mechanisms, so a more in-depth analysis of the error types is required.
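A tentative classification of this kind amounts to a lookup from contributing-factor categories to Reason's three levels. The Python sketch below uses category names that appear in Tables 3 and 4, but the level assigned to each is an assumption for illustration and is NOT a transcription of Table 2; `FailureLevel`, `ILLUSTRATIVE_LEVELS` and `group_by_level` are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class FailureLevel(Enum):
    LATENT = "latent failure type (distal)"
    WORK_CONDITION = "work condition failure type (distal)"
    ACTIVE = "active failure (proximal)"


# Illustrative assignment only: an assumption for this sketch,
# NOT a transcription of Table 2.
ILLUSTRATIVE_LEVELS = {
    "Poor Equipment Design": FailureLevel.LATENT,
    "Lack of Suitable Equipment": FailureLevel.LATENT,
    "Night Time": FailureLevel.WORK_CONDITION,
    "Unit Busy": FailureLevel.WORK_CONDITION,
    "Inexperience with Equipment": FailureLevel.WORK_CONDITION,
    "Failure to Check Equipment": FailureLevel.ACTIVE,
    "Failure to Perform Hourly Check": FailureLevel.ACTIVE,
}


def group_by_level(factors: list[str]) -> tuple[dict[FailureLevel, list[str]], list[str]]:
    """Group contributing factors by failure level; unmapped factors are returned
    separately instead of being forced into a level."""
    groups: dict[FailureLevel, list[str]] = {level: [] for level in FailureLevel}
    unmapped: list[str] = []
    for factor in factors:
        level = ILLUSTRATIVE_LEVELS.get(factor)
        if level is None:
            unmapped.append(factor)
        else:
            groups[level].append(factor)
    return groups, unmapped


groups, unmapped = group_by_level(["Night Time", "Poor Equipment Design", "Thoughtlessness"])
print(unmapped)  # ['Thoughtlessness']
```

Returning unmapped categories separately reflects the point made above: a label such as ‘Thoughtlessness’ has no one-to-one link to an underlying cognitive mechanism, so it is left for more in-depth analysis rather than assigned a level by default.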
Contributing Factor Categories
Table 2 - Failure Type Categorisation
Currently, the Edinburgh study does not proceed much beyond the "what" phase of the analysis model described above. However, given the classification of contributing factors above, root cause analysis can be used to reflect the various levels of the incident causation tree. Considering latent and work condition factors draws attention to the deficiency of single-cause categorisation. Multi-causal categorisation can instead be used to reconstruct a possible root cause analysis. For instance, the combination of factors ‘inexperience with equipment’, ‘poor equipment design’, and ‘lack of suitable equipment’ mentioned above can be illustrated in a causal tree, as shown in Figure 3. The incident (e.g. a ventilator-related one) was ‘caused’ by staff ‘inexperience with equipment’. This is hypothesised to have been mediated by ‘lack of suitable equipment’, which in turn points to ‘poor equipment design’. A causal chain is thus established, formalised, and documented. It could also be argued that ‘lack of suitable equipment’ contributed directly to the occurrence of the incident; this alternative hypothesis has different implications for potential system redesign. Analysis that takes several levels of causation into account can therefore aid precise and structured reasoning about how an incident occurred. Factors to be considered at lower levels of causation include work conditions and latent system failure types (for instance, Training), as well as alternative contributing factors explaining why a failure to check equipment occurred.
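The two hypotheses in this example can be written as small causal trees and compared by listing every chain from the incident back to its most distal recorded cause. In the Python sketch below, the factor labels come from the text, while the tree encoding, the way the second hypothesis is represented, and the names `MEDIATED`, `DIRECT` and `causal_chains` are assumptions; the trees are assumed to be acyclic, since no cycle detection is done.

```python
from typing import Dict, List

# node -> causes hypothesised to lie one level further back in the chain
CausalTree = Dict[str, List[str]]

INCIDENT = "incident (ventilator related)"

# Reading 1 from the text: inexperience mediated by lack of suitable
# equipment, which in turn points to poor equipment design.
MEDIATED: CausalTree = {
    INCIDENT: ["inexperience with equipment"],
    "inexperience with equipment": ["lack of suitable equipment"],
    "lack of suitable equipment": ["poor equipment design"],
}

# Reading 2 (one assumed way to encode the alternative): lack of suitable
# equipment contributes directly, alongside inexperience.
DIRECT: CausalTree = {
    INCIDENT: ["inexperience with equipment", "lack of suitable equipment"],
    "lack of suitable equipment": ["poor equipment design"],
}


def causal_chains(tree: CausalTree, node: str) -> List[List[str]]:
    """Enumerate every path from `node` back to a cause with no recorded causes
    of its own. Assumes the tree is acyclic."""
    causes = tree.get(node, [])
    if not causes:
        return [[node]]
    return [[node] + chain for cause in causes for chain in causal_chains(tree, cause)]


for name, tree in [("mediated", MEDIATED), ("direct", DIRECT)]:
    print(name)
    for chain in causal_chains(tree, INCIDENT):
        print("  " + " <- ".join(chain))
```

Under the first reading, every remedy must act along one chain; under the second, ‘poor equipment design’ is reached without passing through staff inexperience, which is the different redesign implication the text points to.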
A structured, formalised analysis framework is also needed to prevent hindsight from biasing error analysis. Attributing error is a social and psychological judgement process rather than a matter of objective fact. The hindsight view is fundamentally flawed because it does not reflect the situation that confronted the practitioners at the scene. Rather than being a causal category, human error should therefore be seen as a symptom and a starting point for investigation.
To arrive at sound and relevant action recommendations, a systematic and structured way of linking the results of the analysis to remedial measures is needed. Documenting this process plays an important role, for instance in allowing the effect of each measure to be monitored.
In industrial domains such as aviation and process control, cognitive analysis of error occurrences is often used to point towards remedial actions. For instance, Reece et al. investigated human error in radiation exposure events. The proximal cause of the incident was located in the task sequence and, in addition, a cognitive failure analysis was carried out. The relationship between the two was then analysed, and thus it could be identified
In a similar vein, van der Schaaf proposed the Eindhoven Classification Scheme for classifying events and identifying incident causes in process control. Its main categories are Technical Factors, Organisational Factors, and Human Error, the last categorised according to Rasmussen's Skill, Rule and Knowledge (SRK) framework. The translation of these causes into proposals for effective preventive and corrective action can then be guided by a proposed Classification/Action Matrix. The action categories thus relate back to the SRK error types and include Equipment, Procedures, Information & Communication, Training, and Motivation.
This shows how categorising errors according to the cognitive level of performance and to latent factors can provide the basis for sound, structured, and theory-based remedial recommendations. Without error categories grounded in sound psychological theory, systematic generation of relevant action recommendations is not possible.
Figure 3 – Example of Incident Root Cause Analysis
The Edinburgh Study: Action Recommendation
In the Edinburgh study, the incident data was categorised and summarised by the scheme manager, and action recommendations were arrived at iteratively. The scheme manager suggested remedial actions and presented them, together with the summary data, to the senior nurse of the ICU. The two then discussed the data and reviewed the rationale for the action recommendations, which led to a final version of suggested actions for each incident analysis period. Tables 3 and 4 show a categorisation of the suggested actions for our two samples (sample89 and sample98).
First, with the scheme manager's assistance, we related the recommendations back to the initial ‘cause’ categories. We then re-interpreted the suggested actions in the light of system safety concepts, such as those presented by van der Schaaf or Reason. In sample89, Thoughtlessness and Poor Equipment Design featured most often as contributing factors. This is mirrored in the action recommendations, which fall mainly into the categories ‘remind staff…’, ‘change equipment’, and ‘create protocol for equipment use’. Entries under ‘remind staff…’ typically take the form of a reminder statement drawing attention to problematic task or equipment characteristics, for instance "Remind all staff of the importance of careful, correct use of 3-way taps on central venous and arterial lines" (February 1989). ‘Change equipment’ is represented by recommendations such as "Particular sort of disposable ventilator tubing used on trial should no longer be used". Under ‘create protocol for equipment use’, one entry reads, for instance, "Consider use of small Graseby syringe drivers with smaller volumes of solution".
Incident ‘Cause’ 89
Cause categories: Vasc. Line: 6; Disp. Equip.: 4; Non-disp. Equip.: 2
Contributing factors: Poor Equip. Design: 11; Lack of Suitable …; Night Time: 3; Unit Busy: 2; Failure to Perform Hourly Check: 2
Detection factors: Reg. Checking: 11; Exp. Staff: 8; Pat. Noticed: 1
Action Recommendations 89
3 Vascular Line; 3 Remind Staff…; 2 change equipment; 2 create protocol for …; 2 review protocol for …; 1 create protocol for …; 1 review equipment
Table 3- Action Recommendations for the reporting period Jan/Feb 1989
Incident ‘Cause’ 98
Cause categories: Vasc. Line: 4; Non-disp. Equip.: 1; Poor Communication: 8
Contributing factors: Inexperience with Equipment: 4; Night Time: 3; Failure to Check Equipment: 3; Failure to Perform Hourly Check: 2; End. Tube not Properly Secured: 2; Poor Equip. Design: 1; Patient Inadequately Sedated: 1; Turning the patient: 1
Detection factors: Reg. Checking: 9; Exp. Staff: 8; Unfamiliar Noise: 1; Patient Noticed: 1; Handover Check: 1
Action Recommendations 98
2 Vascular line; 4 Remind Staff…; 1 training viz new equipment; 1 equipment maintenance; 1 create protocol for equip. use; 1 review procedure viz home …
Table 4 - Action Recommendations for the reporting period May/Nov 1998
In the period May to November 1998, a marked increase in reminder statements can be noted. Inspection of the recommendation data showed that disseminating reminder statements was the single most frequently suggested action: in the period August 1995 to November 1998, 82 of a total of 111 recommendations were "Remind Staff…" statements. The 29 other recommendations concerned creating or changing procedures (e.g. "produce guidelines for care of arterial lines - particularly for femoral artery lines post coiling"), or were related to equipment (e.g. "Obtain spare helium cylinder for aortic pump to be kept in ICU").
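The proportion behind this observation is simple arithmetic: 82 of 111 recommendations is roughly 74%. The Python sketch below tallies categorised recommendations and reports the reminder share; the names `REMINDER` and `reminder_share` are hypothetical, and the 29 non-reminder recommendations are lumped into one placeholder category because their exact split between procedure and equipment actions is not reported here.

```python
from collections import Counter
from typing import List, Tuple

REMINDER = "Remind Staff…"


def reminder_share(categories: List[str]) -> Tuple[int, int, float]:
    """Return (reminders, other recommendations, reminder proportion)."""
    counts = Counter(categories)
    total = sum(counts.values())
    reminders = counts[REMINDER]
    share = reminders / total if total else 0.0
    return reminders, total - reminders, share


# Totals reported in the text; the non-reminder recommendations are lumped
# together because their procedure/equipment split is not reported.
recommendations = [REMINDER] * 82 + ["other"] * 29
reminders, others, share = reminder_share(recommendations)
print(reminders, others, f"{share:.1%}")  # 82 29 73.9%
```

Applied per reporting period, the same tally would make trends such as the increase noted for May to November 1998 easy to monitor over time.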
Reminder statements have fallen into disrepute as an error prevention mechanism in domains such as aviation. Rather than further burdening operators' and pilots' memory, these domains have introduced indirect safety methods such as reduced complexity, standardisation, proceduralisation, and work aids such as checklists.
On closer inspection, however, the nature of the Edinburgh reminder statements is revealing. The reminders appear to target one of three things: very common, but still error-prone, details of tasks and practices; problem points that occur very infrequently; or otherwise problematic parts or uses of procedures.
Distinguishing between these types of recommendation creates a link to Rasmussen's SRK framework. Skill-based (S) performance concerns automatic behaviour routines, such as the common but still error-prone details of tasks. Rule-based (R) performance concerns the conscious but practised following of procedures and protocols, and knowledge-based (K) performance relates to potentially effortful, fully conscious problem solving and decision making. In aviation and process control, it has been recognised that rule-based performance is the least error-prone. Design methodologies such as Ecological Interface Design (EID) therefore emphasise proceduralised tasks, and ensure that task features relating to S- or K-level performance are supported accordingly. For instance, K-based performance can be supported by careful information design.
The Eindhoven Classification/Action Matrix is also based on an SRK-style cognitive classification of error. As described above, it specifies that R-based errors are best targeted with Training measures, K-based errors with improvements in the Information & Communication domain, and S-based errors with changes to equipment. This is at odds with the recommendations generated in the Edinburgh study. Instead of issuing reminder statements regardless of the cognitive performance level involved, the scheme could take that level into account when suggesting remedial actions. Categorising errors according to cognitive mechanisms would also further the understanding of performance problems.
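The contrast with blanket reminder statements can be made concrete as a small lookup. The Python sketch below encodes only the three SRK-to-action pairings stated in the paragraph above; the mapping of the three Edinburgh reminder targets to SRK levels is an assumption (the text states only the skill-level link explicitly), and the names `SRK`, `ACTION_FOR_LEVEL`, `REMINDER_TARGET_LEVEL` and `suggest_action` are hypothetical.

```python
from enum import Enum


class SRK(Enum):
    SKILL = "S"
    RULE = "R"
    KNOWLEDGE = "K"


# Action category per SRK level, as summarised in the text for the
# Eindhoven Classification/Action Matrix (other matrix rows omitted).
ACTION_FOR_LEVEL = {
    SRK.SKILL: "Equipment",
    SRK.RULE: "Training",
    SRK.KNOWLEDGE: "Information & Communication",
}

# Assumed mapping of the three Edinburgh reminder targets to SRK levels;
# only the skill-level link is stated explicitly in the text.
REMINDER_TARGET_LEVEL = {
    "common but error-prone task detail": SRK.SKILL,
    "problematic part or use of a procedure": SRK.RULE,
    "infrequent problem point": SRK.KNOWLEDGE,
}


def suggest_action(reminder_target: str) -> str:
    """Replace a blanket 'Remind Staff…' with the matrix action for the
    SRK level assumed to underlie the reminder's target."""
    level = REMINDER_TARGET_LEVEL[reminder_target]
    return ACTION_FOR_LEVEL[level]


for target in REMINDER_TARGET_LEVEL:
    print(f"{target}: {suggest_action(target)}")
```

Separating the two lookups keeps the contested step (assigning a cognitive level to an error) apart from the matrix itself, so either can be revised as more detailed incident data becomes available.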
We have illustrated in this paper how methods and insights from safety-critical domains other than medicine can be applied in a clinical setting. This concerns the use of incident reporting schemes, as well as the application of accident causation models to the analysis of incidents and to the generation of action recommendations. These models recognise the importance of latent failures as part of the causal chain of an incident, which is reflected in the incident analysis process.
Incident investigation schemes often neglect formalised, in-depth analysis of single incidents in favour of a quantitative surface analysis. The crucial role of detection factors, and the need to support them, is also often underestimated. The Edinburgh incident scheme caters for detection factors both in the data collection process and in the generation of action recommendations. The analysis process of the Edinburgh study and its results thus showed not only how theoretical ‘top-down’ approaches can inform incident analysis, but also how practical incident avoidance can be supported by a ‘bottom-up’, detailed (albeit non-formalised) analysis process.
This work was supported by UK EPSRC Grant No GR/L27800. Thanks go to Dr David Wright for data access and his support, and also to the Glasgow Accident Analysis Group.
Daniela Busse, 17 Lilybank Gardens, Glasgow G20 8RZ, Scotland UK, Telephone +44 141 3398855 x0918, email email@example.com.
Daniela Busse (MA) is a Research Assistant and a PhD student at the Computing Department, University of Glasgow. Her thesis concerns Cognitive Models in Human Error and Accident Analysis. She received her MA in Computing and Psychology from the University of Glasgow.
Prof. C.W. Johnson, DPhil, MSc, MA, FBCS, CEng., Dept. of Computing Science, Univ. of Glasgow, Glasgow, G12 8QQ, Scotland. Tel: +44 (141) 330 6053. Fax: +44 (141) 330 4913, e-mail:firstname.lastname@example.org.
Chris Johnson leads an accident investigation group that includes human factors experts, software developers and systems engineers. He chairs IFIP Working Group 13.5 on Human Error and Systems Development and has authored over ninety papers on the human contribution to systems failure.