
Forensic Software Engineering: Are Software Failures Symptomatic of Systemic Problems?

Chris Johnson

Department of Computing Science, University of Glasgow, Glasgow, G12 8QQ, UK.

Tel: +44 (0141) 330 6053 Fax: +44 (0141) 330 4913

http://www.dcs.gla.ac.uk/~johnson, EMail: johnson@dcs.gla.ac.uk

There is a growing realization that existing accident investigation techniques fail to meet the challenges created by incidents that involve software failures. Existing software development techniques cannot easily be used to provide retrospective information about the complex and systemic causes of major accidents. This paper, therefore, argues that we must develop specific techniques to support forensic software engineering. It is important that these techniques should look beyond ‘programmer error’ as a primary cause of software failure. They must enable investigators to identify the systemic problems that are created by inadequate investment, by poor management leadership and by the breakdown in communication between development teams. This argument builds on previous work by Leveson and by Reason. They have focused on the importance of a systemic approach to the development of safety-critical applications. Relatively little attention has been paid to a systemic analysis of their failure. Later sections of this paper analyze the potential problems that can arise when a systemic approach is extended from systems development to accident investigation.

1. Introduction

The Rand report into the "personnel and parties" in National Transportation Safety Board (NTSB) aviation accident investigations argues that existing techniques fail to meet the challenges created by modern systems:

"As complexity grows, hidden design or equipment defects are problems of increasing concern. More and more, aircraft functions rely on software, and electronic systems are replacing many mechanical components. Accidents involving complex events multiply the number of potential failure scenarios and present investigators with new failure modes. The NTSB must be prepared to meet the challenges that the rapid growth in systems complexity poses by developing new investigative practices." (Lebow, Sarsfield, Stanley, Ettedgui and Henning, 1999)

The Rand report reveals how little we know about how to investigate, and report upon, the growing catalogue of software-induced failures. By software "induced" accidents we include incidents that stem from software that fails to perform an intended function. We also include failures in which those intended functions were themselves incorrectly elicited and specified. The following pages support this argument with evidence from accident investigations in several different safety-related industries (US Department of Health and Human Services, 1998). These case studies have been chosen to illustrate failures at many different stages of the software development lifecycle. As we shall see, the recent NTSB investigation into the Guam crash has identified a number of problems in requirements capture for Air Traffic Management systems (NTSB, 2000). Software implementation failures have been identified as one of the causal factors behind the well-publicised Therac-25 incidents (Leveson, 1995). The Lyons (1996) report found that testing failures were a primary cause of the Ariane-5 accident. The South-West Thames Regional Health Authority identified software procurement problems as contributory factors in the failure of the London Ambulance Computer Aided Dispatch system (South-West Thames Regional Health Authority, 1993). A further reason for choosing these cases is that all of them stem from more complex systemic failures that cross many different stages of the software lifecycle.

2. Problem of Supporting Systemic Approaches to Software Failure

It can be argued that there is no need to develop specific forensic techniques to represent and reason about software "induced" accidents. Many existing techniques, from formal methods through to UML, can be used to analyze the technical causes of software failure (Johnson, 1999). For instance, theorem proving can be used to establish that an accident can occur given a formal model of the software being examined and a set of pre-conditions/assumptions about the environment in which it will execute (Johnson, 2000). If an accident cannot be proven to have occurred using the formal model, then either the specification is wrong, or the environmental observations are incorrect, or there are weaknesses in the theorem-proving techniques that are being applied. Unfortunately, there are many special characteristics of accidents that prevent such techniques from being effectively applied. For example, there are often several different ways in which software might have contributed to an accident. Finding one failure path, using formal proof, symbolic execution or control flow analysis, will not be sufficient to identify all possible causes of failure (Ladkin and Loer, 1998). There are some well-known technical solutions to these problems. For instance, model checking can be used to increase an analyst’s assurance that they have identified multiple routes to a hazardous state. These techniques have been applied to support the development of a number of complex software systems. However, they have not so far been used to support the analysis of complex, software-induced accidents (Rushby, 1999).
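The difference between finding one failure path and enumerating several can be illustrated with a toy explicit-state search. This is only a sketch under stated assumptions: it is not a real model checker, and the transition system, its state names and both helper functions are hypothetical, invented for illustration. A breadth-first search stops at the first (shortest) route to the hazardous state, much as a single proof or trace would, whereas exhaustive enumeration of acyclic paths also reports the alternative route.

```python
from collections import deque

# Hypothetical transition system: each state lists the states reachable in one step.
TRANSITIONS = {
    "idle": ["setup", "fault_detected"],
    "setup": ["armed", "idle"],
    "fault_detected": ["armed", "idle"],  # a recovery route that bypasses setup
    "armed": ["firing", "idle"],
    "firing": ["idle", "hazard"],
    "hazard": [],
}
HAZARDS = {"hazard"}


def first_path_to_hazard(start):
    """Breadth-first search: return one shortest path to a hazardous state, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] in HAZARDS:
            return path
        for nxt in TRANSITIONS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


def all_paths_to_hazard(start):
    """Depth-first enumeration of every acyclic path that ends in a hazardous state."""
    found = []

    def explore(path):
        if path[-1] in HAZARDS:
            found.append(list(path))
            return
        for nxt in TRANSITIONS.get(path[-1], []):
            if nxt not in path:  # skip cycles
                path.append(nxt)
                explore(path)
                path.pop()

    explore([start])
    return found


print(first_path_to_hazard("idle"))      # ['idle', 'setup', 'armed', 'firing', 'hazard']
print(len(all_paths_to_hazard("idle")))  # 2
```

In this hypothetical model the single trace passes through `setup`, while the exhaustive search also exposes the recovery path through `fault_detected`, which an investigation based on one failure path would miss.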

There are a number of more theoretical problems that must be addressed before standard software engineering techniques can be applied to support accident investigation. Many development tools address the problems of software complexity by focussing on particular properties of sub-components. As a result, they provide relatively little support for the analysis of what has been termed "systemic" failure (Johnson, 2000). The nature of such failures is illustrated by the NTSB’s preliminary report into the Guam accident:

"The National Transportation Safety Board determines that the probable cause of this accident was the captain's failure to adequately brief and execute the non-precision approach and the first officer's and flight engineer's failure to effectively monitor and cross-check the captain's execution of the approach. Contributing to these failures were the captain's fatigue and Korean Air's inadequate flight crew training. Contributing to the accident was the Federal Aviation Administration's intentional inhibition of the minimum safe altitude warning system and the agency's failure to adequately to manage the system." (Probable Causes, Page 3, NTSB, 2000).

It is unclear how existing software engineering techniques might represent and reason about the captain’s fatigue and the inadequate briefings that left the crew vulnerable to the failure of air traffic control software. Such analyses depend upon the integration of software engineering techniques with other, complementary forms of analysis that consider human factors as well as organizational and systems engineering issues. There are a number of requirements engineering techniques that come close to considering the impact that these diverse systemic factors have upon systems development. Finkelstein, Kramer and Nuseibeh’s (1993) viewpoint-oriented approaches are a notable example. However, existing requirements analysis techniques tend to focus on the generic impact of management and organizational structures on future software systems. They provide little or no support for a situated analysis of the reasons why a specific piece of software failed on a particular day under specific operating conditions.

3. Problems of Framing Any Analysis of Software Failure

The problems of identifying multiple systemic causes of failure are exacerbated by the lack of any clear "stopping rule" for accident investigations that involve software failures. This problem is particularly acute because many different causal factors contribute to software "induced" accidents. For example, at one level a failure can be caused because error-handling routines failed to deal with a particular condition. At another level, however, analysts might argue that the fault lay with the code that initially generated the exception. Both of these problems might, in turn, be associated with poor testing or flawed requirements capture. Questions can also be asked about the quality of training that programmers and designers receive. These different levels of causal analysis stretch back to operational management and to the contractors and sub-contractors who develop and maintain software systems. Beyond that, investigators can examine the advice that regulatory agencies provide about suitable development practices for safety-related systems (US Department of Health and Human Services, 1998). This multi-level analysis of the causes of software failure has a number of important consequences for accident analysis. The first is that existing software engineering techniques are heavily biased towards a small section of this spectrum. For example, Software Fault Trees provide good support for the analysis of coding failures (Leveson, Cha and Shimeall, 1991). Requirements analysis techniques can help trace software failures back to problems in the initial stages of development (Finkelstein, Kramer and Nuseibeh, 1993). However, there has been little work on how different management practices contribute to, or compound, failures at more than one of these levels (Reason, 1998).
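The bias can be seen in a small example. The sketch below is a generic AND/OR fault-tree evaluator, not the Software Fault Tree notation of Leveson, Cha and Shimeall (which is derived from program statements), and the tree and its event names are hypothetical. It captures the coding, error-handling and testing levels described above, but each basic event records only that something happened, not why, and there is no natural leaf for the training, management or regulatory factors.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Gate:
    kind: str  # "AND" or "OR"
    children: List["Node"] = field(default_factory=list)


Node = Union[str, Gate]  # a leaf is the name of a basic event


def occurs(node: Node, events: Dict[str, bool]) -> bool:
    """Return True if the event at `node` occurs, given the truth of each basic event."""
    if isinstance(node, str):
        return events.get(node, False)
    outcomes = [occurs(child, events) for child in node.children]
    if node.kind == "AND":
        return all(outcomes)
    if node.kind == "OR":
        return any(outcomes)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical tree: an exception escapes to the operator only if it is raised,
# the handler fails to deal with it, and testing did not expose the problem.
TOP = Gate("AND", [
    Gate("OR", ["code_raises_exception", "unanticipated_input"]),
    Gate("OR", ["handler_omits_condition", "handler_disabled"]),
    "missed_in_testing",
])

scenario = {"unanticipated_input": True, "handler_omits_condition": True,
            "missed_in_testing": True}
print(occurs(TOP, scenario))                                  # True
print(occurs(TOP, {**scenario, "missed_in_testing": False}))  # False
```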

The Therac-25 incidents provide one of the best-known examples of the problems that arise when attempting to frame any analysis of software failure (Leveson, 1995). It is instructive, however, that many software engineers remember this incident purely for the initial scheduling problems rather than the subsequent inadequacies of the bug fixes:

"in general, it is a mistake to patch just one causal factor (such as the software) and assume that future accidents will be eliminated. Accidents are unlikely to occur in exactly the same way again. If we patch only the symptoms and ignore the deeper underlying cause of one accident, we are unlikely to have much effect on future accidents. The series of accidents involving the Therac-25 is a good example of exactly this problem: Fixing each individual software flaw as it was found did not solve the safety problems of the device" (page 551, Leveson, 1995).

A range of different approaches might, therefore, be recruited to identify the many different causal factors that contribute to major software failures. Such an approach builds on the way in which standards, such as IEC 61508 and DO-178B, advocate the use of different techniques to address different development issues. There are, however, several objections to this ad hoc approach to the investigation of software-induced accidents. The most persuasive is Lekburg’s (1997) analysis of the biases amongst incident investigators. Analysts select those tools with which they are most familiar. They are also most likely to find the causal factors that are best identified using those tools. In the case of software engineering, this might result in analysts identifying those causal factors that are most easily identified using formal methods, irrespective of whether or not those causal factors played a significant role in the course of the accident. A more cynical interpretation is that particular techniques might be selectively deployed to arrive at particular conclusions. In either case, the lack of national and international guidance on the analysis of software failures creates the opportunity for individual and corporate bias to influence the investigation of major accidents.

4. Problems of Assessing Intention in Software Development

It is not enough for analysts simply to document the requirements failures or the erroneous instructions or the inadequate test procedures that contribute to software "induced" accidents. They must also determine the reasons WHY software failed. Why was a necessary requirement omitted? Why was an incorrect instruction introduced? Why was testing inadequate? For instance, the Lyons report spends several pages considering the reasons why the inertial reference system (SRI) was not fully tested before Ariane flight 501:

"When the project test philosophy was defined, the importance of having the SRI’s in the loop was recognized and a decision was made (to incorporate them in the test). At a later stage of the programme (in 1992), this decision was changed. It was decided not to have the actual SRI’s in the loop for the following reasons: the SRIs should be considered to be fully qualified at equipment level; the precision of the navigation software in the on-board computer depends critically on the precision of the SRI measurements. In the Functional Simulation Facility (ISF), this precision could not be achieved by electronics creating test signals; the simulation of failure modes is not possible with real equipment, but only with a model; the base period of the SRI is 1 millisecond whilst that of the simulation at the ISF is 6 milliseconds. This adds to the complexity of the interfacing electronics and may further reduce the precision of the simulation" (page 9,Lyons, 1996)."

Leveson’s (2000) recent work on intent specifications provides significant support for these forensic investigations of software failure. She argues that there will be significant long-term benefits for team-based development if specifications supported wider questions about the reasons why certain approaches were adopted. For instance, programmers joining a team or maintaining software can not only see what was done, they can also see why it was done. This approach is an extension of safety case techniques. Rather than supporting external certification, intent specifications directly support software development within an organization. Accident investigators might also use these intent specifications to understand the reasons why software failures contribute to particular incidents. Any forensic application of Leveson’s ideas would depend upon companies adopting intent specifications throughout their software lifecycle. For example, maintenance is often a contributory factor in software induced accidents. Intent specifications would have to explain the reasons why any changes were made. This would entail significant overheads in addition to the costs associated with maintaining safety cases for external certification (Kelly and McDermid, 1998). However, it is equally important not to underestimate the benefits that might accrue from these activities. Not only might they help accident investigators understand the justifications for particular development decisions, they can also help to establish a closer relationship between the implemented software and the documented design. The report into the failure of the London Ambulance Computer-Aided Dispatch System emphasizes the problems that can arise without these more formal documentation practices:

"Strong project management might also have minimised another difficulty experienced by the development. SO, in their eagerness to please users, often put through software changes "on the fly" thus circumventing the official Project Issue Report (PIR) procedures whereby all such changes should be controlled. These "on the fly" changes also reduced the effectiveness of the testing procedures as previously tested software would be amended without the knowledge of the project group. Such changes could, and did, introduce further bugs." (paragraph 3082, South-West Thames Regional Health Authority, 1993).

Many industries already have certification procedures for software maintenance. These help to avoid the ad hoc procedures described in the previous quotation. Safety cases go part of the way towards the intent specifications that are proposed by Leveson. However, there is little room for complacency. Kelly and McDermid argue that many companies experience great difficulties in maintaining their software safety cases in the face of new requirements or changing environmental circumstances (Kelly and McDermid, 1998). As a result, there is no documented justification for many of the decisions and actions that lead to software failure. These have to be inferred by investigators in the aftermath of major accidents, when a mass of ethical and legal factors make it particularly difficult to assess the motivations that lie behind key development decisions.
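The forensic value of recording intent can be sketched as a simple audit over a change log. The record fields and the `audit` helper below are hypothetical; they are neither the Project Issue Report procedure used in the London Ambulance project nor Leveson's intent specification format, and the log entries are invented. The point is only that a change made "on the fly" leaves nothing from which an investigator can later reconstruct why it was made or whether it was retested.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ChangeRecord:
    change_id: str
    description: str
    pir_number: Optional[str]  # hypothetical link to a Project Issue Report
    rationale: Optional[str]   # why the change was made: the "intent"
    retested: bool             # was the amended software tested again?


def audit(changes: List[ChangeRecord]) -> List[Tuple[str, List[str]]]:
    """List the changes whose purpose or control could not later be reconstructed."""
    problems = []
    for change in changes:
        missing = []
        if not change.pir_number:
            missing.append("no PIR")
        if not (change.rationale and change.rationale.strip()):
            missing.append("no rationale")
        if not change.retested:
            missing.append("not retested")
        if missing:
            problems.append((change.change_id, missing))
    return problems


# Hypothetical log entries.
log = [
    ChangeRecord("C1", "Extend status codes", "PIR-017",
                 "Crews reported statuses the screen could not show", True),
    ChangeRecord("C2", "Tweak allocation screen", None, None, False),  # "on the fly"
]
print(audit(log))  # [('C2', ['no PIR', 'no rationale', 'not retested'])]
```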

5. Problems of Assessing Human and Environmental Factors

Simulation is an important tool in many accident investigations. For example, several hypotheses about the sinking of the MV Estonia were dismissed by testing models in a specially adapted tank. Unfortunately, accident investigators must often account for software behaviors in circumstances that cannot easily be recreated. The same technical constraints that convinced the sub-contractors not to test the Ariane 5’s inertial reference systems in the Functional Simulation Facility also frustrate attempts to simulate the accident (Lyons, 1996). The difficulty of recreating the conditions that lead to software failures has important implications for the reporting of software-induced accidents. Readers must often rely upon the interpretation and analysis of domain experts. Given the lack of agreed techniques in this area, there are few objective means of assessing the work of these experts. Because of the complexity of the coding involved and the proprietary nature of many applications, accident reports often provide insufficient detail about the technical causes of software failure. As a result, readers must trust the interpretation of the board of inquiry. This contrasts strongly with the technical documentation that often accompanies reports into other forms of engineering failure. It also has important implications for teaching and training, where students are expected to follow vague criticisms about the "dangers of re-use" rather than the more detailed expositions that are provided for metallurgical failures and unanticipated chemical reactions.

The interactive nature of many safety-critical applications also complicates the simulation of software "induced" accidents. It can be difficult to recreate the personal and group factors that lead individuals to act in particular ways. It can also be difficult to recreate the ways in which user interface problems exacerbate flaws in the underlying software engineering of safety-critical applications. For example, the London Ambulance system required "almost perfect location information" (South-West Thames Regional Health Authority, 1993). As the demands on the system rose, the location information became increasingly out of date and a number of error messages were generated. These error messages are termed "exceptions" in the following quotation. The rising number of error messages increased the users’ frustration with the software. As a result, the operators became less and less inclined to update essential location and status information. This, in turn, led to more error messages and a "vicious cycle" developed. Accident analysts must, therefore, account not only for the technical flaws in any software system but also for the emergent properties that stem from the users’ interaction with their system:

"The situation was made worse as unrectified exception messages generated more exception messages. With the increasing number of "awaiting attention" and exception messages it became increasingly easy to fail to attend to messages that had scrolled off the top of the screen. Failing to attend to these messages arguably would have been less likely in a "paper-based" environment." (Paragraph 4023, South-West Thames Regional Health Authority, 1993)

It is not always so easy to understand the ways in which human behavior contributes to the failure of computer based systems. This is a complex topic in its own right. Behavioral observations of interaction provide relatively little information about WHY individuals use software in particular ways. It is also notoriously difficult to apply existing human error modeling techniques to represent and reason about the mass of contextual factors that affect operator performance during a major accident (Johnson, 1999a). The London Ambulance report provides a further illustration of these problems. There were persistent rumors and allegations about sabotage contributing to the failure of the software. Accident investigators could never prove these allegations because it was difficult to distinguish instances of deliberate "neglect" from more general installation problems.

6. Problems of Making Adequate Recommendations

Previous paragraphs have argued that accident investigators must address the systemic factors that contribute to, and combine with, software failures during major accidents. They must also consider the scope of their analysis; software failures are often a symptom of poor training and management. It can also be difficult to identify the motivations and intentions that lead to inadequate requirements, "erroneous" coding and poor testing. Finally, we have argued that it can be difficult to determine the ways in which human factors and environmental influences compound the problems created by software failures in major accidents. There are also a number of further problems. In particular, it can be difficult for accident investigators to identify suitable recommendations for the design and operation of future software systems. This is, in part, a natural consequence of the increasing emphasis being placed upon process improvement as a determinant of software quality. Once an accident occurs, this throws doubt not only on the code that led to the failure but also on the entire development process that produced that code. At best, the entire program may be untrustworthy. At worst, all of the other code cut by that team, or by any other teams practicing the same development techniques, may be under suspicion. Readers can obtain a flavor of this in the closing pages of the Lyons report into the Ariane 5 failure. The development teams must:

"Review all flight software (including embedded software), and in particular:

Identify all implicit assumptions made by the code and its justification documents on the values of quantities provided by the equipment. Check these assumptions against the restrictions on use of the equipment."

(Paragraph R5, Lyons, 1996).
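Recommendation R5 does not say how such assumptions should be recorded or checked. One possibility, sketched here purely as an illustration, is to state each range assumption as data alongside the code that relies on it, together with its justification, so that reviewers and later investigators can check it against a new platform. The `RangeAssumption` type, the helper functions and the bounds are hypothetical. The 16-bit conversion is chosen because the Lyons report traced the SRI failure to the unprotected conversion of a 64-bit floating-point value into a 16-bit signed integer.

```python
import math
from dataclasses import dataclass

INT16_MIN, INT16_MAX = -32768, 32767


class AssumptionViolation(ValueError):
    """Raised when a value breaks a documented assumption about equipment behaviour."""


@dataclass(frozen=True)
class RangeAssumption:
    quantity: str
    lower: float
    upper: float
    justification: str  # where the bound came from, so that it can be re-checked


def check(assumption: RangeAssumption, value: float) -> float:
    """Return `value` unchanged if it satisfies the assumption, otherwise raise."""
    if not math.isfinite(value) or not assumption.lower <= value <= assumption.upper:
        raise AssumptionViolation(
            f"{assumption.quantity}={value!r} outside [{assumption.lower}, "
            f"{assumption.upper}]; justification: {assumption.justification}")
    return value


def to_int16(assumption: RangeAssumption, value: float) -> int:
    """Convert to a 16-bit signed integer only after the documented range check."""
    result = round(check(assumption, value))
    if not INT16_MIN <= result <= INT16_MAX:
        raise AssumptionViolation(f"{assumption.quantity} does not fit in 16 bits")
    return result


# Hypothetical assumption inherited from an earlier platform's flight profile.
bias = RangeAssumption("horizontal_bias", -30000.0, 30000.0,
                       "hypothetical: values observed on the previous vehicle")
print(to_int16(bias, 1200.4))  # 1200
try:
    to_int16(bias, 40000.0)    # a new trajectory breaks the inherited assumption
except AssumptionViolation as err:
    print("review needed:", err)
```

The design choice is that the justification travels with the bound, so a violation reports not only the offending value but also the reasoning that a re-use decision would need to revisit.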

Recommendation R5 re-iterates the importance of justification and of intent, mentioned in previous paragraphs. It also requires the development teams to identify "all implicit assumptions made by the code". Unfortunately, it does not suggest any tools or techniques that might be used to support this difficult task. In preparing this paper, I have also been struck by comments that reveal how little many investigators appreciate about the problems involved in software development. This is illustrated by a citation from the report into the London Ambulance Computer Aided Dispatch system:

"A critical system such as this, as pointed out earlier, amongst other prerequisites must have totally reliable software. This implies that quality assurance procedures must be formalised and extensive. Although Systems Options Ltd (SO) had a part-time QA resource it was clearly not fully effective and, more importantly, not independent. QA in a project such as this must have considerable power including the ability to extend project time-scales if quality standards are not being met. This formalised QA did not exist at any time during the Computer Aided Despatch development. (Paragraph 3083, South-West Thames Regional Health Authority, 1993).

It is impossible, by any objective measure, to achieve total software reliability, contrary to what is suggested in the previous quotation. It may be politically expedient to propose this as a valid objective. However, to suggest that it is possible is to misrepresent completely the state of the art in safety-critical software engineering.
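One way to see why is the standard statistical-testing argument. If test demands are independent and drawn from the operational profile (itself a strong assumption), then n failure-free demands support the claim that the probability of failure per demand is below p, with confidence 1 - α, only when (1 - p)^n ≤ α. The function below is our own illustration of that bound, not a procedure taken from any of the cited reports.

```python
import math


def failure_free_demands_needed(p_fail: float, confidence: float) -> int:
    """Smallest n such that n independent failure-free demands support the claim
    'probability of failure per demand < p_fail' at the given confidence.

    Derived from requiring (1 - p_fail) ** n <= 1 - confidence.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1; "
                         "p_fail = 0 ('total reliability') would need infinitely many tests")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p_fail))


print(failure_free_demands_needed(1e-3, 0.99))  # 4603
print(failure_free_demands_needed(1e-9, 0.99))  # roughly 4.6 billion
```

Roughly 4,600 failure-free demands support a one-in-a-thousand claim at 99% confidence, but a one-in-a-billion claim needs about 4.6 billion, and a claim of zero failure probability cannot be supported by any finite amount of testing.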

7. Conclusion and Further Work

A number of agencies have argued that existing techniques cannot easily be used to investigate accidents that involve the failure of software systems (Lebow et al., 1999; US Department of Health and Human Services, 1998). This paper has, therefore, gone beyond the high-level analysis presented in previous studies to focus on the challenges that must be addressed by forensic software engineering:

There are no existing techniques that enable analysts to represent and reason about the systemic factors that stem from and lead to software failures.

There are no agreed means of framing the scope of accident investigations that involve software. This results in considerable differences in the quality of many reports. Some focus on individual programmer error and even on differences in programming "style". Others ignore these issues and focus on managerial and regulatory supervision. Very few consider the interaction between these different contributory factors to software failure.

The lack of guidance about appropriate analytical tools opens up the opportunity for subjective bias and major problems in recreating the analytical techniques that support particular conclusions.

Software re-use creates particular problems in the aftermath of an accident because investigators may be forced to question all of the assumptions that were made about the safety of any modification to a new environment or platform.

Accident investigators are concerned to understand not just how a program failed but also WHY that failure went undetected during subsequent stages of the software life cycle. This makes it increasingly important that intentional techniques and safety cases are better integrated into all software development practices.

It is impossible to run empirical tests or to simulate the operating conditions that lead to many software failures. This makes it imperative that analysts are explicit about the techniques that they use, and assumptions that they make, during the analysis of major software failures. Other professionals must be able to assess the validity of their findings.

The lack of integration between human factors and software engineering techniques makes it difficult to identify the ways in which "emergent behaviors" can lead to failure (Johnson, 1999a). As a result, we have learnt remarkably little about the nature of the human computer interaction during major accidents.

The lack of objective measures of software quality has led to a focus on development practices. As a result, the occurrence of even a single software failure can throw doubt upon an entire system. This makes it difficult for accident investigators to limit the potential scope of the recommendations in an accident report.

The greatest challenge for forensic software engineering is to educate other investigators, and ultimately the general public, about the nature of safety-critical systems. Until this issue is addressed, we will continue to read accident reports that urge companies to develop "totally reliable software".

It should be stressed that this is a partial list. Additional factors complicate the analysis of software-induced failures. It is also important to stress that this paper has not proposed any detailed solutions to the problems of assessing the role that software plays in major incidents and accidents. The lack of previous research in this area makes such proposals premature. However, it is possible to suggest directions for future research. For instance, many of the issues in the previous list are addressed by recent work on intentional forms of software engineering (Leveson, 2000). As mentioned, these not only specify what a system is intended to do but also why that requirement is important. These approaches help investigators to distinguish between failures to achieve appropriate intentions, through poor coding, and the more fundamental problems that stem from inappropriate intentions. A second area for research focuses on the maintenance of safety cases and other design documents following software re-use (Kelly and McDermid, 1998). Previous sections have argued that accident investigators must often piece together the reasons why software "failed" within new contexts of use, such as new hardware platforms or control environments. An analysis of software failures can provide insights about the threats that software re-use can create for existing safety cases. Such an analysis might not only guide subsequent accident investigations but might also provide valuable guidelines for engineers and developers who want to re-use safety-critical software.

Jim Hall, the chairman of the US National Transportation Safety Board, recently announced the foundation of an accident investigation academy. This is motivated by the criticisms of the Rand report that were cited at the start of this paper. The academy will train investigators to better assess the contribution that software failures make to major incidents. Perhaps the greatest indictment of our research is that we can offer relatively little practical advice about the curriculum that they should adopt. The thousands of papers that have been published on the constructive design of complex safety-critical software far outweigh the handful of papers that have been published on the analysis of software in major incidents and accidents.

8. Epilogue: Some Lingering Doubts

This paper has argued that forensic software engineering techniques must be developed to exploit a more systemic approach to the analysis of software-related failures. This draws upon a number of recent initiatives within the field of software engineering (Leveson, 1995) and systems development (Reason, 1998). Recent attempts to develop such an approach have, however, identified a number of concerns (Johnson, in press). These doubts can be illustrated by the software problems that contributed to the loss of NASA’s Mars Surveyor'98 program (NASA, 2000, 2000a). This consisted of the Mars Climate Orbiter and the Mars Polar Lander. Both missions were to satisfy tight financial constraints by exploiting innovative technology under NASA's faster, better, cheaper management initiative (NASA, 2000). The Mars Climate Orbiter investigation team describes how a navigation error was introduced when software exploited Imperial rather than the recommended metric units to represent thruster performance:

"Angular Momentum Desaturation (AMD) events occurred 10-14 times more often than was expected by the operations navigation team. This was because the solar array was asymmetrical relative to the spacecraft body ... this increased the Sun-induced momentum buildup on the spacecraft. The increased AMD events coupled with the fact that the angular momentum (impulse) data was in English, rather than metric, units, resulted in small errors being introduced in the trajectory estimate over the course of the 9-month journey. At the time of Mars insertion, the spacecraft trajectory was approximately 170 kilometers lower than planned. As a result, MCO either was destroyed in the atmosphere or re-entered heliocentric space after leaving Mars atmosphere." (NASA, 2000)

The Mars Polar Lander reached Mars approximately ten weeks after the loss of the Climate Orbiter (NASA, 2000a). It completed its cruise phase and correctly assumed its entry attitude. A development decision had previously determined that telemetry data would not be collected during the entry, descent and landing phase. In consequence, the change in attitude had the effect of pointing the antenna away from Earth and the signal was lost, as expected. No subsequent communications were received from the Polar Lander:

"…the probable cause of the loss of MPL has been traced to premature shutdown of the descent engines, resulting from a vulnerability of the software to transient signals. Owing to the lack of data, other potential failure modes cannot positively be ruled out. Nonetheless, the Board judges there to be little doubt about the probable cause of loss of the mission." (NASA, 2000a)

The inquiries into the Mars Surveyor'98 program are instructive because they deliberately adopted a more systemic approach to software failure than is apparent in any of the other accident investigations cited in this paper. They examined the long working hours and the communication problems between NASA and its contractors that may have contributed to particular software failures. They also considered how pressure to meet mission deadlines may have compromised risk assessment and validation procedures. This systemic view is echoed in the words of Daniel Goldin, the NASA Administrator who first formulated the Faster, Better, Cheaper strategy, when he spoke to the engineers and managers at the Jet Propulsion Laboratory about the loss of the Climate Orbiter and the Polar Lander:

"I told them that in my effort to empower people, I pushed too hard... and in so doing, stretched the system too thin. It wasn't intentional. It wasn't malicious. I believed in the vision... but it may have made failure inevitable. I wanted to demonstrate to the world that we could do things much better than anyone else. And you delivered -- you delivered with Mars Pathfinder... With Mars Global Surveyor... With Deep Space 1. We pushed the boundaries like never before... and had not yet reached what we thought was the limit. Not until Mars 98. I salute that team's courage and conviction. And make no mistake: they need not apologise to anyone. They did not fail alone. As the head of NASA, I accept the responsibility. If anything, the system failed them." (Goldin, 2000)

Goldin’s words are both encouraging and disturbing. They are encouraging because they acknowledge that the engineers’ working environment may have contributed to the loss of the missions. Goldin also acknowledges that such failures are characterized by emergent behaviors that stem from complex interactions between management practices, operational procedures and particular technologies. However, his words are also disturbing. These systemic interactions are not random occurrences; they are shaped and directed by the regulatory environment and by higher levels of management. By concluding that ‘the system failed them’, Goldin locates the cause in an abstract ‘system’ rather than in the particular management structures that helped to shape the priorities and objectives of the missions.

REFERENCES

A. Finkelstein, J. Kramer and B. Nuseibeh, Viewpoint Oriented Development: applications in composite systems. In F. Redmill and T. Anderson (eds.) Safety Critical Systems: Current Issues, Techniques and Standards, Chapman & Hall, 1993, 90-101.

D. Goldin, "When The Best Must Do Even Better" Remarks by NASA Administrator Daniel S. Goldin At the Jet Propulsion Laboratory Pasadena, CA March 29, 2000, NASA Headquarters, Washington DC, USA, http://www.hq.nasa.gov/office/pao/ftp/Goldin/00text/jpl_remarks.txt, 2000.

C.W. Johnson, A First Step Toward the Integration of Accident Reports and Constructive Design Documents. In M. Felici, K. Kanoun and A. Pasquini (eds), Proc. of SAFECOMP'99, 286-296, Springer Verlag, 1999.

C.W. Johnson, Why Human Error Analysis Fails to Support Systems Development, Interacting with Computers, (11)5:517-524, 1999a.

C.W. Johnson, Proving Properties of Accidents, Reliability Engineering and Systems Safety, (67)2:175-191, 2000.

C.W. Johnson, Incident Reporting: A Guide to the Detection, Mitigation and Resolution of Failure in Safety-Critical Systems (in press, to be published Spring 2002).

T.P. Kelly and J.A. McDermid, A Systematic Approach to Safety-Case Maintenance. In M. Felici, K. Kanoun and A. Pasquini (eds), Proc. of SAFECOMP'99, LNCS 1698, Springer Verlag, 1999.

P.B. Ladkin and K. Loer, Why-Because Analysis: Formal Reasoning About Incidents. Technical Report RVS-Bk-98-01, Technische Fakultät der Universität Bielefeld, Bielefeld, Germany, 1998.

C.C. Lebow, L.P. Sarsfield, W.L. Stanley, E. Ettedgui and G. Henning, Safety in the Skies: Personnel and Parties in NTSB Accident Investigations. Rand Institute, Santa Monica, USA, 1999.

A.K. Lekburg, Different Approaches to Incident Investigation – How the Analyst Makes a Difference. In S. Smith and B. Lewis (eds), Proc. of the 15th International Systems Safety Conference, Washington DC, USA, August 1997.

N.G. Leveson, S.S. Cha and T.J. Shimeall, Safety Verification of Ada Programs using Software Fault Trees, IEEE Software, 8(4):48-59, July 1991.

N.G. Leveson, Safeware: System Safety and Computers, Addison Wesley, Reading Mass. 1995.

N.G. Leveson, Intent Specifications: An Approach to Building Human-Centered Specifications. Accepted for IEEE Trans. on Software Engineering 2001.

J.L. Lyons, Report of the Inquiry Board into the Failure of Flight 501 of the Ariane 5 Rocket. European Space Agency Report, Paris, July 1996.

NASA, Report on Project Management in NASA: Phase II of the Mars Climate Orbiter Mishap Report, Mars Climate Orbiter, Mishap Investigation Board, NASA Headquarters, Washington DC, USA, 2000.

NASA/JPL, Report on the Loss of the Mars Polar Lander and Deep Space 2 Missions, NASA/Jet Propulsion Laboratory, JPL D-18709, California Institute of Technology, 2000a.

National Transportation Safety Board, Controlled Flight Into Terrain, Korean Air Flight 801, Boeing 747-300, HL7468, Nimitz Hill, Guam, August 6, 1997. Aircraft Accident Report NTSB/AAR-99/02, 2000.

J. Reason, Managing the Risks of Organizational Accidents, Ashgate, 1998.

J. Rushby, Using Model Checking to Help Discover Mode Confusions and Other Automation Surprises. In D. Javaux and V. de Keyser (eds.) Proc. of the 3rd Workshop on Human Error, Safety, and System Development, Liege, Belgium, 7-8 June 1999.

South-West Thames Regional Health Authority, Report of the Inquiry Into The London Ambulance Service Computer-Assisted Despatch System, February 1993. ISBN 0 905133 70 6.

US Department of Health and Human Services, Food and Drug Administration, Guidance for the Content of Premarket Submissions for Software Contained in Medical Devices. Report Number 337, May 1998.