\input{/users/staff/johnson/teaching/hoskyns/slidedefs.tex}
\title{Safety Critical Systems Development}
\author{Prof. Chris Johnson,\\ Department of Computing Science,\\ University of Glasgow,\\ Glasgow,\\ Scotland.\\ G12 8QJ.\\ \\ URL: http://www.dcs.gla.ac.uk/$\sim$johnson\\ E-mail: johnson@dcs.glasgow.ac.uk\\ Telephone: +44 141 330 6053}
\date{October 1999.}
\begin{document}
\maketitle
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
\pagehead{Terminology and the Ariane 5 Case Study}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Introduction}
Safety Critical Systems Development
Hazard Analysis
\slideitem{ Hazard Analysis.
\slideitem{ FMECA/FMEA.
\slideitem{ Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Analysis
\slideitem{ Safety case: - why the proposed system is safe.
\slideitem{ Must identify potential hazards.
\slideitem{ Assess likelihood and severity.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Analysis
\slideitem{Lots of variant features: - checklists... - hazard indices...
\slideitem{ Lots of techniques: - fault trees (see later); - cause consequence analysis; - HAZOPS; - FMECA/FHA/FMEA...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Failure Modes, Effects and Criticality Analysis
\slideitem{MIL STD 1629A (1977!).
\slideitem{Analyse each potential failure.
\slideitem{Determine impact on system(s).
\slideitem{Assess its criticality.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Failure Modes, Effects and Criticality Analysis
1. Construct functional block diagram.
2. Use diagram to identify any associated failure modes.
3. Identify effects of failure and assess criticality.
4. Repeat 2 and 3 for potential consequences.
5. Identify causes and occurrence rates.
6. Determine detection factors.
7. Calculate Risk Priority Numbers.
8. Finalise hazard assessment.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 1: Functional Block Diagram
\slideitem{Establish scope of the analysis.
\slideitem{Break system into subcomponents.
\slideitem{Different levels of detail?
\slideitem{Some unknowns early in design?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 1: Functional Block Diagram
Acknowledgement: taken from J.D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman, Harlow, 1993 (ISBN 0-582-09615-4).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 2: Identify Failure Modes
\slideitem{ Many different failure modes: - complete failure; - partial failure; - intermittent failure; - gradual failure; - etc.
\slideitem{Not all will apply?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
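As a minimal sketch of Steps 1 and 2, the fragment below represents a functional block breakdown as a simple data structure and lists candidate failure modes for each subcomponent. The subcomponents and failure modes are invented purely for illustration; they are not taken from the case study.
\begin{verbatim}
# Minimal sketch of FMECA Steps 1-2: a functional block breakdown with
# candidate failure modes per subcomponent (all names are illustrative).

system_breakdown = {
    "coolant pump":    ["fails to start", "stops intermittently",
                        "delivers reduced flow"],
    "pump controller": ["no output signal", "spurious stop command"],
    "flow sensor":     ["reads high", "reads low", "no reading"],
}

# Step 2 walks every subcomponent/failure-mode pair; the later steps attach
# effects, severity, occurrence and detection indices to each pair.
for component, modes in system_breakdown.items():
    for mode in modes:
        print(f"{component}: {mode}")
\end{verbatim}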
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 3: Assess Criticality
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 4: Repeat for potential consequences
\slideitem{ Can have knock-on effects.
\slideitem{Additional failure modes.
\slideitem{Or additional contexts of failure.
\slideitem{Iterate on the analysis.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 5: Identify Causes and Occurrence Rates
\slideitem{Modes with most severe effects first.
\slideitem{What causes the failure mode?
\slideitem{How likely is that cause?
\slideitem{risk = frequency x cost
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 5: Identify Causes and Occurrence Rates
\begin{tabular}{|l|p{8cm}|c|}
\hline
Effect & Severity of effect & Rank \\
\hline
Hazardous without warning & Very high severity ranking when a potential failure mode affects safe operation or involves non-compliance with a government regulation without warning. & 10 \\
Hazardous with warning & Failure affects safe product operation or involves non-compliance with government regulation with warning. & 9 \\
Very High & Product is inoperable, with loss of primary function. & 8 \\
High & Product is operable, but at reduced level of performance. & 7 \\
Moderate & Product is operable, but comfort or convenience item(s) are inoperable. & 6 \\
Low & Product is operable, but comfort or convenience item(s) operate at a reduced level of performance. & 5 \\
Very Low & Fit \& finish or squeak \& rattle item does not conform. Most customers notice the defect. & 4 \\
Minor & Fit \& finish or squeak \& rattle item does not conform. Average customers notice the defect. & 3 \\
Very Minor & Fit \& finish or squeak \& rattle item does not conform. Discriminating customers notice the defect. & 2 \\
None & No effect. & 1 \\
\hline
\end{tabular}
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 6: Determine detection factors
Type (1): These controls prevent the Cause or Failure Mode from occurring, or reduce their rate of occurrence.
Type (2): These controls detect the Cause of the Failure Mode and lead to corrective action.
Type (3): These controls detect the Failure Mode before it reaches product operation, subsequent operations, or the end user.
\slideitem{Can we detect/control the failure mode?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 6: Determine detection factors
\begin{tabular}{|l|l|c|}
\hline
Probability of failure & Failure rate & Rank \\
\hline
 & 1 in 2 & 10 \\
 & 1 in 3 & 9 \\
High: Repeated failures & 1 in 8 & 8 \\
 & 1 in 20 & 7 \\
Moderate: Occasional failures & 1 in 80 & 6 \\
 & 1 in 400 & 5 \\
 & 1 in 2,000 & 4 \\
Low: Relatively few failures & 1 in 15,000 & 3 \\
 & 1 in 150,000 & 2 \\
Remote: Failure is unlikely & 1 in 1,500,000 & 1 \\
\hline
\end{tabular}

\begin{tabular}{|l|p{8cm}|c|}
\hline
Detection & Criteria: Likelihood of Detection by Design Control & Rank \\
\hline
Absolute Uncertainty & Design Control does not detect a potential Cause of failure or subsequent Failure Mode; or there is no Design Control. & 10 \\
Very Remote & Very remote chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 9 \\
Remote & Remote chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 8 \\
Very Low & Very low chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 7 \\
Low & Low chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 6 \\
Moderate & Moderate chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 5 \\
Moderately High & Moderately high chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 4 \\
High & High chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 3 \\
Very High & Very high chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode. & 2 \\
Almost Certain & Design Control will almost certainly detect a potential Cause of failure or subsequent Failure Mode. & 1 \\
\hline
\end{tabular}

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 7: Calculate Risk Priority Numbers
\slideitem{Risk Priority Numbers (RPN).
\slideitem{RPN = S x O x D, where - S - severity index; - O - occurrence index; - D - detection index.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
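The following is a minimal sketch of Step 7: it computes RPN = S x O x D for a handful of hypothetical failure modes and ranks them. The failure modes and index values are invented; in practice S, O and D are read from the severity, occurrence and detection ranking tables above.
\begin{verbatim}
# Minimal sketch of Step 7: Risk Priority Numbers, RPN = S x O x D.
# Failure modes and index values are invented for illustration.

failure_modes = [
    # (description,        severity S, occurrence O, detection D)
    ("valve stuck closed",  8,          4,            3),
    ("sensor reads low",    6,          6,            7),
    ("display flicker",     2,          5,            2),
]

ranked = sorted(
    ((s * o * d, name) for name, s, o, d in failure_modes),
    reverse=True,
)

for rpn, name in ranked:
    print(f"RPN {rpn:4d}  {name}")
# The highest RPN values are addressed first when finalising the
# hazard assessment (Step 8).
\end{verbatim}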
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Step 8 - Finalise Hazard Analysis
\slideitem{ Must document the analysis...
\slideitem{ ...and response to analysis.
\slideitem{Use FMECA forms.
\slideitem{Several formats and tools.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Step 8 - Finalise Hazard Analysis
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Tools
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Tools
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Conclusions
\slideitem{ Hazard analysis.
\slideitem{ FMECA/FMEA.
\slideitem{ Qualitative->quantitative approaches.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Probabilistic Risk Assessment (PRA)
The use of PRA technology should be increased in all regulatory matters to the extent supported by the state of the art in PRA methods and data and in a manner that complements the NRC's deterministic approach and supports the NRC's traditional defense-in-depth philosophy. PRA and associated analyses (e.g., sensitivity studies, uncertainty analyses, and importance measures) should be used in regulatory matters, where practical within the bounds of the state of the art, to reduce unnecessary conservatism associated with current regulatory requirements, regulatory guides, license commitments, and staff practices.
An Approach for Plant-Specific, Risk-Informed Decisionmaking: Technical Specifications
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Analysis vs PRA
\slideitem{ FMECA - hazard analysis.
\slideitem{ PRA part of hazard analysis.
\slideitem{ Wider links to decision theory.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Decision Theory
\slideitem{ Risk = frequency x cost.
\slideitem{Which risk do we guard against?
Decision_A = (option_1, option_2, ..., option_n)
Decision_B = (option_1, option_2, ..., option_m)
$Val(Decision) = \sum_{i=1}^{n} utility(option_i) \times freq(option_i)$
\slideitem{Are decision makers rational?
\slideitem{Can you trust the numbers?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Meta-Issues
\slideitem{ Decision theory counter intuitive?
\slideitem{But just a formalisation of FMECA?
\slideitem{What is the scope of this approach? - hardware failure rates (here)? - human error rates (here)? - software failure rates?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
Acknowledgement: J.D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman, New York, 1993.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ Failure rate assumed to be constant.
\slideitem{Electronic systems approximate this.
\slideitem{Mechanical systems: - bed-down failure rates; - degrade failure rates;
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Mean Time To Failure
\slideitem{ MTTF: reciprocal of constant failure rate. MTTF = 1 / lambda. lambda - base failure rate.
\slideitem{ 0.2 failures per hour: MTTF = 1/0.2 = 5 hours.
\slideitem{See Andrews and Moss for proof.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Or Put Another Way...
Probability that the product will work for time T without failure: R(T) = exp(-T/MTTF)
\slideitem{ If MTTF = 250,000 hours.
\slideitem{ Over life of 10 years (87,600 hours).
\slideitem{ R = exp(-87,600/250,000) = 0.70441
\slideitem{ 70.4\% prob of no failure in 10 years.
\slideitem{ 70.4\% of systems working in 10 years.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
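The figures on the last two slides can be checked with a few lines of Python; the sketch below reproduces MTTF = 1/lambda for the 0.2 failures-per-hour example and R(T) = exp(-T/MTTF) for the 250,000 hour MTTF over a ten-year life.
\begin{verbatim}
# Sketch reproducing the reliability figures on the previous slides:
# MTTF = 1/lambda for a constant failure rate, and R(T) = exp(-T/MTTF).

import math

lam = 0.2                      # failures per hour (slide example)
mttf = 1.0 / lam               # = 5 hours

mttf_product = 250_000.0       # hours (slide example)
life = 10 * 8_760.0            # 10 years = 87,600 hours
reliability = math.exp(-life / mttf_product)

print(f"MTTF for 0.2 failures/hour: {mttf} hours")
print(f"R(87,600 h) = {reliability:.5f}")
# ~0.70441, i.e. roughly 70.4% of systems survive 10 years without failure.
\end{verbatim}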
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ For each failure mode: Criticality_m = a x b x lambda_p x time
lambda_p - base failure rate with environmental/stress data
a - proportion of total failures in specified failure mode m
b - conditional prob. that expected failure effect will result
\slideitem{ If no failure data use:
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Sources of Data
\slideitem{ MIL-HDBK-217: Reliability Prediction of Electronic Equipment
\slideitem{Failure rate models for: - ICs, transistors, diodes, resistors, - relays, switches, connectors etc.
\slideitem{ Field data + simplifying assumptions.
\slideitem{Latest version F being revised.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ 217 too pessimistic for companies...
\slideitem{Bellcore (Telcordia): - reliability prediction procedure...
During 1997, AT\&T's Defects-Per-Million performance was 173, which means that of every one million calls placed on the network, only 173 were lost due to a network failure. That equals a network reliability rate of 99.98 percent for 1997.
\slideitem{ Business critical not safety critical.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{But MTTF doesn't consider repair!
\slideitem{MTTR considers observations.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and FMECA Mode Probability
\slideitem{FMECA: - we used subjective criticality; - however, MIL-338B calculates it; - no. of failures per hour per mode.
\slideitem{CR = alpha x beta x lambda:
CR - criticality level,
alpha - failure mode frequency ratio,
beta - loss prob. of item from mode,
lambda - base failure rate for item.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and FMECA Mode Probability
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{We focussed on hardware devices.
\slideitem{PRA for human reliability?
\slideitem{Probably not a good idea.
\slideitem{But for completeness...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Technique for Human Error Rate Prediction (THERP)
``The THERP approach uses conventional reliability technology modified to account for greater variability and independence of human performance as compared with that of equipment performance... The procedures of THERP are similar to those employed in conventional reliability analysis, except that human task activities are substituted for equipment outputs.'' (Miller and Swain, 1987 - cited by Hollnagel, 1998).
A.D. Swain and H.E. Guttman, Handbook of Human Reliability Analysis with Emphasis on Nuclear Power Plant Applications, NUREG/CR-1278, 1985.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Technique for Human Error Rate Prediction (THERP)
\slideitem{$Pe = He \times \sum_{k=1}^{n} (Psf_k \times W_k) + C$
\slideitem{ Where: Pe - probability of error; He - raw human error probability; C - numerical constant; Psf_k - performance shaping factor; W_k - weight associated with PSF_k; n - total number of PSFs.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
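A minimal sketch of the THERP adjustment on the previous slide follows. The nominal error probability, shaping factors and weights are invented for illustration; real values are taken from the THERP handbook tables.
\begin{verbatim}
# Minimal sketch of the THERP adjustment on the previous slide:
# Pe = He * sum_k (Psf_k * W_k) + C.
# All numbers below are invented for illustration only.

he = 0.003          # raw (nominal) human error probability
c = 0.0             # numerical constant
psfs = [
    # (performance shaping factor, weight)
    (2.0, 0.5),     # e.g. time stress
    (1.5, 0.3),     # e.g. poor interface layout
    (1.0, 0.2),     # e.g. adequate training
]

pe = he * sum(psf * w for psf, w in psfs) + c
print(f"Adjusted error probability Pe = {pe:.5f}")
\end{verbatim}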
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Technique for Human Error Rate Prediction (THERP)
\slideitem{``Psychologically vacuous'' (Hollnagel).
\slideitem{No model of cognition etc.
\slideitem{Calculate effect of PSF on HEP - ignores WHY they affect performance.
\slideitem{Succeeds or fails on PSFs.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
THERP - External PSFs
Acknowledgement: A.D. Swain, Comparative Evaluation of Methods for Human Reliability Analysis, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
THERP - Stressor PSFs
Acknowledgement: A.D. Swain, Comparative Evaluation of Methods for Human Reliability Analysis, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
THERP - Internal PSFs
Acknowledgement: A.D. Swain, Comparative Evaluation of Methods for Human Reliability Analysis, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM
E. Hollnagel, Cognitive Reliability and Error Analysis Method, Elsevier, Holland, 1998.
\slideitem{HRA + theoretical basis.
\slideitem{Simple model of control: - scrambled - unpredictable actions; - opportunistic - react, don't plan; - tactical - procedures and rules; - strategic - consider full context.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM - Simple Model of Control
Acknowledgement: E. Hollnagel, Cognitive Reliability and Error Analysis Method, Elsevier, Holland, 1998.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM - Simple Model of Control
Acknowledgement: E. Hollnagel, Cognitive Reliability and Error Analysis Method, Elsevier, Holland, 1998.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM
\slideitem{Much more to the technique...
\slideitem{But in the end:
Strategic = 0.000005 < p < 0.01
Tactical = 0.001 < p < 0.1
Opportunistic = 0.01 < p < 0.5
Scrambled = 0.1 < p < 1.0
\slideitem{Common performance conditions lead to - a probable control mode, then to - a reliability estimate from the literature.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
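As a small illustration of that final CREAM step, the sketch below maps a probable control mode to the probability interval quoted on the previous slide. Taking the geometric mid-point of the interval as a point estimate is a simplification for illustration only, not part of CREAM itself.
\begin{verbatim}
# Sketch of the last CREAM step: once the common performance conditions
# point to a probable control mode, a failure probability is taken from
# the corresponding interval (intervals as quoted on the slide).

import math

control_mode_intervals = {
    "strategic":     (0.000005, 0.01),
    "tactical":      (0.001, 0.1),
    "opportunistic": (0.01, 0.5),
    "scrambled":     (0.1, 1.0),
}

def point_estimate(mode):
    lo, hi = control_mode_intervals[mode]
    return math.sqrt(lo * hi)      # geometric mid-point (illustrative choice)

print(point_estimate("tactical"))  # ~0.01
\end{verbatim}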
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Conclusions
\slideitem{PRA for hardware: - widely accepted with good data;
\slideitem{PRA for human performance: - many are skeptical; - THERP -> CREAM ->
\slideitem{PRA for software?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and Fault Tree Analysis
\slideitem{Fault Trees (recap).
\slideitem{Software Fault Trees.
\slideitem{Software PRA.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Trees (Recap)
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Each tree considers 1 failure.
\slideitem{Carefully choose top event.
\slideitem{Carefully choose system boundaries.
\slideitem{Assign probabilities to basic events.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Assign probabilities to basic events.
\slideitem{Stop if you have the data.
\slideitem{Circles denote basic events.
\slideitem{Even so, tool support is critical.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Usually applied to hardware...
\slideitem{Can be used for software (later).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{House events; "switch" true or false.
\slideitem{OR gates - multiple fault paths.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Probabilistic inhibit gates.
\slideitem{Used with Monte Carlo techniques - True if random number < prob.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Usually applied to hardware...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
Acknowledgement: J.D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman Scientific and Technical, Harlow, 1993.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets
\slideitem{Each failure has several modes - `different routes to top event'.
\slideitem{Cut set: basic events that lead to the top event.
\slideitem{Minimal cut set: a cut set in which removing any basic event no longer leads to the top event.
\slideitem{Path set: basic events that avoid the top event; a list of components that ensure safety.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets
\slideitem{Top_Event = K_1 + K_2 + ... + K_n, where K_i are minimal cut sets and + is logical OR.
\slideitem{K_i = X_1 . X_2 . ... . X_n - MCS are conjuncts of basic events.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
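To show how minimal cut sets are used quantitatively, the sketch below assumes independent basic events, multiplies the probabilities within each cut set and sums across the cut sets (the rare-event approximation, an upper bound; the exact figure needs the inclusion-exclusion expansion mentioned on a later slide). The event names and probabilities are invented for illustration.
\begin{verbatim}
# Sketch of the quantitative use of minimal cut sets (rare-event
# approximation, assuming independent basic events).

from math import prod

basic_event_prob = {"A": 1e-3, "B": 5e-4, "C": 2e-3}   # invented values

minimal_cut_sets = [
    {"A", "B"},        # K1 = A.B
    {"C"},             # K2 = C
]

top_event = sum(prod(basic_event_prob[e] for e in k)
                for k in minimal_cut_sets)
print(f"P(top event) ~= {top_event:.2e}")
\end{verbatim}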
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets
\slideitem{Top-down approach: - replace event by expression below; - simplify if possible (C.C = C).
\slideitem{ Can use Karnaugh map techniques; - cf logic circuit design; - recruit tool support in practice.
\slideitem{Notice there is no negation.
\slideitem{Notice there is no XOR.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - MOCUS Cut Set Algorithm
1. Assign unique label to each gate.
2. Label each basic event.
3. Create a two dimensional array A.
4. Initialise A(1,1) to top event.
5. Scan array to find an OR/AND gate:
If the current position in A is an OR gate - replace it by its input events, each in a new row (copying the other entries of the original row).
If the current position in A is an AND gate - replace it by its input events in new columns of the same row.
6. Repeat 5 until no gates remain in the array.
7. Remove any non-minimal cut sets.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
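A minimal sketch of the MOCUS expansion follows, assuming a toy fault tree held as a dictionary of gates. OR gates split the current cut set into new rows, AND gates extend the same row, and non-minimal cut sets are removed at the end, mirroring steps 4-7 above. The gate and event names are invented for illustration.
\begin{verbatim}
# Minimal sketch of MOCUS.  Gates: name -> (type, inputs); inputs are
# gate names or basic events.  The small tree below is illustrative.

gates = {
    "TOP": ("OR",  ["G1", "C"]),
    "G1":  ("AND", ["A", "B"]),
}

def mocus(gates, top="TOP"):
    rows = [[top]]                          # step 4: start with the top event
    while True:
        expanded = False
        for r, row in enumerate(rows):
            for c, entry in enumerate(row):
                if entry in gates:          # step 5: a gate to expand
                    gtype, inputs = gates[entry]
                    if gtype == "AND":      # AND: inputs join the same row
                        rows[r] = row[:c] + inputs + row[c + 1:]
                    else:                   # OR: one new row per input
                        rows[r:r + 1] = [row[:c] + [i] + row[c + 1:]
                                         for i in inputs]
                    expanded = True
                    break
            if expanded:
                break
        if not expanded:                    # step 6: no gates left
            break
    # step 7: drop duplicates and non-minimal cut sets
    sets = list(dict.fromkeys(frozenset(r) for r in rows))
    return [s for s in sets if not any(o < s for o in sets)]

print(mocus(gates))   # two minimal cut sets: {A, B} and {C}
\end{verbatim}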
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - MOCUS
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Probabilistic Analysis
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Probabilistic Analysis
\slideitem{Beware: independence assumption. "If the same event occurs multiple times/places in a tree, any quantitative calculation must correctly reduce the boolean equation to account for these multiple occurrences. Independence merely means that the event is not caused due to the failure of another event or component, which then moves into the realm of conditional probabilities."
\slideitem{Inclusion-exclusion expansion (Andrews \& Moss).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideitem{As you'd expect.
\slideitem{Starts with top-level failure.
\slideitem{Trace events leading to failure.
\slideitem{But: don't use probabilistic assessments;
\slideitem{If you find a software fault path, REMOVE IT!
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
Leveson, N.G., Cha, S.S., Shimeall, T.J. ``Safety Verification of Ada Programs using Software Fault Trees,'' IEEE Software, July 1991.
\slideitem{Backwards reasoning.
\slideitem{Weakest pre-condition approach.
\slideitem{Similar to theorem proving.
\slideitem{Uses language dependent templates.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideitem{Exception template for Ada83.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
See: S.-Y. Min, Y-K. Jang, A-D Cha, Y-R Kwon and D.-H. Bae, Safety Verification of Ada95 Programs Using Software Fault Trees. In M. Felici, K. Kanoun and A. Pasquini (eds.), Computer Safety, Reliability and Security, Springer Verlag, LNCS 1698, 1999.
\slideitem{Exception template for Ada95.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA for Software
\slideitem{John Musa's work at Bell Labs.
\slideitem{Failure rate of software before tests.
\slideitem{Faults per unit of time (lambda_0): - function of faults over infinite time.
\slideitem{ Based on execution time: - not calendar time as in hardware; - so no overall system predictions.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Musa's PRA for Software
$\lambda_0 = k \times p \times w_0$
\begin{tabular}{|l|p{7cm}|p{5cm}|}
\hline
Symbol & Represents & Sample value \\
\hline
k & Constant that accounts for the dynamic structure of the program and the varying machines & k = 4.2E-7 \\
p & Estimate of the number of executions per time unit & p = r/(SLOC x ER) \\
r & Average instruction execution rate, determined from the manufacturer or benchmarking & Constant \\
SLOC & Source lines of code (not including reused code) & -- \\
ER & Expansion ratio, a constant dependent upon programming language & Assembler, 1.0; Macro Assembler, 1.5; C, 2.5; COBOL, FORTRAN, 3; Ada, 4.5 \\
w_0 & Estimate of the initial number of faults in the program & Can be calculated using w_0 = N x B, or a default of 6 faults/1000 SLOC can be assumed \\
N & Total number of inherent faults & Estimated based upon judgment or past experience \\
B & Fault to failure conversion rate; proportion of faults that become failures (i.e., the proportion of faults not corrected before the product is delivered) & Assume B = 0.95; i.e., 95\% of the faults undetected at delivery become failures after delivery \\
\hline
\end{tabular}
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
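A minimal sketch of the lambda_0 estimate follows, using the sample constants from the table above; the instruction rate, SLOC figure and default fault density are invented for illustration.
\begin{verbatim}
# Sketch of Musa's initial failure intensity estimate:
# lambda_0 = k * p * w_0, with p = r / (SLOC * ER) and w_0 from a
# default fault density.  The r and SLOC figures are assumptions.

k = 4.2e-7                 # structure constant (sample value from the table)
r = 80e6                   # average instructions executed per time unit (assumed)
sloc = 20_000              # delivered source lines of code (assumed)
er = 4.5                   # expansion ratio for Ada (from the table)

p = r / (sloc * er)        # executions per time unit
w0 = 6e-3 * sloc           # default of 6 faults per 1000 SLOC

lambda_0 = k * p * w0
print(f"lambda_0 ~= {lambda_0:.4f} failures per unit time")
\end{verbatim}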
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA for Software
\slideitem{Considerable debate about this.
\slideitem{Many variants on the theme.
\slideitem{Metrics are crude...
\slideitem{In the meantime, be skeptical.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FTA - Conclusions
\slideitem{Fault Trees: - cut sets, cut paths; - quantitative analysis.
\slideitem{Software Fault Trees: - language dependent templates; - if you see faults, remove them!
\slideitem{Software PRA.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Safety-Critical Software
\slideitem{Why is software different?
\slideitem{ Software requirements: - Leveson's completeness criteria.
\slideitem{ Software design (summary): MIL-338B preliminary design; MIL-338B detailed design.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Why is Software Different?
\slideitem{ Software is an abstract concept in that it is a set of instructions on a piece of paper or in computer memory. It can be torn apart and analyzed in piece parts like hardware, yet unlike hardware it is not a physical entity with physical characteristics which must comply with the laws of nature (i.e., physics and chemistry).
\slideitem{ Since software is not a physical entity it does not wear out or degrade over time. This means that software does not have any failure modes per se. Once developed, it always works the same way without variation.
\slideitem{ Unlike hardware, once a software program is developed it can be duplicated or manufactured into many copies without any manufacturing variations.
\slideitem{ Software is much easier to change than is hardware. For this reason many system fixes are made by modifying the software rather than the hardware.
\slideitem{ There are no standard parts in software as there are with hardware. Therefore there are no high reliability software modules, and no industry alerts on poor quality software items.
\slideitem{ If software has anything which even resembles a failure mode, it is in the area of hardware induced failures.
\slideitem{ Hardware reliability prediction is based upon random failures, whereas software reliability prediction is based upon the theory that predestined errors exist in the software program.
\slideitem{ Hardware reliability modeling is well established; however, there is no uniform, accurate or practical approach to predicting and measuring software reliability.
\slideitem{ Since software does not have any failure modes, a software problem is referred to as a software error. A software error is defined as a situation when the software does not perform to specifications or as reasonably expected, that is, when it performs unintended functions. This definition is fairly consistent with that of a hardware failure, except that the mechanisms or causes of failure are very different.
\slideitem{ Hardware primarily fails due to physical or chemical mechanisms and seldom fails due to human failure mechanisms (e.g., documentation errors, coding errors, specification oversights), whereas just the opposite is true with software.
\slideitem{ Software has many more failure paths than hardware, making it difficult to test all paths.
\slideitem{ By itself software can do nothing and is not hazardous. Software must be combined with hardware in order to do anything.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Defects (Initial Views)
A software defect is either a fault or discrepancy between code and documentation that compromises testing or produces adverse effects in installation, modification, maintenance, or testing.
\slideitem{ Requirements Defects: Failure of software requirements to specify the environment in which the software will be used, or requirements documentation that does not reflect the design of the system in which the software will be employed.
\slideitem{ Design Defects: Failure of designs to satisfy requirements, or failure of design documentation to correctly describe the design.
\slideitem{ Code Defects: Failure of code to conform to software designs.
Robert Dunn, Software Defect Removal, McGraw-Hill, 1984.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Defects (Initial Views)
A software defect is a software fault that causes a deviation from the required output by more than a specified tolerance. Moreover, the software need produce correct outputs only for inputs within the limits that have been specified. It needs to produce correct outputs only within a specified exposure period. Since these definitions differ, a count of the number of defects will yield different results, and, hence, a different defect rate, depending on the counter's definition.
\slideitem{ Requirements Defects
\slideitem{ Design Defects
\slideitem{ Algorithmic Defects
\slideitem{ Interface Defects
\slideitem{ Performance Defects
\slideitem{ Documentation Defects
Lawrence Putnam and Ware Myers, Measures for Excellence: Reliable Software on Time, Within Budget, Prentice-Hall, 1992.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Hazard Analysis
\slideitem{Already seen software fault trees.
\slideitem{ Trace identified software hazards to the software-hardware interface. Translate the identified software related hazards into requirements and constraints on software behaviour.
\slideitem{ Show the consistency of the software safety constraints with the software requirements specification. Demonstrate the completeness of the software requirements, including the human-computer interface requirements, with respect to system safety properties.
Acknowledgement: Nancy Leveson, Safeware: System Safety and Computers, Addison Wesley, Reading, Massachusetts, 1995.
\slideitem{Point 2 links to safety case slides?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Requirements Analysis
\slideitem{Leveson identifies 3 components.
\slideitem{Basic function or objective.
\slideitem{Constraints on operating conditions.
\slideitem{Prioritised quality goals; - to help make tradeoff decisions.
\slideitem{Same as general hazard analysis?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Kernel Requirements and Intent Specifications
\slideitem{ Kernel or core set of requirements.
\slideitem{Determined by current knowledge of: - intended application functionality; - environment \& constraints.
\slideitem{Analytically independent.
\slideitem{Only know they are complete if - we know specification intent...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Remember - `Black Box' architecture.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Human Computer Interface Criteria.
\slideitem{State Completeness.
\slideitem{Input/Output Variable Completeness.
\slideitem{Trigger Event Completeness.
\slideitem{Output Specification Completeness.
\slideitem{Output to Trigger Relationships.
\slideitem{State Transitions.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Human Computer Interface Criteria.
\slideitem{Criteria depend on task context.
\slideitem{E.g. in a monitoring situation: - what must be observed/displayed? - how often is it sampled/updated? - what is message priority?
\slideitem{Not just when to present but also - when to remove information...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{State Completeness Criteria.
\slideitem{Consider input effect when state is: - normal, abnormal, indeterminate.
\slideitem{Start-up, close-down are concerns.
\slideitem{Process will change even during - intervals in which software is `idle'.
\slideitem{Checkpoints, timeouts etc.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Input/Output Variable Completeness.
\slideitem{Input from sensors to software.
\slideitem{Output from software to actuators.
\slideitem{Specification may be incomplete if: - a sensor isn't referred to in the spec; - a legal value isn't used in the spec.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Trigger Event Completeness.
Robustness: every state has a transition defined for every possible input.
Non-determinism: only 1 transition is possible from a state for each input.
Value and Timing assumptions: - what triggers can be produced from the environment? - what ranges must trigger variables fall within? - what are the real-time requirements... - specify bounds for responses to input (timeouts).
\slideitem{And much, much more....
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
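As a small illustration of the robustness and determinism checks above, the sketch below scans a toy transition table for state/input pairs with no transition defined and for pairs with more than one. The states, inputs and transitions are invented for illustration.
\begin{verbatim}
# Sketch of checking two trigger-event criteria: robustness (every state
# handles every input) and determinism (at most one transition per pair).
# All names below are illustrative.

states = {"idle", "filling", "alarm"}
inputs = {"start", "stop", "overflow"}

# (state, input) -> list of next states
transitions = {
    ("idle", "start"):       ["filling"],
    ("idle", "stop"):        ["idle"],
    ("idle", "overflow"):    ["alarm"],
    ("filling", "stop"):     ["idle"],
    ("filling", "overflow"): ["alarm"],
    ("alarm", "stop"):       ["idle"],
}

missing = [(s, i) for s in states for i in inputs if (s, i) not in transitions]
ambiguous = [k for k, nxt in transitions.items() if len(nxt) > 1]

print("Robustness violations (no transition defined):", missing)
print("Determinism violations (more than one transition):", ambiguous)
\end{verbatim}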
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Output Specification Completeness. - from software to process actuators.
\slideitem{Check for hazardous values.
\slideitem{Check for hazardous timings; - how fast do actuators take events? - what if this rate is exceeded?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Output to Trigger Relationships.
\slideitem{Links between input \& output events.
\slideitem{For any output to actuators: - can effect on process be detected? - if output fails can this be seen?
\slideitem{ What if response is: - missing, too early or too late?
\slideitem{If response received without trigger - then erroneous state.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{State Transitions.
Reachability: all specified states can be reached from the initial state.
Recurrent behaviour: desired recurrent behaviour must execute for at least one cycle and be bounded by an exit condition.
Reversibility: output commands should wherever possible be reversible, and those which are not must be carefully controlled.
Preemption: all possible preemption events must be considered for any non-atomic transactions.
\slideitem{Again more complexity here...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Reality Check...
\slideitem{Completeness criteria change.
\slideitem{Environment and functions change.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
From Requirements to Design
Once the requirements have been detailed and accepted, design proceeds by allocating and arranging the functions of the system so that the aggregate meets all customer needs. Since several different designs may meet the requirements, alternatives must be assessed based on technical risks, costs, schedule, and other considerations. A design developed before there is a clear and concise analysis of the system's objectives can result in a product that does not satisfy the requirements of its customers and users. In addition, an inferior design can make it very difficult for those who must later code, test, or maintain the software. During the course of a software development effort, analysts may offer and explore many possible design alternatives before choosing the best design.
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Preliminary Design
Preliminary or high-level design is the phase of a software project in which the major software system alternatives, functions, and requirements are analyzed. From the alternatives, the software system architecture is chosen and all primary functions of the system are allocated to the computer hardware, to the software, or to the portions of the system that will continue to be accomplished manually.
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Preliminary Design
\slideitem{Develop the architecture:
\slideitem{ system architecture - an overall view of system components
\slideitem{ hardware architecture - the system's hardware components and their interrelations
\slideitem{ software architecture - the system's software components and their interrelations
\slideitem{ Investigate and analyze the physical alternatives for the system and choose solutions
\slideitem{ Define the external characteristics of the system
\slideitem{ Refine the internal structure of the system by decomposing the high-level software architecture
\slideitem{ Develop a logical view or model of the system's data
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Detailed Design
Detailed design or low-level design determines the specific steps required for each component or process of a software system. Responsibility for detailed design may belong to either the system designers (as a continuation of preliminary design activities) or to the system programmers.
Information needed to begin detailed design includes: the software system requirements, the system models, the data models, and previously determined functional decompositions. The specific design details developed during the detailed design period fall into three categories: for the system as a whole (system specifics), for individual processes within the system (process specifics), and for the data within the system (data specifics).
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Detailed Design (Example concerns)
System specifics:
\slideitem{ Physical file system structure
\slideitem{ Interconnection records or protocols between software and hardware components
\slideitem{ Packaging of units as functions, modules or subroutines
\slideitem{ Interconnections among software functions and processes
\slideitem{ Control processing
\slideitem{ Memory addressing and allocation
\slideitem{ Structure of compilation units and load modules
Process specifics:
\slideitem{ Required algorithmic details
\slideitem{ Procedural process logic
\slideitem{ Function and subroutine calls
\slideitem{ Error and exception handling logic
Data specifics:
\slideitem{ Global data handling and access
\slideitem{ Physical database structure
\slideitem{ Internal record layouts
\slideitem{ Data translation tables
\slideitem{ Data edit rules
\slideitem{ Data storage needs
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Don't Forget the Impact of Standards
UK Defense software standard
Sean Matthews, Fri, 30 Jun 89 13:49:12 BST
I have just seen a copy of the UK department of defence draft standard for safety critical software (00-55). Here are a few high (and low) points.
1. There should be no dynamic memory allocation (This rules out explicit recursion - though a bounded stack is allowed).
2. There should be no interrupts except for a regular clock interrupt.
3. There should not be any distributed processing (i.e. only a single processor).
4. There should not be any multiprocessing.
5. NO ASSEMBLER.
6. All code should be at least rigorously checked using mathematical methods.
7. Any formally verified code should have the proof submitted as well, in machine readable form, so that an independent check can be performed.
8. All code will be formally specified.
9. There are very strict requirements for static analysis (no unreachable code, no unused variables, no uninitialised variables etc.).
10. No optimising compilers will be used.
11. A language with a formally defined syntax and a well defined semantics, or a suitable subset thereof will be used.
Comments.
1. means that all storage can be statically allocated. In fact somewhere it says that this should be the case.
2-4 seem to leave no option but polling. This is impractical, especially in embedded systems. No one is going to build a fly by wire system with those sorts of restrictions. (maybe people should therefore not build fly by wire systems, but that is another matter that has been discussed at length here already). it also ignores the fact that there are proof methods for dealing with distributed systems.
5. This is interesting, I seem to remember reading somewhere that Nasa used to have the opposite rule: no high level languages, since they actually read the delivered binary to check that the software did what it was supposed to do.
6, 7. `Mathematical methods' is *invoked* in a general way without going into very much detail about what is involved. I am not sure that the people who wrote the report were sure (Could someone from Praxis - which I believe consulted on drawing it up - enlarge on this?).
8. this is an excellent thing, though it does not say what sort of language should be used. Is a description in terms of a Turing machine suitable? After all that is a well understood formal system.
10. Interestingly, there is no requirement that the compiler be formally verified, just that it should conform to international standards (though strictly), and not have any gross hacks (i.e. optimisation) installed. There is also no demand that the target processor hardware be verified (though such a device exists here already: the Royal Signals Research Establishment's Viper processor).
11. seems to be a dig at Ada and the no subsets rule. It also rules out C.
Conclusions.
I find the idea of the wholesale mayhem and killing merchants being forced to try so much harder to ensure that their products maim and kill only the people they are supposed to maim and kill, rather amusing. The standard seems to be naive in its expectations of what can be achieved at the moment with formal methods (That is apparently the general opinion around here, and there is a *lot* of active research in program verification in Edinburgh), and impossibly restrictive. An interesting move in the right direction but too fast and too soon. And they might blow the idea of formal verification by trying to force it too soon. And I would very much like to see these ideas trickle down into the civil sector.
I might follow this up with a larger (and more coherent) description if there is interest (this was typed from memory after seeing it yesterday); there is quite a bit more in it.
Sean Matthews, Dept. of Artificial Intelligence, University of Edinburgh, 80 South Bridge, Edinburgh, EH1 1HN, Scotland.
JANET: sean@uk.ac.ed.aipna ARPA: sean\%uk.ac.ed.aipna@nsfnet-relay.ac.uk UUCP: ...!mcvax!ukc!aipna!sean
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Conclusion
\slideitem{Why is software different?
\slideitem{ Software requirements: - Leveson's completeness criteria.
\slideitem{ Software design (summary): MIL-338B preliminary design; MIL-338B detailed design.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Safety-Critical Software Development
\slideitem{Software design by: - hazard elimination; - hazard reduction; - hazard control.
\slideitem{Software implementation issues: - dangerous practices; - choice of `safe' languages.
\slideitem{The DO-178B Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Taxonomy of Design Techniques
\slideitem{Hazard elimination/avoidance
\slideitem{Hazard reduction (see 4?)
\slideitem{Hazard control
\slideitem{Hazard minimization (see 2?)
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design and Hazard Elimination
\slideitem{Substitution - hardware interlocks before software.
\slideitem{Simplification - new software features add complexity.
\slideitem{Decoupling - computers add a common failure point.
\slideitem{Human Error `Removal' - readability of instruments etc.
\slideitem{Removal of hazardous materials - eliminate UNUSED code (Ariane 5).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Elimination: Datalink Example
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
\slideitem{Project compared: - faults found in statistical tests; - faults found in 178B development.
\slideitem{Main conclusions: - such comparisons very difficult; - DO-178B hard to implement; - lack of materials/examples.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' View
The difficulties that have been identified are the DO-178 requirements for evidence and rigorous verification... Systematic records of accomplishing each of the objectives and guidance are necessary. A documentation trail must exist demonstrating that the development processes not only were carried out, but also were corrected and updated as necessary during the program life cycle. Each document, review, analysis, and test must have evidence of critique for accuracy and completeness, with criteria that establishes consistency and expected results. This is usually accomplished by a checklist which is archived as part of the program certification records. The degree of this evidence varies only by the safety criticality of the system and its software.
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' View
...Engineering has not been schooled or trained to meticulously keep proof of the processes, product, and verification real-time. The engineers have focused on the development of the product, not the delivery. In addition, program durations can be from 10 to 15 years resulting in the software engineers moving on by the time of system delivery. This means that most management and engineers have never been on a project from "cradle-to-grave."
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' Views
The weakness of commercial practice with DO-178B is the lack of consistent, comprehensive training of the FAA engineers/DERs/foreign agencies affecting:
\slideitem{ the effectiveness of the individual(s) making findings; and,
\slideitem{ the consistency of the interpretations in the findings.
Training programs may be the answer for both the military and commercial environments to avoid the problem of inconsistent interpretation and the results of literal interpretation.
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Safety-Critical Software Development - Conclusions
\slideitem{Software design by: - hazard elimination; - hazard reduction; - hazard control.
\slideitem{Software implementation issues: - dangerous practices; - choice of `safe' languages.
\slideitem{The DO-178B Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design and Hazard Reduction
\slideitem{Design for control: - incremental control; - intermediate states; - decision aids; - monitoring.
\slideitem{Add barriers: - hard/software locks;
\slideitem{Minimise single point failures: - increase safety margins; - exploit redundancy; - allow for recovery.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Reduction: Interlock Example
This heavy duty solenoid controlled tongue switch controls access to hazardous machines with rundown times. Olympus withstands the arduous environments associated with the frequent operation of heavy duty access guards. The unit also self adjusts to tolerate a high degree of guard misalignment. The stainless steel tongue actuator is self-locking and can only be released after the solenoid receives a signal from the machine control circuit. This ensures that the machine has completed its cycle and come to rest before the tongue can be disengaged and machine access obtained.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design and Hazard Control
\slideitem{Limit exposure - back to `normal' fast (exceptions).
\slideitem{Isolate and contain - don't let things get worse...
\slideitem{Fail-safe - panic shut-downs, watchdog code.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Control: Watchdog Example
\slideitem{Hardware or software (beware).
\slideitem{Check for processor activity: - 1. load value into a timer; - 2. decrement timer every interval; - 3. if value is zero then reboot.
\slideitem{Processor performs 1 at a frequency - great enough to stop 3 being true; - unless it has crashed.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
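A minimal software simulation of the watchdog scheme on the previous slide follows; a real watchdog would normally be a hardware timer, and the reload value and tick counts are invented for illustration.
\begin{verbatim}
# Sketch of the watchdog scheme: the main loop reloads ("kicks") a counter,
# a periodic tick decrements it, and a reboot is forced if it reaches zero.

WATCHDOG_RELOAD = 5          # ticks allowed between kicks (illustrative)
watchdog_counter = WATCHDOG_RELOAD

def kick():                  # step 1: processor reloads the timer
    global watchdog_counter
    watchdog_counter = WATCHDOG_RELOAD

def tick():                  # steps 2-3: decrement; reboot at zero
    global watchdog_counter
    watchdog_counter -= 1
    if watchdog_counter <= 0:
        print("watchdog expired: rebooting")
        watchdog_counter = WATCHDOG_RELOAD   # simulate the restart

# Healthy phase: the main loop kicks the watchdog often enough.
for t in range(9):
    if t % 3 == 0:
        kick()
    tick()

# Crashed phase: kick() is never called, so tick() eventually "reboots".
for t in range(6):
    tick()
\end{verbatim}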
\slideitem{ What about timing differences? - comparison of "continuous" values? - what if requirements wrong? - costs make N>2 very uncommon; - performance costs of voting. \slideitem{A340 primary flight controls. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Design Techniques: Fault Tolerance \slideitem{Exception handling mechanisms. \slideitem{Use run-time system to detect faults: - raise an exception; - pass control to appropriate handler; - could be on another processor. \slideitem{Propagate to outermost scope then fail. \slideitem{Ada... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Design Techniques: Fault Tolerance \slideitem{Recovery blocks: - write acceptance tests for modules; - if a test fails then execute an alternative (see the sketch below). \slideitem{Must be able to restore the state: - take a snapshot/checkpoint; - if failure restore snapshot. \slideitem{But: - what if the failed module has side-effects? - e.g. effects on equipment under control? - recovery block will be complicated. \slideitem{Different from exceptions: - don't rely on run-time system. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Design Techniques: Fault Tolerance \slideitem{Control redundancy includes: - N-version programming; - recovery blocks; - exception handling. \slideitem{But data redundancy uses extra data - to check the validity of results. \slideitem{Error correcting/detecting codes. \slideitem{Checksum agreements etc. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
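\slidestart{Terminology} Software Design Techniques: Fault Tolerance A minimal sketch of the recovery-block pattern from the previous slides, assuming a purely hypothetical filtering module: checkpoint the state, run the preferred version, apply an acceptance test, and fall back to a simpler alternative (here, holding the last accepted value) if the test fails.

/* Recovery block sketch, not production code: the module, its
   alternative and the acceptance test are all hypothetical. */
#include <stdio.h>

struct state { double estimate; };                 /* state to checkpoint */

static void primary(struct state *s, double raw)   /* preferred version */
{ s->estimate = 0.9 * s->estimate + 0.1 * raw; }

static void alternative(struct state *s)           /* degraded fallback:       */
{ (void)s; }                                       /* hold last accepted value */

static int acceptable(const struct state *s)       /* acceptance test */
{ return s->estimate >= 0.0 && s->estimate <= 100.0; }

static void update(struct state *s, double raw)
{
    struct state checkpoint = *s;   /* snapshot before running the primary */
    primary(s, raw);
    if (!acceptable(s)) {           /* acceptance test failed...           */
        *s = checkpoint;            /* ...restore the checkpoint           */
        alternative(s);             /* ...and run the simpler alternative  */
    }
}

int main(void)
{
    struct state s = { 20.0 };
    update(&s, 25.0);               /* accepted                            */
    update(&s, 9999.0);             /* rejected: state restored and held   */
    printf("estimate = %.2f\n", s.estimate);
    return 0;
}

A fuller version would also run the acceptance test after the alternative and signal failure upwards if that fails too. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%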
\slidestart{Terminology} Software Implementation Issues \slideitem{Restrict language subsets. \slideitem{Alsys CSMART Ada kernel etc. \slideitem{Or just avoid high level languages? \slideitem{No task scheduler - bare machine. \slideitem{Less scheduling/protection risks - more maintenance risks; - less isolation (no modularity?). \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Implementation Issues \slideitem{Memory jumps: - control jumps to arbitrary location? \slideitem{Overwrites: - arbitrary address written to? \slideitem{Semantics: - established on target processor? \slideitem{Precision: - integer, floating point, operations... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Implementation Issues \slideitem{Data typing issues: - strong typing prevents misuse? \slideitem{Exception handling: - runtime recovery supported? \slideitem{Memory monitoring: - guard against memory depletion? \slideitem{Separate compilation: - type checking across modules etc? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Implementation Issues: Language Wars Acknowledgement: W.J. Cullyer, S.J. Goodenough, B.A. Wichmann, The choice of a Computer Language for Use in Safety-Critical Systems, Software Engineering Journal, 6(2):51-58, 1991. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Implementation Issues: Language Wars \slideitem{CORAL subset:
staff training issues? \slideitem{SPADE Pascal: Praxis version of ISO Pascal. \slideitem{Modula 2 subset: SACEM trains in Paris. \slideitem{Ada subset: attempts at formal verification. \slideitem{Meta question: programmer more important than language? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Implementation Issues: Language Wars \slideitem{Lots of new proposals: - some more pragmatic than others... \slideitem{Continuing attempts at standards: \slideitem{An ENGINEERING approach: \slideitem{Meta question 2: language depends on application! \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Implementation Issues: Language Wars Safety-Critical Systems Computer Language Survey - Results Newsgroups: comp.lang.ada,comp.lang.c++,comp.lang.misc,comp.software-eng From: cpp@netcom.com (Robin Rowe) Subject: Safety-Critical Survey (Results) Message-ID: Organization: NETCOM On-line Communication Services (408 261-4700 guest) Date: Sun, 13 Nov 1994 22:34:10 GMT Lines: 180 ======================================================== Here are the results of my recent informal survey of computer languages used in safety-critical embedded systems and other interesting systems. In responses, Ada was by far the most popular language for these systems followed by assembler. There is a list describing 722 Ada projects that is available via ftp from the Ada Information Clearinghouse. The current version is 213K in size (contact adainfo@ajpo.sei.cmu.edu). I did not attempt to integrate that data into this report. No assertion is intended here that any language is necessarily superior to any other. Aerospace: --------- Allied Signal: ? Boeing: Mostly Ada with assembler. Also: Fortran, Jovial, C, C++. Onboard fire extinguishers in PLM. 777 seatback entertainment system in C++ with MFC (in development by Microsoft). 757/767: approximately 144 languages used. 747-400: approximately 75 languages used. 777: approximately 35 languages used. Boeing Defense & Space Group: (777 cabin mgmt. system in Ada?) DAINA/Air Force: Aircraft mission manager in Ada. Chandler Evans: Engine Control System in Ada (386 DOS). Draper Labs/Army/NASA: Fault tolerant architecture in Ada/VHDL. DuPont: ? European Space Agency: mandates Ada for mission critical systems. ISO (Infrared Space Observatory) SOHO (Solar and Heliospheric Observatory) Huygens/Cassini (a joint ESA/NASA mission to Saturn) Companies involved: British Aerospace (Space Systems) - Bristol, UK Fokker Space Systems - Amsterdam, Holland Matra-Marconi Espace - Toulouse, France Saab - Sweden Logica - UK DASA - Germany MBB - Germany Ford Aerospace: Spacecraft in Ada with assembler. GEOS and INSAT spacecraft in FORTRAN. (Ford Aerospace is now Space Systems/Loral.) Hamilton-Standard: (777 air cowling icing protection system in Ada?). Honeywell: Aircraft navigation data loader in C. (777 airplane information mgmt. system in Ada?) Intermetrics/Houston: space shuttle cockpit real-time executive in Ada '83 with 80386 assembly Lockheed Fort Worth: F-22 Advanced Tactical Fighter program in Ada 83 (planning to move to Ada 94) with a very small amount in MIL-STD-1750A assembly. Maintain older safety-critical systems for the F-111 and F-16/F-16 variant airframes primarily done in JOVIAL. NASA: Space station in Ada. 
(Sources differed on whether it was Ada only, or Ada with some C and assembler.) NASA Lewis: March 1994 space shuttle experiment in C++ on 386. Rockwell Space Systems Div.: Space shuttle in Hal/s and Ada. Defense Initiative in Ada. Other systems in Ada and C. Space Systems/Loral: Spacecraft in Ada with assembler. Teledyne: Aircraft flight data recorder in C. TRW/Air Force: Realtime avionics OS in Ada. Wilcox Electric: Navigation aids in C prior to 1990, Ada after. VOR-DME in Ada. Microwave landing system in Ada. Wide Area GPS in C and C++. Air Traffic Control: ------------------- Hughes: Canadian ATC system in Ada. Loral FSD: U.S. ATC system in Ada. Thomson-CSF SDC: French ATC system in Ada. Land Vehicles: ------------- Bosch: Diesel engine controls in C. (Other systems generally in C?) Delco: Engine controls and ABS in 68C series (Motorola) assembler. C++ used for data acquisition in GM research center. '93+ GM trucks vehicle controllers mostly in Modula-GM (Modula-GM is a variant of Modula-2. A typical 32-bit integrated vehicle controller may control the engine, the transmission, the ABS system, the Heating/AC system, as well as the associated integrated diagnostics and off-board communications systems.) Ford: Assembler. General Dynamic Land Systems: M1A2 tank tank software in Ada with time-critical routines in 68xxx assembler. Tank software simulators in C. Honda: ? Lucas: Many systems in Lucol (Lucas control language). Diesel engine controls in C++. ABS in 68xxx assembler. SAE: ? (Despite considerable effort on my part, I was unable to gather any information on languages or language standards from the Society of Automotive Engineers.) Ships: ----- Vosper Thornycroft Ltd (UK): navigation control in Ada. Trains: ------ AMTRAK: ? BART: ? (One rumor said Ada migrating to C. Can anyone confirm?) CSEE Transports (France): TGV Braking system in Ada (68K). Denver Airport baggage system: This well publicized problem system is written in C++. (A source familiar with the system said the problems were political and managerial, not directly related to C++.) European Rail: Switching system in Ada. EuroTunnel: in Ada. Extension to the London Underground: in Ada. GEC Alsthom (France): Railway and signal control systems for trains and the TGV (north lines and Chunnel) in Ada. Subway network control systems (Paris, Calcutta, and Cairo). TGV France: Switching system in Ada. Union Switch & Signal, Pittsburgh: (Switching system in ?) Westinghouse Signals Ltd (UK): Railway signalling systems in Ada. Westinghouse Brake & Signal UK: Automatic Train Protection (ATP) systems for Westrace project in PASCAL. Westinghouse Australia: ATP systems in PASCAL and ADA. Medical: ------- Baxter: Left Ventricular Heart Assist in C with 6811 assembler. Coulter Corp.: ONYX hematology analyzer in Ada. Nuclear Reactors: ---------------- Core and shutdown systems in assembler, migrating to Ada. SURVEY METHODOLOGY ================== I operated under the theory that, with regard to what languages are really in use, the recollections of the engineers themselves are probably the most accurate and open source. In general, I did not have enough sources that I could cross check the information. In cases where I could, the most interesting discrepancy was that companies that thought they had adopted one language as the total solution for all their software designs often had something in assembler or some other language somewhere. Every response to the survey was positive except one. 
An individual at Rockwell Collins said: "The language(s) we do/don't use is a matter best left to us, our customers, and the appropriate regulatory agencies governing our businesses and markets. All of these parties also look out for the public's interests in safety, cost, etc. as well." This individual took me to task for not contacting the PR department of his company, but was unwilling to help me do so. Per his request, I have omitted his company. If you wish to add information or make a correction please send mail to cpp@netcom.com. I'd like to fill in the companies that have question marks by them. I'm particularly interested in systems written in C++. Names of respondents are held confidential. If you respond with a public follow-up on the net, please cc via e-mail to me so that I don't miss you. Thanks to everyone who helped with this. I meant to post this in August, but got busy with work and relocating to Monterey and forgot. Sorry for the delay. Robin embedded.svy rev 11-13-94 -- ----- Robin Rowe cpp@netcom.com 408-375-9449 Monterey, CA Rowe Technology C++ training, consulting, and users groups. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B \slideitem{Software Considerations in Airborne Systems and Equipment Certification. \slideitem{Widely used, cf. IEC-61508. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Life Cycle \slideitem{ Planning Process: - coordinates development activities. \slideitem{Software Development Processes: - requirements process - design process - coding process - integration process \slideitem{ Software Integral Processes: - verification process - configuration management - quality assurance - certification liaison \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Requirements for Design Descriptions (a) A detailed description of how the software satisfies the specified software high-level requirements, including algorithms, data-structures and how software requirements are allocated to processors and tasks. (b) The description of the software architecture defining the software structure to implement the requirements. (c) ??????????? (d) The data flow and control flow of the design. (e) Resource limitations, the strategy for managing each resource and its limitations, the margins and the method for measuring those margins, for example timing and memory. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Requirements for Design Descriptions (f) Scheduling procedures and interprocessor/intertask communication mechanisms, including time-rigid sequencing, pre-emptive scheduling, Ada rendez-vous and interrupts. 
(g) Design methods and details for their implementation, for example, software data loading, user modifiable software, or multiple-version dissimilar software. (h) Partitioning methods and means of preventing partitioning breaches. (i) Descriptions of the software components, whether they are new or previously developed, with reference to the baseline from which they were taken. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Requirements for Design Descriptions (j) Derived requirements from the software design process. (k) If the system contains deactivated code, a description of the means to ensure that the code cannot be enabled in the target computer. (l) Rationale for those design decisions that are traceable to safety-related system requirements. \slideitem{ Deactivated code (k) (see Ariane 5). \slideitem{ Traceability issues interesting (l). \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Key Issues \slideitem{ Traceability and lifecycle focus. \slideitem{ Designated engineering reps. \slideitem{ Recommended practices. \slideitem{ Design verification: - formal methods "alternative" only; - "inadequate maturity"; - limited applicability in aviation. \slideitem{Design validation: - use of independent assessors etc. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} DO-178B - NASA GCS Case Study \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} DO-178B - NASA GCS Case Study NASA Langley Research Centre. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} DO-178B - NASA GCS Case Study NASA Langley Research Centre. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} DO-178B - NASA GCS Case Study \slideitem{Project compared: - faults found in statistical tests; - faults found in 178B development. \slideitem{Main conclusions: - such comparisons very difficult; - DO-178B hard to implement; - lack of materials/examples. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Practitioners' View The difficulties that have been identified are the DO-178 requirements for evidence and rigorous verification... Systematic records of accomplishing each of the objectives and guidance are necessary. A documentation trail must exist demonstrating that the development processes not only were carried out, but also were corrected and updated as necessary during the program life cycle. Each document, review, analysis, and test must have evidence of critique for accuracy and completeness, with criteria that establishes consistency and expected results. This is usually accomplished by a checklist which is archived as part of the program certification records. 
The degree of this evidence varies only by the safety criticality of the system and its software. Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Software Development: DO-178B Practitioners' View ...Engineering has not been schooled or trained to meticulously keep proof of the processes, product, and verification real-time. The engineers have focused on the development of the product, not the delivery. In addition, program durations can be from 10 to 15 years resulting in the software engineers moving on by the time of system delivery. This means that most management and engineers have never been on a project from "cradle-to-grave." Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Hardware Design: Fault Tolerant Architectures \slideitem{The basics of hardware management. \slideitem{Fault models. \slideitem{Hardware redundancy. \slideitem{Space Shuttle GPC Case Study. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Parts Management Plan \slideitem{MIL-HDBK-965 - help on hardware acquisition. \slideitem{ General dependability requirements. \slideitem{ Not just about safety. \slideitem{But often not considered enough... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} The Basics: Hardware Management \slideitem{MIL-HDBK-965 Acquisition Practices for Parts Management \slideitem{ Preferred Parts List \slideitem{ Vendor and Device Selection \slideitem{ Critical Devices, Technologies & Vendors \slideitem{ Device Specifications \slideitem{ Screening \slideitem{ Part Obsolescence \slideitem{ Failure Reporting, Analysis and Corrective Action (FRACAS) \slideend{\it{\copyright C.W. 
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} The Basics: Hardware Management Some consequences of designing equipment without a PPL are: \slideitem{ Proliferation of non-preferred parts and materials with identical functions \slideitem{ Increased need for development and preparation of engineering justification for new parts and materials \slideitem{ Increased need for monitoring suppliers and inspecting/screening parts and materials \slideitem{ Selection of obsolete (or potentially obsolete) and sole-sourced parts and materials \slideitem{ Possibility of diminishing sources \slideitem{ Use of unproven or exotic technology ("beyond" state-of-the-art) \slideitem{ Incompatibility with the manufacturing process \slideitem{ Inventory volume expansion and cost increases \slideitem{ Increasing supplier base and audit requirements \slideitem{ Loss of "ship-to-stock" or "just-in-time" purchase opportunities \slideitem{ Limited ability to benefit from volume buys \slideitem{ Increased cost and schedule delays \slideitem{ Nonavailability of reliability data \slideitem{ Additional tooling and assembly methods may be required to account for the added variation in part characteristics \slideitem{ Decreased part reliability due to the uncertainty and lack of experience with new parts \slideitem{ Impeded automation efforts due to the added variability of part types \slideitem{ Difficulty in monitoring vendor quality due to the added number of suppliers \slideitem{ More difficult and expensive logistics support due to the increased number of part types that must be spared. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} The Basics: Hardware Management Must consider during hardware acquisition: \slideitem{ Operating Temperature Range - parts should be selected which are rated for the operating temperature range to which they will be subjected. \slideitem{ Electrical Characteristics - parts should be selected to meet EMI, frequency, waveform and signal requirements and maximum applied electrical stresses (singularly and in combination). \slideitem{ Stability - parts should be selected to meet parameter stability requirements based on changes in temperature, humidity, frequency, age, etc. \slideitem{ Tolerances - parts should be selected that will meet tolerance requirements, including tolerance drift, over the intended life. \slideitem{ Reliability - parts should be selected with adequate inherent reliability and properly derated to achieve the required equipment reliability. Dominant failure modes should be understood when a part is used in a specific application. \slideitem{ Manufacturability - parts should be selected that are compatible with assembly manufacturing process conditions. \slideitem{ Life - parts should be selected that have "useful life" characteristics (both operating and storage) equal to or greater than that intended for the life of the equipment in which they are used. \slideitem{ Maintainability - parts should be selected that consider mounting provisions, ease of removal and replacement, and the tools and skill levels required for their removal/ replacement/repair. 
\slideitem{ Environment - parts should be selected that can operate successfully in the environment in which they will be used (i.e., temperature, humidity, sand and dust, salt atmosphere, vibration, shock, acceleration, altitude, fungus, radiation, contamination, corrosive materials, magnetic fields, etc.). \slideitem{ Cost - parts should be selected which are cost effective, yet meet the required performance, reliability, and environmental constraints, and life cycle requirements. \slideitem{ Availability - parts should be selected which are readily available, from more than one source, to meet fabrication schedules, and to ensure their future availability to support repairs in the event of failure. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Types of Faults \slideitem{Design faults: - erroneous requirements; - erroneous software; - erroneous hardware. \slideitem{These are systematic failures; - not due to chance but design. \slideitem{Don't forget management/regulators! \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Types of Faults \slideitem{Intermittent faults: - fault occurs and recurs over time; - faulty connections can recur. \slideitem{Transient faults: - fault occurs but may not recur; - electromagnetic interference. \slideitem{Permanent faults: - fault persists; - physical damage to processor. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Fault Models \slideitem{Single stuck-at models. \slideitem{Hardware seen as `black-box'. \slideitem{Fault modelled as: - input or output error; - stuck at either 1 or 0. \slideitem{Models permanent faults. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Fault Models - Single Stuck-At... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Fault Models \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Fault Models \slideitem{Bridging Model: - input not `stuck-at' 1 or 0; - but shorting of inputs to circuit; - input then is wired-or/wired-and. \slideitem{Stuck-open model: - both CMOS output transistors off; - result is neither high nor low... \slideitem{Transition and function models. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Software Faults (Aside...) \slideitem{Much more could be said... - see Leveson or Storey. \slideitem{Huge variability: - specification errors; - coding errors; - translation errors; - run-time errors... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Redundancy \slideitem{ Adds: - cost; - weight; - power consumption; - complexity (most significant). \slideitem{These can outweigh safety benefits.
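As a rough illustration (assuming independent channel failures and a perfect voter), the triple modular redundancy arrangement on the TMR slides below needs a majority of its three channels to work:
\[ R_{TMR} = R^{3} + 3R^{2}(1-R) = 3R^{2} - 2R^{3}, \]
which only improves on a single channel when $R > 0.5$: redundancy multiplies unreliable channels rather than rescuing them, and the voter and extra wiring add failure modes of their own.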
\slideitem{Other techniques available: - improved maintenance; - better quality materials; \slideitem{Sometimes no choice (Satellites). \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Hardware Redundancy Techniques \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Active Redundancy \slideitem{ When component fails... \slideitem{ Redundant components do not have: - to detect component failure; - to switch to redundant resource. \slideitem{ Redundant units always operate. \slideitem{ Automatically pick up load on failure. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Standby Redundancy \slideitem{ Must detect failure. \slideitem{Must decide to replace component. \slideitem{Standby units can be operating. \slideitem{Stand-by units may be brought-up. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Example Redundancy Techniques Bimodal Parallel/Series Redundancy. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Triple Modular Redundancy (TMR) \slideitem{ Possibly most widespread. \slideitem{In simple voting arrangement, - voting element -> common failure; - so triplicate it as well. \slideitem{ Multi-stage TMR architectures. \slideitem{More cost, more complexity... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Multilevel Triple Modular Redundancy (TMR) \slideitem{ No protection if 2 fail per level. \slideitem{No protection from common failure - eg if hard/software is duplicated. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Fault Detection \slideitem{Functionality checks: - routines to check hardware works. \slideitem{Signal Comparisons: - compare signal in same units. \slideitem{Information Redundancy: - parity checking, M out of N codes... \slideitem{Watchdog timers: - reset if system times out. \slideitem{Bus monitoring: - check processor is `alive'. \slideitem{Power monitoring: - time to respond if power lost. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study "GPCs running together in the same GN&C (Guidance, Navigation and Control) OPS (Operational Sequence) are part of a redundant set performing identical tasks from the same inputs and producing identical outputs. Therefore, any data bus assigned to a commanding GN&C GPC is heard by all members of the redundant set (except the instrumentation buses because each GPC has only one dedicated bus connected to it). These transmissions include all CRT inputs and mass memory transactions, as well as flight-critical data. 
Thus, if one or more GPCs in the redundant set fail, the remaining computers can continue operating in GN&C. Each GPC performs about 325,000 operations per second during critical phases. " \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study GPC status information [is exchanged] among the primary avionics computers. If a GPC operating in a redundant set fails to meet two redundant multiplexer interface adapter receivers during two successive reads of response data, and does not receive any data while the other members of the redundant set do receive the data, they in turn will vote the GPC out of the set. A failed GPC is halted as soon as possible." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study "GPC failure votes are annunciated in a number of ways. The GPC status matrix on panel O1 is a 5-by-5 matrix of lights. For example, if GPC 2 sends out a failure vote against GPC 3, the second white light in the third column is illuminated. The yellow diagonal lights from upper left to lower right are self-failure votes. Whenever a GPC receives two or more failure votes from other GPCs, it illuminates its own yellow light and resets any failure votes that it made against other GPCs (any white lights in its row are extinguished). Any time a yellow matrix light is illuminated, the GPC red caution and warning light on panel F7 is illuminated, in addition to master alarm illumination, and a GPC fault message is displayed on the CRT. " \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study "Each GPC power on/off switch is a guarded switch. Positioning a switch to on provides the computer with triply redundant [power, allowing it to operate] normally, even if two main or essential buses are lost. " \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study "(There are) 5 identical general-purpose computers aboard the orbiter [that] control space shuttle vehicle systems. Each GPC is composed of two separate units, a central processor unit and an input/output processor. All five GPCs are IBM AP-101 computers. Each CPU and IOP contains a memory area for storing software and data. These memory areas are collectively referred to as the GPC's main memory. The IOP of each computer has 24 independent processors, each of which controls 24 data buses used to transmit serial digital data between the GPCs and vehicle systems, and secondary channels between the telemetry system and units that collect instrumentation data. The 24 data buses are connected to each IOP by multiplexer interface adapters that receive, convert and validate the serial data in response to discrete signals calling for available data to be transmitted or received from vehicle hardware." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
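%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study As an aside, a toy sketch in C (emphatically not the actual GPC implementation) of the failure-voting idea quoted above: each unit accumulates failure votes from its peers, and any unit that collects two or more votes is removed from the redundant set and halted.

/* Toy illustration of majority fail-voting in a redundant set.
   The unit count, vote pattern and threshold are illustrative only. */
#include <stdio.h>

#define UNITS 4                      /* size of the redundant set */

int main(void)
{
    /* votes[i][j] == 1 means unit i votes that unit j has failed */
    int votes[UNITS][UNITS] = {{0}};
    int in_set[UNITS] = {1, 1, 1, 1};

    /* suppose units 0, 1 and 3 all see unit 2 miss its responses */
    votes[0][2] = votes[1][2] = votes[3][2] = 1;

    for (int j = 0; j < UNITS; j++) {
        int against = 0;
        for (int i = 0; i < UNITS; i++)
            if (i != j && in_set[i])
                against += votes[i][j];
        if (against >= 2) {          /* two or more peers agree... */
            in_set[j] = 0;           /* ...vote the unit out       */
            printf("unit %d voted out (%d votes)\n", j, against);
        }
    }
    return 0;
}

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}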
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study "A GPC on orbit can also be ''freeze-dried;'' that is, it can be loaded with the software for a particular memory configuration and then moded to standby. It can then be moded to halt and powered off. Since the GPCs have non-volatile memory, the software is retained. Before an OPS transition to the loaded memory configuration, the freeze-dried GPC can be moded back to run and the appropriate OPS requested. A failed GPC can be hardware-initiated, stand-alone-memory-dumped by switching the powered computer to terminate and halt and then selecting the number of the failed GPC on the GPC memory dump rotary switch on panel M042F in the crew \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Space Shuttle General Purpose Computer (GPC) Case Study "A simplex GPC is one in run and not a member of the redundant set, such as the BFS (Backup Flight System) GPC. Systems management and payload major functions are always in a simplex GPC." "Even though the four primary avionics software system GPCs control all GN&C functions during the critical phases of the mission, there is always a possibility that a generic failure could cause loss of vehicle control. Thus, the fifth GPC is loaded with different software created by a different company than the PASS developer. This different software is the backup flight system. To take over control of the vehicle, the BFS monitors the PASS GPCs to keep track of the current state of the vehicle. If required, the BFS can take over control of the vehicle upon the press of a button. The BFS also performs the systems management functions during ascent and entry because the PASS GPCs are operating in GN&C. BFS software is always loaded into GPC 5 before flight, but any of the five GPCs could be made the BFS GPC if necessary." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Hardware Design: Fault Tolerant Architectures \slideitem{The basics of hardware management. \slideitem{Fault models. \slideitem{Hardware redundancy. \slideitem{Space Shuttle GPC Case Study. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Hardware Implementation Issues \slideitem{COTS Microprocessors. \slideitem{Specialist Microprocessors. \slideitem{Programmable Logic Controllers. \slideitem{Electromagnetic Compatibility. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors \slideitem{As we have seen: - safety of software jeopardised - if flaws in underlying hardware. \slideitem{Catch-22 problem: - best tools for COTS processors; - most experience with COTS; - least assurance with COTS... \slideitem{Redundancy techniques help... - but danger of common failures; - vs cost of heterogeneity; \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
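%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors One pragmatic, partial defence against undisclosed processor flaws is a start-up functionality check: exercise a handful of operations against precomputed answers before the application is allowed to run, and refuse to start (or drop to a fail-safe state) on any mismatch. A minimal sketch, with an entirely illustrative set of checks:

/* Start-up self-test sketch: the operations and expected values here
   are illustrative, not a qualified processor test suite. */
#include <stdio.h>
#include <stdint.h>

static int self_test(void)
{
    /* volatile discourages the compiler from folding the checks away */
    volatile uint32_t a = 0xAAAAAAAAu, b = 0x55555555u;
    volatile double x = 0.1, y = 0.2;

    if ((a ^ b) != 0xFFFFFFFFu) return 0;               /* logical operations */
    if ((uint32_t)(a + b) != 0xFFFFFFFFu) return 0;     /* integer addition   */
    if (!(x + y > 0.29 && x + y < 0.31)) return 0;      /* crude FP check     */
    return 1;
}

int main(void)
{
    if (!self_test()) {
        fprintf(stderr, "processor self-test failed\n"); /* fail safe: do not start */
        return 1;
    }
    printf("self-test passed\n");
    return 0;
}

This only catches gross, repeatable faults; it is no substitute for the mask-level and assurance arguments discussed on the surrounding slides. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}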
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors \slideitem{Where do the faults arise? 1. fabrication failures; 2. microcode errors; 3. documentation errors. \slideitem{Can guard against 1: - using same processing mask; - tests then apply to all of batch; - high cost (specialist approach). \slideitem{Cannot distinguish 2 from 3? \slideitem{Undocumented instructions... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors "Steven O. Siegfried" Mon, 10 Nov 1997 01:07:34 -0600 (CST) New Intel Pentium risk: user mode program locks up system The following program, when compiled and run in __USER__ mode on any Pentium (reported as MMX or not, don't know about Pentium II yet) will lock-up the system.
> char x [5] = { 0xf0, 0x0f, 0xc7, 0xc8 };
> main ()
> {
> void (*f)() = x;
> f();
> }
Any user can execute this program at the lowest level of security provided by the following operating systems: OS/2, NT, W95, Linux. When I tried it, I could _only_ recover by power-cycling my box. The following perl script, courtesy of Sam Trenholme via the security mailing list at Redhat Software is reported to find _all_ occurences of this code sequence on systems running Linux. (It found my bomb program after I used it to kill my system as a test.) It can probably be adapted for use on other operating systems.
> #!/usr/bin/perl
> # Source: Sam Trenholme via linux-security@redhat.com mailing list.
> # There is no known software fix to the F0 0F C7 C8 bug at this time.
> # usage: $0 dir
> # Where dir is the directory you recursively look at all programs in
> # for instances of the F0 0F C7 C8 sequence.
> # This script will search for programs with this sequence, which will
> # help sysadmins take appropriate action against those running such
> # programs.
> # This script is written (but has not been tested) in Perl4, to
> # insure maximum compatibility .
> sub findit {
> local($dir,$file,@files,$data) = @_;
> undef $/;
> if(!opendir(DIR,$dir)) {
> print STDERR "Can not open $dir: $!\n";
> return 0;
> }
> @files=readdir(DIR);
> foreach $file (@files) {
> if($file ne '.' && $file ne '..') {
> if( -f "$dir/$file" && open(FILE,"< $dir/$file")) {
> $data=<FILE>;
> if($data =~ /\xf0\x0f\xc7\xc8/) {
> print "$dir/$file contains F0 0F C7 C8\n";
> }
> } elsif( -d "$dir/$file") {
> &findit("$dir/$file");
> }
> }
> }
> }
> $dir = shift || '/home';
>
> &findit($dir);
Basically, there's no protection from this. Adjust your execution of downline loaded absolutes accordingly. Steve Siegfried sos@skypoint.com sos1@xtl.msc.edu \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors "Modern microprocessor chips are getting very complex indeed. The current gate count can exceed 2.5 million. One must therefore expect that new versions of such chips will contain logical bugs. A common form of bug is in the microcode, but since the distinction between a microcode fault and another form of design bug is difficult to define, the distinction is not made here. We are *not* concerned with fabrication faults." \slideend{\it{\copyright C.W.
Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors "Attempts to report bugs openly have not been successful. A consequence of the above is that it is very difficult for users undertaking a critical application to protect themselves against a potential design bug. One approach that has been tried with one project is to use identical chips from the same mask so that rig and development testing will extrapolate to the final system. In some cases, the suppliers have provided information under a non-disclosure agreement, but this seems to be restricted to major projects. In contrast, quite a few software vendors have an open bug reporting scheme --- and almost all provide a version number to the user. Hence it appears in this area, software is in `advance' of hardware." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors "The key issues extracted are as follows: \slideitem{Early chips are unreliable: There have been some dramatic errors in very early releases of chips. \slideitem{Rarely used instructions are unreliable: One report sent to me reported that some instructions not generated by the `C' compiler were completely wrong. Another report noted that special instructions for 64-bit integers did not work, and when this was reported, the supplier merely removed them from the documentation! \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} COTS Microprocessors "The key issues extracted are as follows (continued): \slideitem{Undocumented instructions are unreliable: Obviously, such instructions must be regarded with suspicion. \slideitem{Exceptional case handling is unreliable: A classic instance of this problem is an error which has been reported to me several times of the jump instructions on the 6502. When such an instruction straddled a page boundary, it did not work correctly. This issue potentially gives the user most cause for concern, since it may be very difficult to avoid the issue. For instance, with machine generated code from a compiler, the above problem with the 6502 would be impossible to avoid." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Specialist Microprocessors \slideitem{ Commercial microprocessor flaws. \slideitem{ What happens if illegal opcode? - or result may be undefined? \slideitem{Motorola 6801 test instruction - fetches infinite bytes from memory; - good to test for faults on bus; - but could be executed erroneously; - see Storey or comp.risks for more. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Specialist Microprocessors - AAMP2 \slideitem{ Collins Avionics/Rockwell group. \slideitem{ AAMP2 - 30+ in every Boeing 747-400. \slideitem{High criticality implies cost - can you sell enough to cover input? \slideitem{What is money spent on? - extra time spent on design? - bench testing (see later); - formal verification... \slideend{\it{\copyright C.W.
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - AAMP5 "The AAMP5 verification was a project conducted to explore how formal techniques for specification and verification could be introduced into an industrial process. Sponsored by the Systems Validation Branch of NASA Langley and Collins Commercial Avionics, a division of Rockwell International, it was conducted by Collins and the Computer Science Research Lab at SRI International. The project consisted of specifying in the PVS language developed by SRI a portion of a Rockwell proprietary microprocessor, the AAMP5, at both the instruction set and register-transfer levels and using the PVS theorem prover to show the microcode correctly implemented the specified behavior for a representative subset of instructions. The formal verification was performed in parallel with the development of the AAMP5 and did not replace any production verification activities." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - AAMP5 " This methodology was used to formally verify a core set of eleven AAMP5 instructions representative of several instruction classes. The core set did not include floating point instructions. Although the number of instructions verified is small, the methodology and the formal machinery developed are adequate to cover most of the remaining AAMP5 microcode. The success of this project has lead to a sequel in which the same methodology is being reused to verify another member of the AAMP family of processors" \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - AAMP5 "Another key result was the discovery of both actual and seeded errors. Two actual microcode errors were discovered during development of the formal specification, illustrating the value of simply creating a precise specification. Both errors were specific to the AAMP5 and corrected prior to first fabrication. Two additional errors seeded by Collins in the microcode were systematically uncovered by SRI while doing correctness proofs. One of these was an actual error that had been discovered by Collins during testing of an early prototype but left in the microcode provided to SRI. The other or simulation." \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - AAMP5 and Formal Verification \slideitem{For more details see: \slideitem{In principle, it can be done, - but still very expensive; - need techniques and tools; - reduce costs and increase subsets. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - Verifiable Integrated Processor for Enhanced Reliability (VIPER) \slideitem{This is an old story... - but still very controversial. \slideitem{Royal Signals & Radar Establishment. \slideitem{LCF-LSM and Ella. \slideitem{Big claims about confidence levels: - did MOD claim "fully proven"? - proof from spec to production? \slideend{\it{\copyright C.W. 
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - Verifiable Integrated Processor for Enhanced Reliability (VIPER) \slideitem{Charter technologies market the chip. \slideitem{Sue MOD over "ungrounded" claims. \slideitem{Charter into liquidation as costs rise. \slideitem{Key lesson: - general ignorance about proof; - argument not absolute guarantee. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - 1750A, 1750B - introduced in 1979; - revised in 1982; - deactivated in 1996; - well documented/understood. - but dont forget safety of language; - not just processor reliability. - started but never completed? - 1750A remains a de facto standard. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Specialist Microprocessors - ERC32 \slideitem{Reliable not safety-critical? \slideitem{Space and radiation tolerant. \slideitem{Ada development tools. - integer unit (IU); - floating-point unit (FPU); - memory controller (MEC). \slideitem{Single-chip (TSC695E) June 1999. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Microprocessors \slideitem{MIL-STD1750 - expensive. \slideitem{Select processor for application. - Storey cites widespread use; - range of less critical areas. - specifically for airbag applications; - "general purpose in-system programmable microcontrollers". \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Programmable Logic Controllers (PLCs) \slideitem{Self contained: - power supply; - interface circuitry; - 1+ processors. \slideitem{Different from GPCs (eg Shuttle): - replace electromechanical relays; - perform simple logic functions. \slideitem{Designed for high MTBFs: - kernels provide trusted functions; - proprietary source for firmware. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Programmable Logic Controllers (PLCs) \slideitem{Widely used, well tested \slideitem{But hard/software proprietary. \slideitem{Certification by trusted bodies. \slideitem{However... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Programmable Logic Controllers (PLCs) Problem with PLC Software "Lin Zucconi" 3 Mar 1993 16:50:50 U People using Modicon 984 Series programmable controllers with Graysoft Programmable Logic Controller (PLC) software Version 3.21 are advised to contact Graysoft (414) 357-7500 to receive the latest version (3.50) of the software. A bug in Version 3.21 can corrupt a controller's logic and cause equipment to operate erratically. PLCs are frequently used in safety-related applications. Users often assume that if their "logic" is correct then they are ok and forget that the underlying logic is implemented with software which may not be correct. Lin Zucconi zucconi@llnl.gov \slideend{\it{\copyright C.W. 
Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Programmable Logic Controllers (PLCs) \slideitem{PC emulators to develop software: - download to target PLC; - volatile store is dangerous; - wide use of EEPROMS. \slideitem{Fail safe PLCs: - two or more independent CPUs; - voting forms of redundancy; - if conflict, close down in safe state. \slideitem{Several graphical design techniques: ladder & function block diagrams... \slideitem{See Storey for more detail. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Electromagnetic Compatibility (EMC) \slideitem{Work in presence of interference. \slideitem{AND not create interference. \slideitem{Interference from external noise. \slideitem{Interference from external sources: - 2 radio signals on same frequency. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Electromagnetic Compatibility (EMC) \slideitem{Difficult to predict. \slideitem{Intensity changes over time; - e.g. with work patterns; \slideitem{Sources may also be mobile; - or you may be mobile! \slideitem{Mobile telephones, car ignitions... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Electromagnetic Compatibility (EMC) \slideitem{Protection. \slideitem{Screening: - use conductive cage/enclosure. \slideitem{Check design of PCBs if possible: - (power) loops form antennas; - check use of ground planes. \slideitem{Check CMOS output capacitance; - can buy chips (Philips 8051); - help discriminate signal edges. \slideitem{Seek help from a specialist... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Hardware Implementation Issues \slideitem{COTS Microprocessors. \slideitem{Specialist Microprocessors. \slideitem{Programmable Logic Controllers. \slideitem{Electromagnetic Compatibility. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Introduction \slideitem{ Validation and Verification. \slideitem{ What are the differences? \slideitem{When, why and who? \slideitem{UK MOD DEF STAN 00-60. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Definitions and Distinctions \slideitem{ Verification: - does it meet the requirements? \slideitem{ Validation: - are the requirements any good? \slideitem{ Testing: - process used to support V&V. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology} Definitions and Distinctions B.5.3.6 Verification and Validation. This sub-process evaluates the products of other Software Modification sub-processes to determine their compliance and consistency with both contractual and local standards and higher level products and requirements.
Verification and validation consists of software testing, traceability, coverage analysis and confirmation that required changes to software documentation are made. Testing subdivides into unit testing, integration testing, regression testing, system testing and acceptance testing. Acknowledgement: Integrated Logistic Support: Part 3, Guidance for Software Support. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Definitions and Distinctions \slideitem{Misuse of terms? A. The certification/validation process should confirm that hazards identified by hazard analysis, (HA), failure mode effect analysis (FMEA), and other system analyses have been eliminated by design or devices, or special procedures. The certification/validation process should also confirm that residual hazards identified by operational analysis are addressed by warning, labeling safety instructions or other appropriate means. Acknowledgement: Nonmandatory guidelines for certification/validation of safety systems for presence sensing device initiation of mechanical power presses - 1910.217 App B \slideitem{More like verification? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation \slideitem{ During design - external review before commission; - external review for certification. \slideitem{ During implementation: - additional constraints discovered; - additional requirements emmerge. \slideitem{ During operation: - were the assumptions valid? - especially environmental factors. \slideitem{Validate: - PRA's; development processes etc. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation: Waterfall Model \slideitem{Validation at start and end. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation: Spiral Model \slideitem{Validation more continuous. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation: IEC 61508 (Draft) The following should be considered in an overall safety validation plan: \slideitem{ Details of when the validation should take place. \slideitem{ Details of who should carry out the validation. \slideitem{ Identification of the relevant modes of the system operation, including: \slideitem{ preparation for use, including setting up and adjustment \slideitem{ start up \slideitem{ teach \slideitem{ automatic \slideitem{ manual \slideitem{ semi-automatic \slideitem{ steady-state operation \slideitem{ resetting \slideitem{ shutdown \slideitem{ maintenance \slideitem{ reasonably foreseeable abnormal conditions \slideitem{ Identification of the safety-related systems and external risk reduction facilities that need to be validated for each mode of the system before commissioning commences. \slideitem{ The technical strategy for the validation, for example, whether analytical methods or statistical tests are to be used. 
\slideitem{ The measures, techniques and procedures that shall be used to confirm that each safety function conforms with the overall safety requirements documents and the safety integrity requirements. \slideitem{ The specific reference to the overall safety requirements documents. \slideitem{ The required environment in which the validation activities are to take place. \slideitem{ The pass/fail criteria. \slideitem{ The policies and procedures for evaluating the results of the validation, particularly failures. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation: MOD DEF STAN 00-60 D.4.1.6 Validation. At the earliest opportunity support resource requirements should be confirmed and measurements should be made of times for completion of all software operation and support tasks. Where such measurements are dependent upon the system state or operating conditions, averages should be determined over a range of conditions. If measurements are based on non-representative hardware or operating conditions, appropriate allowances should be made and representative measurements carried out as soon as possible. The frequency of some software support tasks will be dependent upon the frequency of software releases and the failure rate exhibited by the software. Integrated Logistic Support: Part 3, Guidance for Software Support. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation: MOD DEF STAN 00-60 D.4.1.6 Validation. (Cont.) Measurements of software failure rates and fault densities obtained during software and system testing might not be representative of those that will arise during system operation. However, such measurements may be used, with caution, in the validation of models and assumptions. For repeatable software engineering activities, such as compilation and regression testing, the time and resource requirements that arose during development should be recorded. Such information may be used to validate estimates for equivalent elements of the software modification process. For other software engineering activities, such as analysis, design and coding, the time and resource requirements that arose during development should be recorded. However, such information should only be used with some caution in the validation of estimates for equivalent elements of the software modification process. The preceding clauses might imply the need for a range of metrics Integrated Logistic Support: Part 3, Guidance for Software Support. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Validation: Summary of Key Issues \slideitem{ Who validates validator? - External agents must be approved. \slideitem{ Who validates validation? - Clarify links to certification. \slideitem{What happens if validation fails? - Must have feedback mechanisms; - Links to process improvement? \slideitem{NOT the same as verification! \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Leveson's Strategies \slideitem{Show that functional requirements - are consistent with safety criteria ? 
\slideitem{Implementation may include hazards not in safety/functional requirements. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Leveson's Strategies \slideitem{Show that implementation is - same as functional requirements? \slideitem{Too costly and time consuming all safety behaviour in specification? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Leveson's Strategies \slideitem{Or show that the implementation - meets the safety criteria. \slideitem{Fails if criteria are incomplete... - but can find specification errors. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Lifecycle View \slideitem{At several stages in waterfall model. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Lifecycle View \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification \slideitem{Verification as a catch-all? "Verification is defined as determining whether or not the products of each phase of the software development process fulfills all the requirements from the previous phase." \slideitem{So a recurrent cost, dont forget... - verification post maintenance. \slideitem{Verification supported by: - determinism (repeat tests); - separate safety-critical functions; - well defined processes; - simplicity and decoupling. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification D.5.1 Task 501 Supportability Test, Evaluation and Verification D.5.1.1 Test and Evaluation Strategy. Strategies for the evaluation of system supportability should include coverage of software operation and software support. Direct measurements and observations may be used to verify that all operation and support activities - that do not involve design change - may be completed using the resources that have been allocated. During the design and implementation stage measurements may be conducted on similar systems, under representative conditions. As software modification activity is broadly similar to software development the same monitoring mechanism might be used both pre- and post-implementation. Such a mechanism is likely to be based on a metrics programme that provides information, inter alia, on the rate at which software changes are requested and on software productivity. Integrated Logistic Support: Part 3, Guidance for Software Support. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification D.5.1.3 Objectives and Criteria. System test and evaluation programme objectives should include verification that all operation and support activities may be carried out successfully -within skill and time constraints - using the PSE and other resources that have been defined. 
The objectives, and associated criteria, should provide a basis for assuring that critical software support issues have been resolved and that requirements have been met within acceptable confidence levels. Any specific test resources, procedures or schedules necessary to fulfil these objectives should be included in the overall test programme. Programme objectives may include the collection of data to verify assumptions, models or estimates of software engineering productivity and change traffic. Integrated Logistic Support: Part 3, Guidance for Software Support. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification D.5.1.4 Updates and Corrective Actions. Evaluation results should be analyzed and corrective actions determined as required. Shortfalls might arise from: \slideitem{ Inadequate resource provision for operation and support tasks. \slideitem{ Durations of tasks exceeding allowances. \slideitem{ Software engineering productivity not matching expectations. \slideitem{ Frequencies of tasks exceeding allowances. \slideitem{ Software change traffic exceeding allowances. Corrective actions may include: increases in the resources available; improvements in training; additions to the PSE or changes to the software, the support package or, ultimately, the system design. Although re-design of the system or its software might deliver long term benefits it would almost certainly lead to increased costs and programme slippage. Integrated Logistic Support: Part 3, Guidance for Software Support. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Summary of Key Issues \slideitem{What can we affoard to verify? \slideitem{ Every product of every process? - MIL HDBK 338B... \slideitem{ Or only a few key stages? \slideitem{If the latter, do we verify : - specification by safety criteria? - implementation by safety criteria? - or both... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Verification: Summary of Key Issues \slideitem{Above all.... \slideitem{Verification is about proof. \slideitem{Proof is simply an argument. \slideitem{Argument must be correct but - not a mathematical `holy grail'... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Introduction \slideitem{ Validation and Verification. \slideitem{ What are the differences? \slideitem{When, why and who? \slideitem{UK MOD DEF STAN 00-66 \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Testing \slideitem{ The processes used during: - validation and verification. \slideitem{White and black boxes. \slideitem{Static and Dynamic techniques \slideitem{Mode confusion case study. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Definitions and Distinctions \slideitem{ Black box tests: - tester has no access to information - about the system implementation. \slideitem{Good for independence of tester. 
\slideitem{But not good for formative tests. \slideitem{Hard to test individual modules... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Definitions and Distinctions \slideitem{ White box tests: - tester can access information about - the system implementation. \slideitem{Simplifies diagnosis of results. \slideitem{Can compromise independence? \slideitem{How much do they need to know? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Definitions and Distinctions \slideitem{ Module testing: - tests well-defined subset. \slideitem{ Systems integration: - tests collections of modules. \slideitem{ Acceptance testing: - system meets requirements? \slideitem{Results must be documented. \slideitem{Changes will be costly. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Dynamic Testing - Process Issues \slideitem{ Functional testing: - test cases examine functionality; - see comments on verification. \slideitem{ Structural testing: - knowledge of design guides tests; - interaction between modules... - test every branch (coverage)? \slideitem{ Random testing: - choose from possible input space; - or beyond the "possible"... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Definitions and Distinctions \slideitem{ Dynamic testing: - execution of system components; - is environment being controlled? \slideitem{ Static testing: - investigation without operation; - pencil and paper reviews etc. \slideitem{Most approaches use both. \slideitem{Guide the test selection by using: - functional requirements: - safety requirements; - (see previous lecture). \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Definitions and Distinctions \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Dynamic Testing \slideitem{ Where do you begin? \slideitem{Look at the original hazard analysis; - demonstrate hazard elimination? - demonstrate hazard reduction? - demonstrate hazard control? \slideitem{Must focus both on: - expected and rare conditions. \slideitem{PRA can help - but for software? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Dynamic Testing - Leveson's Process Issues \slideitem{ Review test plans. \slideitem{ recommend tests based on the hazard analyses, safety standards and checklists, previous accident and incidents, operator task analyses etc. \slideitem{ Specify the conditions under which the test will be conducted. \slideitem{ Review the test results for any safety-related problems that were missed in the analysis or in any other testing. \slideitem{ Ensure that the testing feedback is integrated into the safety reviews and analyses that will be used in design modifications. \slideitem{ All of this will cost time and money. 
\slideitem{ Must be planned, must be budgeted. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Dynamic Testing Techniques \slideitem{ Partitioning: - identify groups of input values; - do they map to similar outputs? \slideitem{ Boundary analysis: - extremes of valid/invalid input; - (see the sketch below). \slideitem{ Probabilistic Testing: - examine reliability of system. \slideitem{ (State) Transition tests: - trace states, transitions and events. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Dynamic Testing Techniques \slideitem{ Simulation: - assess impact on EUC (IEC61508). \slideitem{ Error seeding: - put error into implementation; - see if tests discover it (dangerous). \slideitem{ Performance monitoring: - check real-time, memory limits. \slideitem{ Stress tests: - abnormally high workloads? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Dynamic Testing - Software Issues \slideitem{ Boundary conditions. \slideitem{ Incorrect and unexpected input sequences. \slideitem{ Altered timings - delays and over-loading. \slideitem{ Environmental stress - faults and failures. \slideitem{ Critical functions and variables. \slideitem{ Firewalls, safety kernels and other special safety features. \slideitem{ Usual suspects...automated tests? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Limitations of Dynamic Testing \slideitem{ Cannot test all software paths. \slideitem{Cannot even test all hardware faults. \slideitem{Not easy to test in final environment: \slideitem{User interfaces very problematic:
- effects of fatigue/longitudinal use? - see section on human factors. \slideitem{Systems CHANGE the environment! \slideitem{How can we test for rare events? - may have to wait 10^{9} years?
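A rough back-of-the-envelope calculation (an illustration, not from the original slides) shows why. Assuming a constant failure rate $\lambda$, the probability of observing no failure during $t$ hours of testing is $e^{-\lambda t}$. To support a claim that $\lambda \leq 10^{-9}$ per hour with 99\% confidence we therefore need
\[ e^{-\lambda t} \leq 0.01 \quad \Rightarrow \quad t \geq \frac{\ln(1/0.01)}{\lambda} \approx 4.6 \times 10^{9} \mbox{ hours,} \]
that is, centuries of failure-free operation even across a large fleet of systems.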
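To make the earlier partitioning and boundary analysis bullets concrete, here is a minimal sketch in Python; the range-checked function accept_temperature and its limits are hypothetical, chosen purely for illustration.

# Hypothetical acceptance check: valid commands are assumed to lie in -40..125.
def accept_temperature(celsius):
    return -40 <= celsius <= 125

# Equivalence partitions: one representative value below, inside and above the range.
partition_cases = {-100: False, 20: True, 200: False}

# Boundary analysis: the extremes of the valid range and their immediate neighbours.
boundary_cases = {-41: False, -40: True, 125: True, 126: False}

for value, expected in {**partition_cases, **boundary_cases}.items():
    assert accept_temperature(value) == expected, value
print("all partition and boundary cases behave as expected")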
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Static Testing \slideitem{Don't test the system itself. \slideitem{ Test an abstraction of the system. \slideitem{Perform checks on requirements? \slideitem{Perform checks on static code. \slideitem{Scope depends on representation... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Static Testing Techniques \slideitem{ Peer review by other engineers. \slideitem{ Fagan inspections: - review of design documents. \slideitem{ Symbolic execution: - use term-rewriting on code; - does code match specification? \slideitem{ Metrics: - lots (eg cyclomatic complexity); - most very debatable... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Static Testing Techniques \slideitem{Sneak Circuit Analysis: - find weak patterns in topologies; - for hardware not software. \slideitem{Software animation: - trace behaviour of software model; - Petri Net animation tools. \slideitem{Performance/scheduling theory: - even if CPU scheduling is static; - model other resource allocations. \slideitem{Formal methods: - considerable argument even now; - compare 00-60 with DO-178B... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: The Mode Confusion Case Study \slideitem{Recent, novel use of formal analysis. \slideitem{To guide/direct other testing. \slideitem{The mode confusion problem... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study The Flight Guidance System (FGS) compares the measured state of an aircraft (position, speed, and attitude) to the desired state and generates pitch and roll guidance commands to minimize the difference between the measured and desired state. When engaged, the Autopilot (AP) translates these commands into movement of the aircraft's control surfaces necessary to achieve the commanded changes about the lateral and vertical axes. An FGS can be further broken down into the mode logic and the flight control laws. The mode logic accepts commands from the flight crew, the Flight Management System (FMS), and information about the current state of the aircraft to determine which system modes are active. The active modes in turn determine which flight control laws are used to generate the pitch and roll guidance commands. The active lateral and vertical modes are displayed on an Electronic Flight Instrumentation System (EFIS). \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study Acknowledgement: R.W. Butler et al., A Formal Methods Approach to the Analysis of Mode Confusion. In AIAA/IEEE Digital Avionics Systems Conference, October 1998. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study 1. Opacity (i.e., poor display of automation state), 2. Complexity (i.e., unnecessarily complex automation), 3.
Incorrect mental model (i.e., the flight crew misunderstands the behaviour of the automation).

Traditional human factors has concentrated on (1), and significant progress has been made. However, mitigation of mode confusion will require addressing problem sources (2) and (3) as well. Towards this end, our approach uses two complementary strategies based upon a formal model: Visualisation - create a clear, executable model of the automation that is easily understood by flight crew and use it to drive a flight deck mockup from the formal model. Analysis - conduct mathematical analysis of the model. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study \slideitem{Problems stemming from modes: - input has different effect; - uncommanded mode changes; - different modes->behaviours; - different intervention options; - poor feedback. \slideitem{ObjectTime visualisation model... \slideitem{Represent finite state machines. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study The state of the Flight Director (FD), Autopilot (AP), and each of the lateral and vertical modes are modeled as finite state machines. In Figure 3 (see previous slide), the FD is On with the guidance cues displayed; the AP is Engaged; lateral Roll, Heading, and Approach modes are Cleared; lateral NAV mode is Armed; vertical modes Pitch, Approach, and AltHold are Cleared; and the VS mode is Active. Active modes are those that actually control the aircraft when the AP is engaged. These are indicated by the heavy dark boxes around the Active, Track, and lateral Armed modes. Acknowledgement: R.W. Butler et al., A Formal Methods Approach to the Analysis of Mode Confusion. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study \slideitem{ObjectTime model: - give pilots better mental model? - drive simulation (dynamic tests?). \slideitem{Build more complete FGS model - prove/test for mode problems. \slideitem{ Discrete maths: - theorem proving; - or model checking? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study The first problem is formally defining what constitutes an indirect mode change. Let's begin by defining it as a mode change that occurs when there has been no crew input:
Indirect_Mode_Change?(s,e): bool = NOT Crew_input?(e) AND Mode_Change?(s,e)
No_Indirect_Mode_Change: LEMMA Valid_State?(s) IMPLIES NOT Indirect_Mode_Change?(s,e)
Acknowledgement: R.W. Butler et al., A Formal Methods Approach to the Analysis of Mode Confusion. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study We then seek to prove the false lemma above using GRIND, a brute force proof strategy that works well on lemmas that do not involve quantification.
The resulting unproved sequents elaborate the conditions where indirect mode changes occur. For example,

{-1} Overspeed_Event?(e!1)
{-2} OFF?(mode(FD(s!1)))
{-3} s!1 WITH [FD := FD(s!1) WITH [mode := CUES],
LATERAL := LATERAL(s!1) WITH
[ROLL := (# mode := ACTIVE #)],
VERTICAL := VERTICAL(s!1) WITH
[PITCH := (# mode := ACTIVE #)]]
= NS
{-4} Valid_State(s!1)
|-------
{1} mode(PITCH(VERTICAL(s!1))) =
mode(PITCH(VERTICAL(NS)))
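% Annotation, not part of the original PVS output: formulas with negative
% numbers, above the |------- turnstile, are the assumptions of the sequent;
% the positively numbered formulas below it are its conclusions. A sequent is
% proved by showing that the assumptions jointly imply one of the conclusions,
% so each unproved sequent records a case where the assumptions can all hold
% while the conclusions fail - here, the PITCH mode changes although no crew
% input has occurred.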
The situations where indirect mode changes occur are clear from the negatively labeled formulas in each sequent. We see that an indirect mode change occurs when the overspeed event occurs and the Flight Director is off. This event turns on the Flight Director and places the system into modes ROLL and PITCH. Acknowledgement: R.W. Butler et al., A Formal Methods Approach to the Analysis of Mode Confusion. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study We define an ignored command as one in which there is a crew input and there is no mode change. We seek to prove that this never happens:
No_Ignored_Crew_Inputs: LEMMA Valid_State(s) AND Crew_Input?(e) IMPLIES NOT Mode_Change?(s,e)
The result of the failed proof attempt is a set of sequents similar to the following:
{-1} VS_Pitch_Wheel_Changed?(e!1)
{-2} CUES?(mode(FD(s!1)))
{-3} TRACK?(mode(NAV(LATERAL(s!1))))
{-4} ACTIVE?(mode(VS(VERTICAL(s!1))))
|-------
{1} ACTIVE?(mode(ROLL(LATERAL(s!1))))
{2} ACTIVE?(mode(HDG(LATERAL(s!1))))
The negatively labeled formulas in the sequent clearly elaborate the case where an input is ignored, i.e., when the VS/Pitch Wheel is changed and the Flight Director is displaying CUES and the active lateral mode is ROLL and the active vertical mode is PITCH. In this way, PVS is used to perform a state exploration to discover all conditions where the lemma is false, i.e., all situations in which a crew input is ignored. Acknowledgement: R.W. Butler et al., A Formal Methods Approach to the Analysis of Mode Confusion. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Formal Methods: Mode Confusion Case Study \slideitem{Are these significant for the user? \slideitem{Beware: - atypical example of formal methods; - haven't mentioned refinement; - haven't mentioned implementation; - much more could be said... - see courses on formal methods. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Testing \slideitem{ The processes used during: - validation and verification. \slideitem{White and black boxes. \slideitem{Static and Dynamic techniques \slideitem{Mode confusion case study. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Individual Human Error \slideitem{Slips, Lapses and Mistakes. \slideitem{Rasmussen: Skill, Rules, Knowledge. \slideitem{Reason: Generic Error Modelling. \slideitem{Risk Homeostasis. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} What is Error? \slideitem{Deviation from optimal performance? - very few achieve the optimal. \slideitem{Failure to achieve desired outcome? - desired outcome can be unsafe. \slideitem{Departure from intended plan? - but environment may change plan... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} What is Error? Acknowledgement: J. Reason, Human Error, Cambridge University Press, 1990 (ISBN-0-521-31419-4).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Types of Errors... \slideitem{Slips: - correct plan but incorrect action; - more readily observed. \slideitem{Lapses: - correct plan but incorrect action; - failure of memory so more covert? \slideitem{Mistakes: - incorrect plan; - more complex, less understood. \slideitem{Human error modelling helps to: - analyse/distinguish error types. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Rasmussen: Skill, Rules and Knowledge \slideitem{Skill based behaviour: - sensory-motor performance; - without conscious control; - automated, high-integrated. \slideitem{Rule based behaviour: - based on stored procedures; - induced by experience or taught; - problem solving/planning. \slideitem{Knowledge based behaviour: - in unfamilliar situations; - explicitly think up a goal; - develop a plan by selection; - try it and see if it works. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Rasmussen: Skill, Rules and Knowledge Acknowledgement: J. Rasmussen, Skill, Rules, Knowledge: Signals, Signs and Symbols and Other Distinctions in Human Performance Models. IEEE Transactions on Systems, Man and Cybernetics (SMC-13)3:257-266, 1983. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Rasmussen: Skill, Rules and Knowledge \slideitem{Signals: - sensory data from environment; - continuous variables; - cf Gibson's direct perception. \slideitem{Signs: - indicate state of the environment; - with conventions for action; - activate stored pattern or action. \slideitem{Symbols: - can be formally processed; - related by convention to state. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Rasmussen: Skill, Rules and Knowledge \slideitem{ Skill-based errors: - variability of human performance. \slideitem{ Rule-based errors: - misclassification of situations; - application of wrong rule; - incorrect recall of correct rule. \slideitem{ Knowledge-based errors: - incomplete/incorrect knowledge; - workload and external constraints... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Building on Rasmussen's Work \slideitem{How do we account for: - slips and lapses in SKR? \slideitem{Can we distinguish: - more detailed error forms? - more diverse error forms? \slideitem{Before an error is detected: - operation is, typically, skill based. \slideitem{After an error is detected: - operation is rule/knowledge based. \slideitem{GEMS builds on these ideas... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Monitoring Failures \slideitem{Normal monitoring: - typical before error is spotted; - preprogrammed behaviours plus; - attentional checks on progress. \slideitem{Attentional checks: - are actions according to plan? 
- will plan still achieve outcome? \slideitem{Failure in these checks: - often leads to a slip or lapse. \slideitem{Reason also identifies: - Overattention failures. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Problem Solving Failures \slideitem{Humans are pattern matchers: - prefer to use (even wrong) rules; - before effort of knowledge level. \slideitem{Local state information: - indexes stored problem handling; - schemata, frames, scripts etc. \slideitem{Misapplication of good rules: - incorrect situation assessment; - over-generalisation of rules. \slideitem{Application of bad rules: - encoding deficiencies; - action deficiencies. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Knowledge-Based Failures \slideitem{Thematic vagabonding: - superficial analysis/behaviour; - flit from issue to issue. \slideitem{Encysting: - myopic attention to small details; - meta-level issues may be ignored. \slideitem{Reason: - individual fails to recognise failure; - does not face up to consequences. \slideitem{Berndt Brehmer & Dietrich Doerner. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Failure Modes and the SKR Levels Acknowledgement:J. Reason, Human Error, Cambridge University Press, 1990 (ISBN-0-521-31419-4). \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Error Detection \slideitem{Dont try to eliminate errors: - but focus on their detection. \slideitem{Self-monitoring: - correction of postural deviations; - correction of motor responses; - detection of speech errors; - detection of action slips; - detection of problem solving error. \slideitem{How do we support these activities? - standard checks procedures? - error hypotheses or suspicion? - use simulation based training? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Error Detection \slideitem{Dont try to eliminate errors: - but focus on their detection. \slideitem{Environmental error cueing: - block users progress; - help people discover error; - "gag" or prevent input; - allow input but warn them; - ignore erroneous input; - self correct; - force user to explain.. \slideitem{Importance of other operators \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Error Detection \slideitem{Cognitive barriers to error detection. \slideitem{Relevance bias: - users cannot consider all evidence; - "confirmation bias". \slideitem{ Partial explanations: - users accept differences between - "theory about state" and evidence. \slideitem{Overlaps: - even incorrect views will receive - some confirmation from evidence. \slideitem{"Disguise by familliarity". \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Practical Application \slideitem{So how do we use GEMS? 
\slideitem{Try to design to avoid all error? \slideitem{Use it to guide employee selection? \slideitem{Or only use it post hoc: - to explain incidents and accidents? \slideitem{No silver bullet, no panacea. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Practical Application \slideitem{Eliminate error affordances: - increase visibility of task; - show users constraints on action. \slideitem{Decision support systems: - don't just present events; - provide trend information; - "what if" subjunctive displays; - prostheses/mental crutches? \slideitem{Memory aids for maintenance: - often overlooked; - aviation task cards; - must maintain maintenance data! \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Practical Application \slideitem{Improve training: - procedures or heuristics? - simulator training (contentious). \slideitem{Error management: - avoid high-risk strategies; - high probability/cost of failure. \slideitem{Ecological interface design: - Rasmussen and Vicente; - 10 guidelines (learning issues). \slideitem{Self-awareness: - when might I make an error? - contentious... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Practical Application \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Outstanding Issues \slideitem{Problem of intention: - is an error a slip or lapse? - is an error a mistake of intention? \slideitem{Given observations of error: - aftermath of accident/incident; - guilt, insecurity, fear, anger. \slideitem{Can we expect valid answers? \slideitem{Can we make valid inferences? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Outstanding Issues \slideitem{ GEMS focusses on causation: - built on Rasmussen's SKR model; - therefore, has explanatory power. \slideitem{Hollnagel criticises it: - difficult to apply in the field; - do observations map to causes? \slideitem{Glasgow work has analysed: - GEMS plus active/latent failures. \slideitem{Results equivocal, GEMS: - provides excellent vocabulary; - can be hard to perform mapping. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Outstanding Issues (Risk Homeostasis Theory) \slideitem{What happens if we introduce the decision aids Reason suggests? "Each road user has a target (or accepted) level of risk which acts as a comparison with actual risk. Where a difference exists, one may move towards the other. Thus, when a safety improvement occurs, the target level of risk motivates behaviour to compensate - e.g., drive faster or with less attention. Risk homeostasis theory (RHT) has not been concerned with the cognitive or behavioural pathways by which homeostasis occurs, only with the consequences of adjustments in terms of accident loss." Acknowledgement: T.W. Hoyes and A.I.
Glendon, Risk Homeostasis: Issues for Further Research, Safety Science, 16:19-33, (1993). \slideitem{Will users accept more safety? - or trade safety for performance? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} GEMS: Outstanding Issues (Risk Homeostasis Theory) \slideitem{Very contentious. \slideitem{Bi-directionality? - what if safety levels fall? - will users be more cautious? \slideitem{Does it affect all tasks? \slideitem{Does it affect work/leisure? \slideitem{How do we prove/disprove it? - unlikely to find it in simulators. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Conclusions: Individual Human Error \slideitem{Slips, Lapses and Mistakes. \slideitem{Rasmussen: Skill, Rules, Knowledge. \slideitem{Reason: Generic Error Modelling. \slideitem{Risk Homeostasis. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work \slideitem{Workload. \slideitem{Situation Awareness. \slideitem{Crew Resource Management \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Workload \slideitem{High workload: - stretches the user's resources. \slideitem{Low workload: - wastes the user's resources; - can inhibit ability to respond. \slideitem{Cannot be "seen" directly; - is inferred from behaviour. \slideitem{No widely accepted definition? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Workload "Physical workload is a straightforward concept. It is easy to measure and define in terms of energy expenditure. Traditional human factors texts tell us how to measure human physical work in terms of kilocalories and oxygen consumption..." Acknowledgement: B.H. Kantowitz and P.A. Casper, Human Workload in Aviation. In E.L. Wiener and D.C. Nagel (eds.), Human Factors in Aviation, 157-187, Academic Press, London, 1988. "The experience of workload is based on the amount of effort, both physical and psychological, expended in response to system demands (taskload) and also in accordance with the operator's internal standard of performance." Acknowledgement: E.S. Stein and B. Rosenberg, The Measurement of Pilot Workload, Federal Aviation Authority, Report DOT/FAA/CT82-23, NTIS No. ADA124582, Atlantic City, 1983. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Workload \slideitem{ Different emphases: - Wickens on perceptual channels; - Kantowitz on problem solving; - Hart on overall experience. \slideitem{Holistic vs atomistic approaches: - FAA (+ Seven) a gestalt concept; - cannot measure in isolation; - (many) experimentalists disagree. \slideitem{Single-user vs team approaches: - workload is dynamic; - shared/distributed between a team; - many previous studies ignore this. \slideend{\it{\copyright C.W.
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Workload \slideitem{ How do we measure workload? \slideitem{ Subjective ratings? - NASA TLX, task load index; - consider individual differences. \slideitem{ Secondary tasks? - performance on additional task; - obtrusive & difficult to generalise. \slideitem{ Physiological measures? - heart rate, skin temperature etc; - lots of data but hard to interpret. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Workload \slideitem{ How to reduce workload? \slideitem{ Function allocation? - static or dynamic allocation; - to crew, systems or others (ATC?). \slideitem{ Automation? - but it can increase workload;
- or change nature (monitoring). \slideitem{ Crew resource management? - coordination, decision making etc; - see later in this section... \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Workload and Situation Awareness Acknowledgement: C.D. Wickens and J.M. Flach, Information Processing. In E.L. Wiener and D.C. Nagel (eds.), Human Factors in Aviation, 111-156, Academic Press, London, 1988. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Situation Awareness "Situation awareness is the perception of the elements of the environment within a volume of time and space, the comprehension of their meaning, and the projection of their status in the near future." Acknowledgement: M.R. Endsley, Design and Evaluation for Situation Awareness Enhancement. In Proceedings of the Human Factors Society 32nd Annual Meeting, 97-101. Human Factors Society, Santa Monica, CA, 1988. \slideitem{Rather abstract definition. \slideitem{Most obvious when it is lost. \slideitem{Difficult to explain behaviour: - beware SA becoming a "catch all"; - just as "high workload" was. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Situation Awareness Acknowledgement: M.R. Endsley, Towards a Theory of Situation Awareness, Human Factors, (37)1:32-64, 1995. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Situation Awareness \slideitem{Level 1: perception of environment - how much can be attended to? - clearly not everything... \slideitem{Level 2: Comprehension of situation - synthesise the elements at level 1; - significance determined by goals. \slideitem{Level 3: Projection of future. - knowledge of status and dynamics; - may only be possible in short term; - enables strategy not just reaction. \slideitem{Novice perceives everything at L1; - but fails at levels 2 and 3. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Situation Awareness Acknowledgement: D.G. Jones and M.R. Endsley, Sources of Situation Awareness Errors in Aviation. Aviation, Space and Environmental Medicine, 67(6):507-512, 1996. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Situation Awareness \slideitem{Hmm, subjective classification. \slideitem{33 incidents with Air Traffic Control. \slideitem{NASA (ASRS) reporting system: - how typical are reported events? \slideitem{I worry about group work: - colleagues help you maintain SA? - prompting, reminding, informing? \slideend{\it{\copyright C.W.
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Situation Awareness "Investigators were able to trace a series of errors that initiated with the flight crew's acceptance of the controller's offer to land on runway 19. The flightcrew expressed concern about possible delays and accepted an offer to expedite their approach into Cali... One of the AA965 pilots selected a direct course to the Romeo NDB believing it was the Rozo NDB, and upon executing the selection in the FMS permitted a turn of the airplane towards Romeo, without having verified that it was the correct selection and without having first obtained approval of the other pilot, contrary to AA procedures... The flightcrew had insufficient time to prepare for the approach to Runway 19." American Airlines Flight 965, Boeing 757-223, N651AA, near Cali, Colombia, December 20, 1995. Aeronautica Civil of the Republic of Colombia. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management "...Among the results were that captains of more effective crews (who made fewer operational or procedural errors) verbalised a greater number of plans than those of lower performing crews and requested and used more information in making their decisions. This raises interesting questions about whether situation awareness can be improved by teaching specific communication skills or even proceduralising certain communications that would otherwise remain in the realm of unregulated CRM (crew resource management behaviour)." Acknowledgement: S. Dekker and J. Orasanu, Automation and Situation Awareness. In S. Dekker and E. Hollnagel (eds.), Coping with Computers in the Cockpit, 69-85, Ashgate, Aldershot, 1999. ISBN-0-7546-1147-7. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management \slideitem{Cockpit Resource Management: - crew coordination; - decision making; - situation awareness... \slideitem{More review activities inserted into standard operating procedures. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management Acknowledgement: C.A. Bowers, E.L. Blickensderfer and B.B. Morgan, Air Traffic Control Team Coordination. In M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, 215-237, Academic Press, London, 1998. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management \slideitem{Cockpit Resource Management: - based on Foushee and Helmreich. \slideitem{Group performance determined by: - process variables - communication; - input variables - group size/skill. \slideitem{Goes against image of: - pilot as "rugged individual"; - showing "the right stuff". \slideend{\it{\copyright C.W.
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management \slideitem{Key objectives... \slideitem{ alter individual attitudes to groups; \slideitem{ improve coordination within crew; \slideitem{ increase team member effort; \slideitem{ optimise team composition. \slideitem{Can we change group norms? \slideitem{Does it apply beyond aviation? - with fewer rugged individuals? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management FAA Advisory Circular 120-51A 1993 \slideitem{ Briefings are interactive and emphasize the importance of questions, critique, and the offering of information. \slideitem{ Crew members speak up and state their information with appropriate persistence until there is some clear resolution. \slideitem{ Critique is accepted objectively and non-defensively. \slideitem{ The effects of stress and fatigue on performance are recognised. NASA /UT LOS Checklist \slideitem{When conflicts arise, the crew remain focused on the problem or situation at hand. Crew members listen actively to ideas and opinions and admit mistakes when wrong, conflict issues are identified and resolved. \slideitem{ Crew members verbalize and acknowledge entries to automated systems parameters. \slideitem{ Cabin crew are included as part of team in briefings, as appropriate, and guidelines are established for coordination between flight deck and cabin. Human Factors Group Of The Royal Aeronautical Society. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management CRM TRAINING METHODS AND PROCESSES Phase One - Awareness training - 2 days classroom (residential or non-residential). Objectives: \slideitem{Knowledge: \slideitem{ Relevance of CRM to flight safety and the efficient operation of an aircraft \slideitem{ How CRM reduces stress and improves working environment \slideitem{ Human information processing \slideitem{ Theory of human error \slideitem{ Physiological effects of stress and fatigue \slideitem{ Visual & aural limitations \slideitem{ Motivation \slideitem{ Cultural differences \slideitem{ CRM language and jargon. \slideitem{ The CRM development process \slideitem{ Roles such as leadership and followership \slideitem{ Systems approach to safety and man machine interface and SHEL model \slideitem{ Self awareness \slideitem{ Personality types \slideitem{ Evaluation of CRM \slideitem{Skills: \slideitem{Nil \slideitem{Attitudes: \slideitem{ Motivated to observe situations, others' and own behaviour in future. \slideitem{ Belief in the value of developing CRM skills. \slideitem{ Activities: \slideitem{ Presentations \slideitem{ Analysis of incidents and accidents by case study or video \slideitem{ Discussion groups \slideitem{ Self disclosure \slideitem{ Personality profiling and processing \slideitem{ Physiological experience exercises \slideitem{ Self study Human Factors Group Of The Royal Aeronautical Society. \slideend{\it{\copyright C.W. 
Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management CRM TRAINING METHODS AND PROCESSES Phase Two - Basic Skills training - 3/4 days classroom residential Objectives: \slideitem{ Knowledge: \slideitem{ Perceptions \slideitem{ How teams develop \slideitem{ Problem solving & decision making processes \slideitem{ Behaviours and their differences \slideitem{ Thought processes \slideitem{ Respect and individual rights \slideitem{ Development of attitudes \slideitem{ Communications toolkits \slideitem{ Skills: \slideitem{ See Appendix B \slideitem{ Attitudes \slideitem{ See Appendix B \slideitem{ Activities: \slideitem{ Presentations \slideitem{ Experiential learning - (Recreating situations and experiences, using feelings to log in learning, experimenting in safe environments with cause and effect behaviour exercises) \slideitem{ Role play \slideitem{ Videod exercises \slideitem{ Team exercises \slideitem{ Giving & receiving positive and negative criticism \slideitem{ Counselling \slideitem{ Case studies \slideitem{ Discussion groups \slideitem{ Social and leisure activities Human Factors Group Of The Royal Aeronautical Society. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management CRM TRAINING METHODS AND PROCESSES Classroom, CPT or simulator Objectives: \slideitem{ Development of knowledge, skills and attitudes to required competency standards. \slideitem{ Activities: Practicing one or more skills on a regular basis under instruction in either the classroom, mock up/ CPT facility or full simulator LOFT sessions. Also considered valuable would be coaching by experienced crews during actual flying operations. Human Factors Group Of The Royal Aeronautical Society. \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work: Crew Resource Management "Under normal conditions, aircraft flying is not a very interdependent task. In many cases, pilots are able to fly their aircraft successfully with relatively little coordination with other crew members, and communication between crew members is rquired primarily during nonroutine situations." Acknowledgement: C.A. Bowers, E.L. Blickensderfer and B.B. Morgan, Air Traffic Control Team Coordination. In M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, 215-237, Academic Press, London, 1998. \slideitem{Does it work in abnormal events? \slideitem{Additional requirements ignored? \slideitem{Can it hinder performance? \slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \slidestart{Terminology} Human Error and Group Work \slideitem{Workload. \slideitem{Situation Awareness. \slideitem{Crew Resource Management \slideend{\it{\copyright C.W. Johnson, 1997 - Human Computer Interaction}.} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \end{document}