\input{/users/staff/johnson/teaching/hoskyns/slidedefs.tex}
\title{Safety Critical Systems Development}
\author{Prof. Chris Johnson,\\
Department of Computing Science,\\
University of Glasgow,\\
Glasgow,\\
Scotland.\\
G12 8QJ.\\ \\
URL: http://www.dcs.gla.ac.uk/$\sim$johnson\\
E-mail: johnson@dcs.glasgow.ac.uk\\
Telephone: +41 330 6053}
\date{October 1999.}
\begin{document}
\maketitle
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
\pagehead{Terminology and the Ariane 5 Case Study}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Introduction}
Safety Critical Systems Development
Hazard Analysis
\slideitem{ Hazard Analysis.
\slideitem{ FMECA/FMEA.
\slideitem{ Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Analysis
\slideitem{ Safety case:
- why proposed system is safe.
\slideitem{ Must identify potential hazards.
\slideitem{ Assess likelihood and severity.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Analysis
\slideitem{Lots of variant features:
- checklists...
- hazard indices...
\slideitem{ Lots of techniques:
- fault trees (see later);
- cause consequence analysis;
- HAZOPS;
- FMECA/FHA/FMEA...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Failure Modes, Effect and Criticality Analysis
\slideitem{MIL STD 1629A (1977!).
\slideitem{Analyse each potential failure.
\slideitem{Determine impact on system(s).
\slideitem{Assess its criticality.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Failure Modes, Effect and Criticality Analysis
1. Construct functional block diagram.
2. Use diagram to identify any associated failure modes.
3. Identify effects of failure and assess criticality.
4. Repeat 2 and 3 for potential consequences.
5. Identify causes and occurrence rates.
6. Determine detection factors.
7. Calculate Risk Priority Numbers.
8. Finalise hazard assessment.
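The Risk Priority Number in step 7 is normally the product of the severity, occurrence and detection rankings taken from tables such as those on the following slides. A minimal sketch, with invented rankings:
\begin{verbatim}
# Sketch of step 7: Risk Priority Number (RPN).
# RPN = severity x occurrence x detection, each ranked 1..10
# using tables like those on the next slides.

def risk_priority_number(severity, occurrence, detection):
    for rank in (severity, occurrence, detection):
        assert 1 <= rank <= 10, "ranks come from 1..10 scales"
    return severity * occurrence * detection

# Hypothetical failure mode: severity 8, occurrence 5, detection 6.
print(risk_priority_number(8, 5, 6))   # 240 - a candidate for redesign
\end{verbatim}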
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 1: Functional Block Diagram
\slideitem{Establish scope of the analysis.
\slideitem{Break system into subcomponents.
\slideitem{Different levels of detail?
\slideitem{Some unknowns early in design?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 1: Functional Block Diagram
Acknowledgement: taken from J.D. Andrews and T.R. Moss, Reliability and Risk
Assessment, Longman, Harlow, 1993
(ISBN-0-582-09615-4).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 2: Identify Failure Modes
\slideitem{ Many different failure modes:
- complete failure;
- partial failure;
- intermittent failure;
- gradual failure;
- etc.
\slideitem{Not all will apply?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 3: Assess Criticality
\begin{tabular}{|p{3.5cm}|p{9cm}|c|}
\hline
Effect & Criteria: Severity of Effect & Rank \\
\hline
Hazardous without warning & Very high severity ranking when a potential failure mode affects safe operation or involves non-compliance with a government regulation without warning. & 10 \\
\hline
Hazardous with warning & Failure affects safe product operation or involves non-compliance with a government regulation with warning. & 9 \\
\hline
Very High & Product is inoperable with loss of primary function. & 8 \\
\hline
High & Product is operable, but at a reduced level of performance. & 7 \\
\hline
Moderate & Product is operable, but comfort or convenience item(s) are inoperable. & 6 \\
\hline
Low & Product is operable, but comfort or convenience item(s) operate at a reduced level of performance. & 5 \\
\hline
Very Low & Fit \& finish or squeak \& rattle item does not conform. Most customers notice the defect. & 4 \\
\hline
Minor & Fit \& finish or squeak \& rattle item does not conform. Average customers notice the defect. & 3 \\
\hline
Very Minor & Fit \& finish or squeak \& rattle item does not conform. Discriminating customers notice the defect. & 2 \\
\hline
None & No effect. & 1 \\
\hline
\end{tabular}
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 4: Repeat for potential consequences
\slideitem{ Can have knock-on effects.
\slideitem{Additional failure modes.
\slideitem{Or additional contexts of failure.
\slideitem{Iterate on the analysis.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 5: Identify Cause and Occurrence Rates
\slideitem{Modes with most severe effects first.
\slideitem{What causes the failure mode?
\slideitem{How likely is that cause?
\slideitem{risk = frequency x cost
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 5: Identify Cause and Occurrence Rates
\begin{tabular}{|p{6cm}|c|c|}
\hline
Probability of Failure & Possible Failure Rates & Rank \\
\hline
Very High: Failure is almost inevitable & 1 in 2 & 10 \\
 & 1 in 3 & 9 \\
\hline
High: Repeated failures & 1 in 8 & 8 \\
 & 1 in 20 & 7 \\
\hline
Moderate: Occasional failures & 1 in 80 & 6 \\
 & 1 in 400 & 5 \\
 & 1 in 2000 & 4 \\
\hline
Low: Relatively few failures & 1 in 15,000 & 3 \\
 & 1 in 150,000 & 2 \\
\hline
Remote: Failure is unlikely & 1 in 1,500,000 & 1 \\
\hline
\end{tabular}
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 6: Determine detection factors
Type (1):
These controls prevent the Cause or Failure Mode from
occurring, or reduce their rate of occurrence.
Type (2):
These controls detect the Cause of the Failure Mode and lead
to corrective action.
Type (3):
These controls detect the Failure Mode before it affects product
operation, subsequent operations, or the end user.
\slideitem{Can we detect/control failure mode?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 6: Determine detection factors
\begin{tabular}{|p{3.5cm}|p{9cm}|c|}
\hline
Detection & Criteria: Likelihood of Detection by Design Control & Rank \\
\hline
Absolute Uncertainty & Design Control does not detect a potential Cause of failure or subsequent Failure Mode; or there is no Design Control & 10 \\
\hline
Very Remote & Very remote chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 9 \\
\hline
Remote & Remote chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 8 \\
\hline
Very Low & Very low chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 7 \\
\hline
Low & Low chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 6 \\
\hline
Moderate & Moderate chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 5 \\
\hline
Moderately High & Moderately high chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 4 \\
\hline
High & High chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 3 \\
\hline
Very High & Very high chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode & 2 \\
\hline
Almost Certain & Design Control will almost certainly detect a potential Cause of failure or subsequent Failure Mode & 1 \\
\hline
\end{tabular}
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Tools
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Tools
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Conclusions
\slideitem{ Hazard analysis.
\slideitem{ FMECA/FMEA.
\slideitem{ Qualitative $\rightarrow$ quantitative approaches.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Probabilistic Risk Assessment (PRA)
The use of PRA technology should be increased in all regulatory
matters to the extent supported by the state of the art in PRA
methods and data and in a manner that complements the NRC's
deterministic approach and supports the NRC's traditional
defense-in-depth philosophy.
PRA and associated analyses (e.g., sensitivity studies, uncertainty
analyses, and importance measures) should be used in
regulatory matters, where practical within the bounds of the state
of the art, to reduce unnecessary conservatism associated
with current regulatory requirements, regulatory guides, license
commitments, and staff practices.
An Approach for Plant-Specific, Risk-Informed Decisionmaking: Technical Specifications
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Analysis vs PRA
\slideitem{ FMECA - hazard analysis.
\slideitem{ PRA part of hazard analysis.
\slideitem{ Wider links to decision theory.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Decision Theory
\slideitem{ Risk = frequency x cost.
\slideitem{Which risk do we guard against?
$Decision_A = (option_1, option_2, \ldots, option_n)$
$Decision_B = (option_1, option_2, \ldots, option_m)$
$Val(Decision) = \sum_{i=1}^{n} utility(option_i) \times freq(option_i)$
(a numerical sketch is given at the end of this slide)
\slideitem{Are decision makers rational?
\slideitem{Can you trust the numbers?
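A minimal numerical sketch of the $Val(Decision)$ calculation above, with invented utilities (costs as negative numbers) and frequencies:
\begin{verbatim}
# Sketch of Val(Decision) = sum_i utility(option_i) * freq(option_i),
# using invented utilities (costs are negative) and frequencies.

def value(decision):
    return sum(utility * freq for (utility, freq) in decision)

# Hypothetical choice: guard against a frequent cheap failure or a
# rare expensive one.  Options are (utility, frequency) pairs.
decision_a = [(-100.0, 0.10), (-10.0, 0.90)]
decision_b = [(-10000.0, 0.001), (-1.0, 0.999)]

print(value(decision_a))   # -19.0
print(value(decision_b))   # -10.999
\end{verbatim}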
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Meta-Issues
\slideitem{ Decision theory counter intuitive?
\slideitem{But just a formalisation of FMECA?
\slideitem{What is the scope of this approach?
- hardware failure rates (here)?
- human error rates (here)?
- software failure rates?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
Acknowledgement: J. D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman, New York, 1993.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ Failure rate assumed to be constant.
\slideitem{Electronic systems approximate this.
\slideitem{Mechanical systems:
- bed-down failure rates;
- degrade failure rates;
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Mean Time To Failure
\slideitem{ MTTF:
reciprocal of constant failure rate.
$MTTF = 1/\lambda$
$\lambda$ - base failure rate.
\slideitem{ 0.2 failures per hour: $MTTF = 1/0.2 = 5$ hours.
\slideitem{See Andrews and Moss for proof.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Or Put Another Way...
Probability that the product will work for time T without failure:
$R(T) = \exp(-T/MTTF)$
\slideitem{ If MTTF = 250,000 hours.
\slideitem{ Over a life of 10 years (87,600 hours).
\slideitem{ $R = \exp(-87,600/250,000) = 0.70441$
\slideitem{ 70.4\% probability of no failure in 10 years.
\slideitem{ 70.4\% of systems still working after 10 years.
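A small sketch that reproduces these figures, assuming the constant failure rate model above:
\begin{verbatim}
import math

# R(T) = exp(-T / MTTF), assuming a constant failure rate.
def reliability(hours, mttf):
    return math.exp(-hours / mttf)

MTTF = 250000.0           # hours, from the slide
LIFE = 10 * 365 * 24      # 10 years = 87,600 hours

print(reliability(LIFE, MTTF))   # ~0.70441: ~70.4% survive 10 years
\end{verbatim}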
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ For each failure mode.
$Criticality_m = a \times b \times \lambda_p \times time$
$\lambda_p$ - base failure rate with environmental/stress data
$a$ - proportion of total failures in specified failure mode $m$
$b$ - conditional prob. that expected failure effect will result
\slideitem{ If no failure data use:
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Sources of Data
\slideitem{ MIL-HDBK-217:
Reliability Prediction of Electronic Equipment
\slideitem{Failure rate models for:
- ICs, transistors, diodes, resistors,
- relays, switches, connectors etc.
\slideitem{ Field data + simplifying assumptions.
\slideitem{Latest version F being revised.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ 217 too pessimistic for companies...
\slideitem{Bellcore (Telcordia):
- reliability prediction procedure.
During 1997, AT\&T's Defects-Per-Million performance was
173, which means that of every one million calls placed on the
AT\&T network, only 173 were lost due to a network failure. That
equals a network reliability rate of 99.98 percent for 1997.
\slideitem{ Business-critical, not safety-critical.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{But MTTF doesn't consider repair!
\slideitem{MTTR considers observations.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and FMECA Mode Probability
\slideitem{FMECA:
- we used subjective criticality;
- however, MIL-HDBK-338B calculates it;
- no. of failures per hour per mode.
\slideitem{$CR = \alpha \times \beta \times \lambda$:
$CR$ - criticality level,
$\alpha$ - failure mode frequency ratio,
$\beta$ - loss prob. of item from mode,
$\lambda$ - base failure rate for item.
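A minimal sketch of this mode criticality calculation, with invented values for a single failure mode:
\begin{verbatim}
# CR = alpha * beta * lambda for one failure mode, per the slide:
#   alpha  - failure mode frequency ratio,
#   beta   - conditional probability of the loss given the mode,
#   lam    - base failure rate for the item (failures per hour).
# The values below are invented for illustration.

def mode_criticality(alpha, beta, lam):
    return alpha * beta * lam

print(mode_criticality(alpha=0.3, beta=0.5, lam=1e-5))
# 1.5e-06 critical failures per hour expected from this mode
\end{verbatim}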
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and FMECA Mode Probability
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{We focussed on hardware devices.
\slideitem{PRA for human reliability?
\slideitem{Probably not a good idea.
\slideitem{But for completeness...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Technique for Human Error Rate Prediction (THERP)
``The THERP approach uses conventional reliability technology modified to account for greater variability and independence of human performance as compared with that of equipment performance... The procedures of THERP are similar to those employed in conventional reliability analysis, except that human task activities are substituted for equipment outputs.''
(Miller and Swain, 1987 - cited by Hollnagel, 1998).
A.D. Swain and H.E. Guttman,
Handbook of Human Reliability with Emphasis on Nuclear Power Plant Applications
NUREG-CR-1278, 1985.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Technique for Human Error Rate Prediction (THERP)
\slideitem{$P_e = H_e \times \sum_{k=1}^{n} PSF_k \times W_k + C$
\slideitem{ Where:
$P_e$ - probability of error;
$H_e$ - raw human error probability;
$C$ - numerical constant;
$PSF_k$ - performance shaping factor;
$W_k$ - weight associated with $PSF_k$;
$n$ - total number of PSFs.
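A minimal sketch of the calculation, with an invented base error probability, shaping factors and weights:
\begin{verbatim}
# P_e = H_e * sum_k(PSF_k * W_k) + C, per the slide.
# The PSF values, weights and constant below are invented.

def error_probability(base_hep, psfs, weights, constant=0.0):
    assert len(psfs) == len(weights)
    return base_hep * sum(p * w for p, w in zip(psfs, weights)) + constant

# Hypothetical task: nominal HEP of 0.003 scaled by stress and
# interface-quality shaping factors.
print(error_probability(0.003, psfs=[2.0, 1.5], weights=[0.6, 0.4]))
# 0.0054
\end{verbatim}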
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Technique for Human Error Rate Prediction (THERP)
\slideitem{"Psychological vaccuous" (Hollnagel).
\slideitem{No model of cognition etc.
\slideitem{Calculate effect of PSF on HEP
- ignores WHY they affect performance.
\slideitem{Succeeds or fails on PSFs.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
THERP - External PSFs
Acknowledgement: A.D. Swain, Comparative Evaluation of Methods for Human Reliability Analysis, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
THERP - Stressor PSFs
Acknowledgement: A.D. Swain, Comparative Evaluation of Methods for Human Reliability Analysis, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
THERP - Internal PSFs
Acknowledgement: A.D. Swain, Comparative Evaluation of Methods for Human Reliability Analysis, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM
E. Hollnagel,
Cognitive Reliability and Error Analysis Method,
Elsevier, Holland, 1998.
\slideitem{HRA + theoretical basis.
\slideitem{Simple model of control:
- scrambled - unpredictable actions;
- opportunistic - react, don't plan;
- tactical - procedures and rules;
- strategic - consider full context.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM - Simple Model of Control
Acknowledgement:
E. Hollnagel,
Cognitive Reliability and Error Analysis Method,
Elsevier, Holland, 1998.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM - Simple Model of Control
Acknowledgement:
E. Hollnagel,
Cognitive Reliability and Error Analysis Method,
Elsevier, Holland, 1998.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
CREAM
\slideitem{Much more to the technique...
\slideitem{But in the end:
Strategic: $0.000005 < p < 0.01$
Tactical: $0.001 < p < 0.1$
Opportunistic: $0.01 < p < 0.5$
Scrambled: $0.1 < p < 1.0$
\slideitem{Common performance conditions map to
- the probable control mode and then to
- a reliability estimate from the literature (sketched below).
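A minimal sketch of that final mapping, using the intervals above; the control mode chosen is purely illustrative:
\begin{verbatim}
# Failure probability interval per control mode, as on this slide.
CONTROL_MODE_INTERVALS = {
    "strategic":     (0.000005, 0.01),
    "tactical":      (0.001,    0.1),
    "opportunistic": (0.01,     0.5),
    "scrambled":     (0.1,      1.0),
}

def reliability_estimate(control_mode):
    return CONTROL_MODE_INTERVALS[control_mode]

# If the common performance conditions suggest tactical control:
print(reliability_estimate("tactical"))   # (0.001, 0.1)
\end{verbatim}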
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Conclusions
\slideitem{PRA for hardware:
- widely accepted with good data;
\slideitem{PRA for human performance:
- many are skeptical;
- THERP -> CREAM ->
\slideitem{PRA for software?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and Fault Tree Analysis
\slideitem{Fault Trees (recap)
\slideitem{Software Fault Trees.
\slideitem{Software PRA.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Trees (Recap)
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Each tree considers 1 failure.
\slideitem{Carefully choose top event.
\slideitem{Carefully choose system boundaries.
\slideitem{Assign probabilities to basic events.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Assign probabilities to basic events.
\slideitem{Stop if you have the data.
\slideitem{Circles denote basic events.
\slideitem{Even so, tool support is critical.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Usually applied to hardware...
\slideitem{Can be used for software (later).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{House events; "switch" true or false.
\slideitem{OR gates - multiple fault paths.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Probabilistic inhibit gates.
\slideitem{Used with Monte Carlo techniques
- True if random number < prob.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Usually applied to hardware...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
Acknowledgement: J.D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman Scientific and Technical, Harlow, 1993.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets
\slideitem{Each failure has several modes
- `different routes to top event'.
\slideitem{Cut set:
basic events that lead to top event.
\slideitem{Minimal cut set:
a cut set where removing any basic event avoids the top event.
\slideitem{Path set:
basic events that avoid top event;
list of components that ensure safety.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets
\slideitem{$Top\_Event = K_1 + K_2 + \ldots + K_n$
$K_i$ minimal cut sets, $+$ is logical OR.
\slideitem{$K_i = X_1 \cdot X_2 \cdots X_n$
MCS are conjuncts of basic events.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets
\slideitem{Top-down approach:
- replace event by expression below;
- simplify if possible ($C \cdot C = C$).
\slideitem{ Can use Karnaugh map techniques;
- cf logic circuit design;
- recruit tool support in practice.
\slideitem{Notice there is no negation.
\slideitem{Notice there is no XOR.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - MOCUS Cut Set Algorithm
1. Assign unique label to each gate.
2. Label each basic event.
3. Create a two dimensional array A.
4. Initialise A(1,1) to top event.
5. Scan array to find an OR/AND gate: expand AND gates horizontally (inputs in the same row) and OR gates vertically (one new row per input), until only basic events remain (see the sketch below).
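A minimal sketch of the MOCUS expansion for an invented two-gate tree (not an optimised tool):
\begin{verbatim}
# Sketch of MOCUS.  The fault tree is a dict: gate -> ("AND"|"OR",
# [inputs]); anything not in the dict is a basic event.

def mocus(tree, top):
    rows, done = [[top]], []
    while rows:
        row = rows.pop()
        gate = next((e for e in row if e in tree), None)
        if gate is None:                      # only basic events left
            done.append(frozenset(row))
            continue
        kind, inputs = tree[gate]
        rest = [e for e in row if e != gate]
        if kind == "AND":                     # expand horizontally
            rows.append(rest + inputs)
        else:                                 # OR: expand vertically
            rows.extend(rest + [i] for i in inputs)
    # keep only the minimal cut sets (drop supersets of other sets)
    return [s for s in done if not any(o < s for o in done)]

tree = {"TOP": ("OR", ["G1", "C"]), "G1": ("AND", ["A", "B"])}
print(mocus(tree, "TOP"))    # minimal cut sets {C} and {A, B}
\end{verbatim}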
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Probabilistic Analysis
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Probabilistic Analysis
\slideitem{Beware: independence assumption.
"If the same event occurs multiple times/places in a tree, any quantitative calculation must correctly reduce the boolean equation to account for these multiple occurrences.
Independence merely means that the event is not caused due to the failure of another event or component, which then moves into the realm of conditional probabilities."
\slideitem{Inclusion-exclusion expansion (Andrews \& Moss).
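A minimal sketch of the inclusion-exclusion expansion over minimal cut sets, assuming independent basic events and invented probabilities:
\begin{verbatim}
from itertools import combinations

# P(top) = sum P(Ki) - sum P(Ki.Kj) + ..., assuming independent
# basic events.  Cut sets and probabilities below are invented.

def top_event_probability(cut_sets, p):
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        for combo in combinations(cut_sets, k):
            events = set().union(*combo)
            term = 1.0
            for e in events:
                term *= p[e]
            total += term if k % 2 == 1 else -term
    return total

cut_sets = [{"A", "B"}, {"C"}]
p = {"A": 1e-3, "B": 2e-3, "C": 5e-4}
print(top_event_probability(cut_sets, p))   # ~5.02e-04
\end{verbatim}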
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideitem{As you'd expect.
\slideitem{Starts with top-level failure
\slideitem{Trace events leading to failure.
\slideitem{But:
don't use probabilistic assessments;
\slideitem{If you find a software fault path, REMOVE IT!
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
Leveson, N.G., Cha, S.S., Shimeall, T.J. ``Safety Verification of Ada Programs using Software Fault Trees,'' IEEE Software, July 1991.
\slideitem{Backwards reasoning.
\slideitem{Weakest pre-condition approach.
\slideitem{Similar to theorem proving.
\slideitem{Uses language dependent templates.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
\slideitem{Exception template for Ada83.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees
See: S.-Y. Min, Y.-K. Jang, S.-D. Cha, Y.-R. Kwon and D.-H. Bae, Safety Verification of Ada95 Programs Using Software Fault Trees. In M. Felici, K. Kanoun and A. Pasquini (eds.) Computer Safety, Reliability and Security, Springer Verlag, LNCS 1698, 1999.
\slideitem{Exception template for Ada95.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA for Software
\slideitem{John Musa's work at Bell Labs.
\slideitem{Failure rate of software before tests.
\slideitem{Faults per unit of time ($\lambda_0$):
- function of faults over infinite time.
\slideitem{ Based on execution time:
- not calendar time as in hardware;
- so no overall system predictions.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Musa's PRA for Software
$\lambda_0 = K \times P \times W_0$
\slideitem{Remember - `Black Box' architecture.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Human Computer Interface Criteria.
\slideitem{State Completeness.
\slideitem{Input/Output Variable Completeness.
\slideitem{Trigger Event Completeness.
\slideitem{Output Specification Completeness.
\slideitem{Output to Trigger Relationships.
\slideitem{State Transitions.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Human Computer Interface Criteria.
\slideitem{Criteria depend on task context.
\slideitem{E.g. in a monitoring situation:
- what must be observed/displayed?
- how often is it sampled/updated?
- what is message priority?
\slideitem{Not just when to present but also
- when to remove information...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{State Completeness Criteria.
\slideitem{Consider input effect when state is:
- normal, abnormal, indeterminate.
\slideitem{Start-up, close-down are concerns.
\slideitem{Process will change even during
- intervals in which software is `idle'.
\slideitem{Checkpoints, timeouts etc.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Input/Output Variable Completeness.
\slideitem{Input from sensors to software.
\slideitem{Output from software to actuators.
\slideitem{Specification may be incomplete if:
- sensor isn't referred to in spec;
- legal value isn't used in spec.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Trigger Event Completeness.
Robustness:
every state has a transition defined for every possible input.
Non-determinism:
only 1 transition is possible from a state for each input.
Value and Timing assumptions:
- what triggers can be produced from the environment?
- what ranges must trigger variables fall within?
- what are the real-time requirements...
- specify bounds for responses to input (timeouts)
\slideitem{And much, much more... (a check of the first two criteria above is sketched below).
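A minimal sketch of mechanically checking the robustness and determinism criteria against a state-transition table; the toy specification is invented:
\begin{verbatim}
# Robustness: every (state, input) pair has a transition defined.
# Determinism: no (state, input) pair has more than one transition.
# The toy specification below is invented.

def check_completeness(states, inputs, transitions):
    problems = []
    for s in states:
        for i in inputs:
            targets = [t for (src, inp, t) in transitions
                       if src == s and inp == i]
            if not targets:
                problems.append("no transition for (%s, %s)" % (s, i))
            elif len(targets) > 1:
                problems.append("non-deterministic on (%s, %s)" % (s, i))
    return problems

transitions = [("idle", "start", "running"),
               ("running", "stop", "idle"),
               ("running", "stop", "halted")]     # deliberate clash
print(check_completeness(["idle", "running"], ["start", "stop"],
                         transitions))
\end{verbatim}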
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Output Specification Completeness.
- from software to process actuators.
\slideitem{Check for hazardous values.
\slideitem{Check for hazardous timings;
- how fast do actuators take events?
- what if this rate is exceeded?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{Output to Trigger Relationships.
\slideitem{Links between input \& output events.
\slideitem{For any output to actuators:
- can effect on process be detected?
- if output fails can this be seen?
\slideitem{ What if response is:
- missing, too early or too late?
\slideitem{If response received without trigger
- then erroneous state.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Leveson's Completeness Criteria
\slideitem{State Transitions.
Reachability:
all specified states can be reached from initial state.
Recurrent behaviour:
desired recurrent behaviour must execute for at least one cycle and be bounded by exit condition.
Reversibility:
output commands should wherever possible be reversible and those which are not must be carefully controlled.
\slideitem{Completeness criteria change.
\slideitem{Environment and functions change.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
From Requirements to Design
Once the requirements have been detailed and accepted, the design will
begin: the process of allocating and arranging the functions of the system
so that the aggregate meets all customer needs. Since several different
designs may meet the requirements, alternatives must be assessed based on
technical risks, costs, schedule, and other considerations. A design
developed before there is a clear and concise analysis of the system's
objectives can result in a product that does not satisfy the requirements
of its customers and users. In addition, an inferior design can make it
very difficult for those who must later code, test, or maintain the
software. During the course of a software development effort, analysts may
offer and explore many possible design alternatives before choosing the
best design.
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Preliminary Design
Preliminary or high-level design is the phase of a software project in
which the major software
system alternatives, functions, and requirements are analyzed. From the
alternatives, the
software system architecture is chosen and all primary functions of the
system are allocated to the
computer hardware, to the software, or to the portions of the system
that will continue to be
accomplished manually.
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Preliminary Design
\slideitem{Develop the architecture:
\slideitem{ system architecture - an overall view of system components
\slideitem{ hardware architecture - the system's hardware components and their interrelations
\slideitem{ software architecture - the system's software components and their interrelations
\slideitem{ Investigate and analyze the physical alternatives for the system and choose solutions
\slideitem{ Define the external characteristics of the system
\slideitem{ Refine the internal structure of the system by decomposing the high-level software architecture
\slideitem{ Develop a logical view or model of the system's data
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Detailed Design
Detailed design or low-level design determines the specific steps
required for each component or
process of a software system. Responsibility for detailed design may
belong to either the system
designers (as a continuation of preliminary design activities) or to the
system programmers.
Information needed to begin detailed design includes: the software
system requirements, the
system models, the data models, and previously determined functional
decompositions. The
specific design details developed during the detailed design period are
divided into three categories: for the system as a whole (system
specifics), for individual processes within the system (process
specifics), and for the data within the system (data specifics).
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Detailed Design (Example concerns)
System specifics:
\slideitem{ Physical file system structure
\slideitem{ Interconnection records or protocols between software and hardware
components
\slideitem{ Packaging of units as functions, modules or subroutines
\slideitem{ Interconnections among software functions and processes
\slideitem{ Control processing
\slideitem{ Memory addressing and allocation
\slideitem{ Structure of compilation units and load modules
Process specifics:
\slideitem{ Required algorithmic details
\slideitem{ Procedural process logic
\slideitem{ Function and subroutine calls
\slideitem{ Error and exception handling logic
Data specifics:
\slideitem{ Global data handling and access
\slideitem{ Physical database structure
\slideitem{ Internal record layouts
\slideitem{ Data translation tables
\slideitem{ Data edit rules
\slideitem{ Data storage needs
US Department of Defense: Electronic Reliability Design Handbook
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Don't Forget the Impact of Standards
UK Defence software standard
Sean Matthews
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
\slideitem{Project compared:
- faults found in statistical tests;
- faults found in 178B development.
\slideitem{Main conclusions:
- such comparisons very difficult;
- DO-178B hard to implement;
- lack of materials/examples.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' View
The difficulties that have been identified are the DO-178 requirements for evidence and
rigorous verification...
Systematic records of accomplishing each of the
objectives and guidance are necessary. A documentation trail must exist
demonstrating that the development processes not only were carried out, but also
were corrected and updated as necessary during the program life cycle. Each
document, review, analysis, and test must have evidence of critique for accuracy and
completeness, with criteria that establishes consistency and expected results. This is
usually accomplished by a checklist which is archived as part of the program
certification records. The degree of this evidence varies only by the safety criticality of
the system and its software.
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' View
...Engineering has not been schooled or trained to
meticulously keep proof of the processes, product, and
verification real-time. The engineers have focused on the
development of the product, not the delivery. In addition,
program durations can be from 10 to 15 years resulting in
the software engineers moving on by the time of system
delivery. This means that most management and engineers
have never been on a project from "cradle-to-grave."
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' Views
The weakness of commercial practice with DO-178B is the lack of consistent,
comprehensive training of the FAA engineers/DERs/foreign agencies affecting:
\slideitem{ the effectiveness of the individual(s) making findings; and,
\slideitem{ the consistency of the interpretations in the findings.
Training programs may be the answer for both the military and commercial
environments to avoid the problem of inconsistent interpretation and the
results of literal interpretation.
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Safety-Critical Software Development - Conclusions
\slideitem{Software design by:
- hazard elimination;
- hazard reduction;
- hazard control.
\slideitem{Software implementation issues:
- dangerous practices;
- choice of `safe' languages.
\slideitem{The DO-178B Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design and Hazard Reduction
\slideitem{Design for control:
- incremental control;
- intermediate states;
- decision aids;
- monitoring.
\slideitem{Add barriers:
- hard/software locks;
\slideitem{Minimise single point failures:
- increase safety margins;
- exploit redundancy;
- allow for recovery.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Reduction: Interlock Example
This heavy duty solenoid controlled tongue switch controls access to hazardous
machines with rundown times.
Olympus withstands the arduous environments
associated with the frequent operation of heavy
duty access guards. The unit also self adjusts to
tolerate a high degree of guard misalignment.
The stainless steel tongue actuator is
self-locking and can only be released after the
solenoid receives a signal from the machine
control circuit. This ensures that the machine has
completed its cycle and come to rest before the
tongue can be disengaged and machine access
obtained.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design and Hazard Control
\slideitem{Limit exposure.
back to `normal' fast (exceptions).
\slideitem{Isolate and contain.
don't let things get worse...
\slideitem{Fail-safe.
panic shut-downs, watchdog code.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Control: Watchdog Example
\slideitem{Hardware or software (beware).
\slideitem{Check for processor activity:
- 1. load value into a timer;
- 2. decrement timer every interval;
- 3. if value is zero then reboot.
\slideitem{Processor performs 1 at a frequency
- great enough to stop 3 being true;
- unless it has crashed.
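A minimal software-only sketch of this scheme; a real watchdog would normally be a hardware timer, and the `reboot' here is just a message:
\begin{verbatim}
import threading, time

class Watchdog:
    def __init__(self, reload_value=5, interval=0.1):
        self.counter = reload_value
        self.reload_value = reload_value
        threading.Thread(target=self._tick, args=(interval,),
                         daemon=True).start()

    def kick(self):                     # step 1: load value into timer
        self.counter = self.reload_value

    def _tick(self, interval):
        while True:
            time.sleep(interval)
            self.counter -= 1           # step 2: decrement every interval
            if self.counter <= 0:       # step 3: zero => assume a crash
                print("watchdog expired - reboot!")
                return

wd = Watchdog()
for _ in range(10):                     # a healthy loop kicks often enough
    wd.kick()
    time.sleep(0.05)
# if this loop stopped kicking, the message would appear ~0.5s later
\end{verbatim}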
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance
\slideitem{Avoid common mode failures.
\slideitem{Need for design diversity.
\slideitem{Same requirements:
- different programmers?
- different contractors?
- homogeneous parallel redundancy?
- microcomputer vs PLC solutions?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance
\slideitem{Redundant hardware may duplicate
- any faults if software is the same.
\slideitem{N-version programming:
- shared requirements;
- different implementations;
- voting ensures agreement.
\slideitem{ What about timing differences?
- comparison of "continuous" values?
- what if requirements wrong?
- costs make N>2 very uncommon;
- performance costs of voting.
\slideitem{A340 primary flight controls.
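A minimal sketch of majority voting over N versions; the three `versions' here are trivial stand-ins, since real diversity comes from independent teams:
\begin{verbatim}
from collections import Counter

# Sketch of N-version majority voting (N = 3 here).
def version_a(x): return x * x
def version_b(x): return x ** 2
def version_c(x): return x * x if x < 100 else x   # seeded fault

def vote(results):
    value, count = Counter(results).most_common(1)[0]
    if count < 2:                      # no majority: fail safe instead
        raise RuntimeError("no majority - fail safe")
    return value

x = 7
print(vote([version_a(x), version_b(x), version_c(x)]))   # 49
\end{verbatim}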
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance
\slideitem{Exception handling mechanisms.
\slideitem{Use run-time system to detect faults:
- raise an exception;
- pass control to appropriate handler;
- could be on another processor.
\slideitem{Propagate to outermost scope then fail.
\slideitem{Ada...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance
\slideitem{Recovery blocks:
- write acceptance tests for modules;
- if it fails then execute alternative.
\slideitem{Must be able to restore the state:
- take a snapshot/checkpoint;
- if failure restore snapshot.
\slideitem{But:
- what if the failed module has side-effects?
- e.g. effects on equipment under control?
- recovery block will be complicated.
\slideitem{Different from exceptions:
- don't rely on the run-time system.
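A minimal sketch of the recovery block pattern: checkpoint the state, run the primary, apply the acceptance test and fall back to an alternate on failure (all names invented):
\begin{verbatim}
import copy

def recovery_block(state, modules, acceptance_test):
    for module in modules:                 # primary first, then alternates
        checkpoint = copy.deepcopy(state)  # snapshot before the attempt
        try:
            result = module(state)
            if acceptance_test(result):
                return result
        except Exception:
            pass
        state.clear()                      # restore the snapshot
        state.update(checkpoint)
    raise RuntimeError("all alternates failed - fail safe")

def primary(state):   state["level"] = -1; return state["level"]  # buggy
def alternate(state): state["level"] = 42; return state["level"]

state = {"level": 0}
print(recovery_block(state, [primary, alternate], lambda r: r >= 0))  # 42
\end{verbatim}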
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance
\slideitem{Control redundancy includes:
- N-version programming;
- recovery blocks;
- exception handling.
\slideitem{But data redundancy uses extra data
- to check the validity of results.
\slideitem{Error correcting/detecting codes.
\slideitem{Checksum agreements etc.
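A minimal sketch of data redundancy: store a checksum with the data and verify it before the value is trusted (CRC32 chosen purely as an example):
\begin{verbatim}
import zlib

def protect(payload):
    return payload, zlib.crc32(payload)

def check(payload, checksum):
    return zlib.crc32(payload) == checksum

data, crc = protect(b"valve position = 73")
print(check(data, crc))                     # True
print(check(b"valve position = 78", crc))   # False - corruption detected
\end{verbatim}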
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues
\slideitem{Restrict language subsets.
\slideitem{Alsys CSMART Ada kernel etc.
\slideitem{Or just avoid high level languages?
\slideitem{No task scheduler - bare machine.
\slideitem{Less scheduling/protection risks
- more maintenance risks;
- less isolation (no modularity?).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues
\slideitem{Memory jumps:
- control jumps to arbitrary location?
\slideitem{Overwrites:
- arbitrary address written to?
\slideitem{Semantics:
- established on target processor?
\slideitem{Precision:
- integer, floating point, operations...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues
\slideitem{Data typing issues:
Acknowledgement: W.J. Cullyer, S.J. Goodenough, B.A. Wichmann, The choice of a Computer Language for Use in Safety-Critical Systems, Software Engineering Journal, (6)2:51-58, 1991.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues: Language Wars
\slideitem{CORAL subset:
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Life Cycle
\slideitem{ Planning Process:
- coordinates development activities.
\slideitem{Software Development Processes:
- requirements process
- design process
- coding process
- integration process
\slideitem{ Software Integral Processes:
- verification process
- configuration management
- quality assurance
- certification liaison
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Requirements for Design Descriptions
(a) A detailed description of how the software satisfies the specified software high-level requirements, including algorithms, data-structures and how software requirements are allocated to processors and tasks.
(b) The description of the software architecture defining the software structure to implement the requirements.
(c) ???????????
(d) The data flow and control flow of the design.
(e) Resource limitations, the strategy for managing each resource and its limitations, the margins and the method for measuring those margins, for example timing and memory.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Requirements for Design Descriptions
(f) Scheduling procedures and interprocessor/intertask communication mechanisms, including time-rigid sequencing, pre-emptive scheduling, Ada rendez-vous and interrupts.
(g) Design methods and details for their implementation, for example, software data loading, user modifiable software, or multiple-version dissimilar software.
(h) Partitioning methods and means of preventing partitioning breaches.
(i) Descriptions of the software components, whether they are new or previously developed, with reference to the baseline from which they were taken.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Requirements for Design Descriptions
(j) Derived requirements from the software design process.
(k) If the system contains deactivated code, a description of the means to ensure that the code cannot be enabled in the target computer.
(l) Rationale for those design decisions that are traceable to safety-related system requirements.
\slideitem{ Deactivated code (k) (see Ariane 5).
\slideitem{ Traceability issues interesting (l).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Key Issues
\slideitem{ Traceability and lifecycle focus.
\slideitem{ Designated engineering reps.
\slideitem{ Recommended practices.
\slideitem{ Design verification:
- formal methods "alternative" only;
- "inadequate maturity";
- limited applicability in aviation.
\slideitem{Design validation:
- use of independent assessors etc.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
NASA Langley Research Center.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
NASA Langley Research Center.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hardware Design: Fault Tolerant Architectures
\slideitem{The basics of hardware management.
\slideitem{Fault models.
\slideitem{Hardware redundancy.
\slideitem{Space Shuttle GPC Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Parts Management Plan
\slideitem{MIL-HDBK-965
- help on hardware acquisition.
\slideitem{ General dependability requirements.
\slideitem{ Not just about safety.
\slideitem{But often not considered enough...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
The Basics: Hardware Management
\slideitem{MIL-HDBK-965
Acquisition Practices for Parts Management
\slideitem{ Preferred Parts List
\slideitem{ Vendor and Device Selection
\slideitem{ Critical Devices, Technologies \& Vendors
\slideitem{ Device Specifications
\slideitem{ Screening
\slideitem{ Part Obsolescence
\slideitem{ Failure Reporting, Analysis and Corrective Action (FRACAS)
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
The Basics: Hardware Management
Some consequences of designing equipment without a PPL are:
\slideitem{ Proliferation of non-preferred parts and materials with identical
functions
\slideitem{ Increased need for development and preparation of engineering
justification for
new parts and materials
\slideitem{ Increased need for monitoring suppliers and inspecting/screening
parts and materials
\slideitem{ Selection of obsolete (or potentially obsolete) and sole-sourced
parts and materials
\slideitem{ Possibility of diminishing sources
\slideitem{ Use of unproven or exotic technology ("beyond" state-of-the-art)
\slideitem{ Incompatibility with the manufacturing process
\slideitem{ Inventory volume expansion and cost increases
\slideitem{ Increasing supplier base and audit requirements
\slideitem{ Loss of "ship-to-stock" or "just-in-time" purchase opportunities
\slideitem{ Limited ability to benefit from volume buys
\slideitem{ Increased cost and schedule delays
\slideitem{ Nonavailability of reliability data
\slideitem{ Additional tooling and assembly methods may be required to account
for the added
variation in part characteristics
\slideitem{ Decreased part reliability due to the uncertainty and lack of
experience with new parts
\slideitem{ Impeded automation efforts due to the added variability of part
types
\slideitem{ Difficulty in monitoring vendor quality due to the added number of
suppliers
\slideitem{ More difficult and expensive logistics support due to the increased
number of part
types that must be spared.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
The Basics: Hardware Management
Must consider during hardware acquisition (sketch on next slide):
\slideitem{ Operating Temperature Range - parts should be selected which are rated
for the
operating temperature range to which they will be subjected.
\slideitem{ Electrical Characteristics - parts should be selected to meet EMI,
frequency,
waveform and signal requirements and maximum applied electrical stresses
(singularly
and in combination).
\slideitem{ Stability - parts should be selected to meet parameter stability
requirements based on
changes in temperature, humidity, frequency, age, etc.
\slideitem{ Tolerances - parts should be selected that will meet tolerance
requirements, including
tolerance drift, over the intended life.
\slideitem{ Reliability - parts should be selected with adequate inherent
reliability and properly
derated to achieve the required equipment reliability. Dominant failure
modes should
be understood when a part is used in a specific application.
\slideitem{ Manufacturability - parts should be selected that are compatible with
assembly
manufacturing process conditions.
\slideitem{ Life - parts should be selected that have "useful life" characteristics
(both operating and
storage) equal to or greater than that intended for the life of the
equipment in which they
are used.
\slideitem{ Maintainability - parts should be selected that consider mounting
provisions, ease of
removal and replacement, and the tools and skill levels required for
their removal/
replacement/repair.
\slideitem{ Environment - parts should be selected that can operate successfully in
the
environment in which they will be used (i.e., temperature, humidity,
sand and dust, salt
atmosphere, vibration, shock, acceleration, altitude, fungus, radiation,
contamination,
corrosive materials, magnetic fields, etc.).
\slideitem{ Cost - parts should be selected which are cost effective, yet meet the
required
performance, reliability, and environmental constraints, and life cycle
requirements.
\slideitem{ Availability - parts should be selected which are readily available,
from more than one
source, to meet fabrication schedules, and to ensure their future
availability to support
repairs in the event of failure.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
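%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
The Basics: Hardware Management - Selection Check (Sketch)
A minimal sketch of how a handful of the criteria above (temperature range, derated stress, life, second sourcing) might be screened automatically during acquisition. The part figures, requirement figures and field names are all hypothetical; a real MIL-HDBK-965 parts programme involves far more than this.
# Hypothetical screening of a candidate part against a few selection criteria.
def selection_checks(part, req):
    return {
        "temperature": part["t_min"] <= req["t_min"] and part["t_max"] >= req["t_max"],
        # Derating: only use a fraction of the rated electrical stress.
        "derated stress": req["applied_watts"] <= part["rated_watts"] * req["derating_factor"],
        "useful life": part["life_hours"] >= req["mission_hours"],
        "second source": part["sources"] >= 2,
    }
part = {"t_min": -40, "t_max": 85, "rated_watts": 0.5, "life_hours": 60000, "sources": 1}
req = {"t_min": -20, "t_max": 70, "applied_watts": 0.3, "derating_factor": 0.5, "mission_hours": 50000}
for name, ok in selection_checks(part, req).items():
    print(name, "PASS" if ok else "FAIL")
# Here "derated stress" fails (0.3 > 0.5 * 0.5) and "second source" fails.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}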
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Types of Faults
\slideitem{Design faults:
- erroneous requirements;
- erroneous software;
- erroneous hardware.
\slideitem{These are systemic failures;
- not due to chance but design.
\slideitem{Don't forget management/regulators!
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Types of Faults
\slideitem{Intermittent faults:
- fault occurs and recurs over time;
- faulty connections can recur.
\slideitem{Transient faults:
- fault occurs but may not recur;
- electromagnetic interference.
\slideitem{Permanent faults:
- fault persists;
- physical damage to processor.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Models
\slideitem{Single stuck-at models.
\slideitem{Hardware seen as `black-box'.
\slideitem{Fault modelled as:
- input or output error;
- stuck at either 1 or 0.
\slideitem{Models permanent faults (sketch on next slide).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
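%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Models - Single Stuck-At: Sketch
A minimal sketch of the single stuck-at model on an invented two-gate circuit. A test vector detects a stuck-at fault if the faulty output differs from the fault-free output; the circuit and vectors are for illustration only.
# Tiny combinational circuit: out = (a AND b) OR c.  A single stuck-at fault
# pins one input line to 0 or 1; a test vector detects the fault if the
# faulty output differs from the fault-free output.
def circuit(a, b, c, stuck=None):
    lines = {"a": a, "b": b, "c": c}
    if stuck is not None:                 # e.g. ("a", 0) = line a stuck-at-0
        lines[stuck[0]] = stuck[1]
    return (lines["a"] & lines["b"]) | lines["c"]
def detects(vector, fault):
    return circuit(*vector) != circuit(*vector, stuck=fault)
print(detects((1, 1, 0), ("a", 0)))   # True: output changes from 1 to 0
print(detects((0, 0, 1), ("a", 0)))   # False: input c masks the fault
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}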
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Models - Single Stuck-At...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Models
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Models
\slideitem{Bridging Model:
- input not `stuck-at' 1 or 0;
- but shorting of inputs to circuit;
- input then is wired-or/wired-and (sketch on next slide).
\slideitem{Stuck-open model:
- both CMOS output transistors off;
- result is neither high nor low...
\slideitem{Transition and function models.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
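%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Models - Bridging: Sketch
A minimal sketch of the bridging model: two shorted inputs behave as a wired-AND or wired-OR, depending on the drive technology. The NAND gate and the input values are invented for illustration.
# Bridging fault: inputs a and b are shorted together.  Depending on the
# technology, both inputs see a shared wired-AND or wired-OR value.
def nand(a, b):
    return 1 - (a & b)
def nand_with_bridge(a, b, wired="and"):
    shared = a & b if wired == "and" else a | b
    return nand(shared, shared)
for a, b in [(0, 1), (1, 0), (1, 1), (0, 0)]:
    print((a, b), nand(a, b), nand_with_bridge(a, b, "and"), nand_with_bridge(a, b, "or"))
# The (0,1) and (1,0) rows show where a bridge can change the output.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}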
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Faults (Aside...)
\slideitem{Much more could be said...
- see Leveson or Storey.
\slideitem{Huge variability:
- specification errors;
- coding errors;
- translation errors;
- run-time errors...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Redundancy
\slideitem{ Adds:
- cost;
- weight;
- power consumption;
- complexity (most significant).
\slideitem{These can outweigh safety benefits.
\slideitem{Other techniques available:
- improved maintenance;
- better quality materials;
\slideitem{Sometimes no choice (Satellites).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hardware Redundancy Techniques
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Active Redundancy
\slideitem{ When a component fails...
\slideitem{ Redundant components do not have:
- to detect component failure;
- to switch to redundant resource.
\slideitem{ Redundant units always operate.
\slideitem{ Automatically pick up load on failure.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Standby Redundancy
\slideitem{ Must detect failure.
\slideitem{Must decide to replace component.
\slideitem{Standby units can be operating.
\slideitem{Standby units may be brought up (see the sketch on the next slide).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
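%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Standby Redundancy: Sketch
A minimal sketch of the detect-and-switch behaviour above: the active unit is monitored (here via missed heartbeats) and a standby is brought up when a failure is detected. The unit names, threshold and heartbeat trace are all hypothetical.
# Hypothetical standby switch-over: None stands for a missed heartbeat from
# the active unit; after MISSED_LIMIT consecutive misses we switch.
MISSED_LIMIT = 2
def run(heartbeats, units=("primary", "standby-1", "standby-2")):
    active, missed = 0, 0
    for beat in heartbeats:
        if beat is None:
            missed += 1
            if missed >= MISSED_LIMIT and active + 1 < len(units):
                active, missed = active + 1, 0      # bring up next standby
                print("switched to", units[active])
        else:
            missed = 0
    return units[active]
print(run([1, 1, None, 1, None, None, 1]))   # ends up on standby-1
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}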
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Example Redundancy Techniques
Bimodal Parallel/Series Redundancy.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Triple Modular Redundancy (TMR)
\slideitem{ Possibly most widespread.
\slideitem{In a simple voting arrangement,
- voting element -> common failure;
- so triplicate it as well.
\slideitem{ Multi-stage TMR architectures.
\slideitem{More cost, more complexity (voting sketch on next slide)...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
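%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Triple Modular Redundancy (TMR): Voting Sketch
A minimal sketch of 2-out-of-3 majority voting over replicated modules, with the voter itself triplicated as suggested above. The module outputs are invented values; real TMR voting is, of course, implemented in hardware.
# 2-out-of-3 majority vote.  Triplicating the voter removes the single
# voting element as a common point of failure between TMR stages.
def vote(a, b, c):
    return a if a == b or a == c else b      # b == c in the remaining majority case
def tmr_stage(outputs):
    # Three voters each see all three module outputs and feed the next stage.
    return [vote(*outputs) for _ in range(3)]
modules = [42, 42, 17]          # one module has failed
print(tmr_stage(modules))       # [42, 42, 42] - the fault is masked
print(vote(1, 2, 3))            # no majority exists: result is arbitrary (2)
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}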
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Multilevel Triple Modular Redundancy (TMR)
\slideitem{ No protection if 2 fail per level.
\slideitem{No protection from common failure
- e.g. if hardware/software is duplicated.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Detection
\slideitem{Functionality checks:
- routines to check hardware works.
\slideitem{Signal Comparisons:
- compare signals from replicated units.
\slideitem{Information Redundancy:
- parity checking, M out of N codes (sketch on next slide)...
\slideitem{Watchdog timers:
- reset if system times out.
\slideitem{Bus monitoring:
- check processor is `alive'.
\slideitem{Power monitoring:
- time to respond if power lost.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
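%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Detection: Sketch
Minimal sketches of two of the mechanisms above: information redundancy (a single even parity bit) and a watchdog timer. Both are illustrative only and far simpler than the M-out-of-N codes or hardware watchdogs used in practice; the limits and data are invented.
# Information redundancy: an even parity bit detects any single-bit error.
def with_parity(bits):
    return bits + [sum(bits) & 1]            # append parity so the total is even
def parity_ok(word):
    return sum(word) & 1 == 0
word = with_parity([1, 0, 1, 1])
print(parity_ok(word))                        # True
word[2] ^= 1                                  # single bit flip in transit
print(parity_ok(word))                        # False - error detected
# Watchdog timer: the system must "kick" the watchdog before it expires,
# otherwise a reset (or move to a safe state) is triggered.
class Watchdog:
    def __init__(self, limit):
        self.limit, self.count = limit, 0
    def tick(self, kicked):
        self.count = 0 if kicked else self.count + 1
        if self.count >= self.limit:
            print("timeout - reset system")
            self.count = 0
wd = Watchdog(limit=3)
for kicked in [True, True, False, False, False, True]:
    wd.tick(kicked)                           # prints one timeout
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}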
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"GPCs running together in the same GN&C (Guidance, Navigation and Control) OPS (Operational Sequence) are part of a redundant set performing identical tasks from the same inputs and
producing identical outputs. Therefore, any data bus assigned to a commanding GN&C GPC is heard by all members of the
redundant set (except the instrumentation buses because each GPC has only one dedicated bus connected to it). These
transmissions include all CRT inputs and mass memory transactions, as well as flight-critical data. Thus, if one or more GPCs in
the redundant set fail, the remaining computers can continue operating in GN&C. Each GPC performs about 325,000 operations
per second during critical phases. "
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"... GPC status information [is exchanged] among the primary avionics computers. If a GPC operating in a redundant set fails to meet two redundant
multiplexer interface adapter receivers during two successive reads of response data, and does not receive any data while the other
members of the redundant set do receive the data, they in turn will vote the GPC out of the set. A failed GPC is halted as
soon as possible."
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"GPC failure votes are annunciated in a number of ways. The GPC status matrix on panel O1 is a 5-by-5 matrix of lights. For
example, if GPC 2 sends out a failure vote against GPC 3, the second white light in the third column is illuminated. The yellow
diagonal lights from upper left to lower right are self-failure votes. Whenever a GPC receives two or more failure votes from
other GPCs, it illuminates its own yellow light and resets any failure votes that it made against other GPCs (any white lights in its
row are extinguished). Any time a yellow matrix light is illuminated, the GPC red caution and warning light on panel F7 is
illuminated, in addition to master alarm illumination, and a GPC fault message is displayed on the CRT. "
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
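%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study: Voting Sketch
A minimal sketch of the failure-vote rule quoted on the previous slides: each GPC in the redundant set may vote against the others, and a GPC that collects two or more votes declares itself failed and resets its own votes. The vote data below are invented; the real annunciation and fail-down logic is considerably richer.
# votes[i] is the set of GPCs that GPC i currently votes against (1..5).
votes = {1: set(), 2: {3}, 3: {2}, 4: {3}, 5: {3}}
def failed_gpcs(votes):
    failed = set()
    for gpc in votes:
        received = sum(1 for other, against in votes.items()
                       if other != gpc and gpc in against)
        if received >= 2:                 # two or more failure votes received
            failed.add(gpc)
            votes[gpc].clear()            # reset its own votes against others
    return failed
print(failed_gpcs(votes))                 # {3}: voted out of the redundant set
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}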
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"Each GPC power on , off switch is a guarded switch. Positioning a switch to on provides the computer with triply redundant
normally, even if two main or essential buses are lost. "
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"(There are) 5 identical general-purpose computers aboard the orbiter control space shuttle vehicle systems.
Each GPC is composed of two
separate units, a central processor unit and an input/output processor. All five GPCs are IBM AP
-101 computers. Each CPU and
IOP contains a memory area for storing software and data. These memory areas are collectively re
ferred to as the GPC's main
memory.
The IOP of each computer has 24 independent processors, each of which controls 24 data buses use
d to transmit serial digital
data between the GPCs and vehicle systems, and secondary channels between the telemetry system a
nd units that collect
instrumentation data. The 24 data buses are connected to each IOP by multiplexer interface adapt
ers that receive, convert and
validate the serial data in response to discrete signals calling for available data to be transmitted
or received from vehicle hardware."
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"A GPC on orbit can also be ''freeze-dried;'' that is, it can be loaded with the software for a particular memory configuration and
then moded to standby. It can then be moded to halt and powered off. Since the GPCs have non-volatile memory, the software
is retained. Before an OPS transition to the loaded memory configuration, the freeze-dried GPC can be moded back to run and
the appropriate OPS requested.
A failed GPC can be hardware-initiated, stand-alone-memory-dumped by switching the powered computer to terminate and halt
and then selecting the number of the failed GPC on the GPC memory dump rotary switch on panel M042F in the crew ..."
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Space Shuttle General Purpose Computer (GPC) Case Study
"A simplex GPC is one in run and not a member of the redundant set, such as the BFS (Backup Flight System) GPC. Systems management and payload
major functions are always in a simplex GPC."
"Even though the four primary avionics software system GPCs control all GN&C functions during the critical phases of the
mission, there is always a possibility that a generic failure could cause loss of vehicle control. Thus, the fifth GPC is loaded with
different software created by a different company than the PASS developer. This different software is the backup flight system.
To take over control of the vehicle, the BFS monitors the PASS GPCs to keep track of the current state of the vehicle. If
required, the BFS can take over control of the vehicle upon the press of a button. The BFS also performs the systems
management functions during ascent and entry because the PASS GPCs are operating in GN&C. BFS software is always loaded
into GPC 5 before flight, but any of the five GPCs could be made the BFS GPC if necessary."
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hardware Design: Fault Tolerant Architectures
\slideitem{The basics of hardware management.
\slideitem{Fault models.
\slideitem{Hardware redundancy.
\slideitem{Space Shuttle GPC Case Study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hardware Implementation Issues
\slideitem{COTS Microprocessors.
\slideitem{Specialist Microprocessors.
\slideitem{Programmable Logic Controllers.
\slideitem{Electromagnetic Compatibility.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
COTS Microprocessors
\slideitem{As we have seen:
- safety of software jeopardised
- if flaws in underlying hardware.
\slideitem{Catch-22 problem:
- best tools for COTS processors;
- most experience with COTS;
- least assurance with COTS...
\slideitem{Redundancy techniques help...
- but danger of common failures;
- vs cost of heterogeneity;
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
COTS Microprocessors
\slideitem{Where do the faults arise?
1. fabrication failures;
2. microcode errors;
3. documentation errors.
\slideitem{Can guard against 1:
- using the same processing mask;
- tests then apply to all of the batch;
- high cost (specialist approach).
\slideitem{Cannot distinguish 2 from 3?
\slideitem{Undocumented instructions...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
COTS Microprocessors
"Steven O. Siegfried"
\slideitem{Validation at start and end.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Validation: Spiral Model
\slideitem{Validation more continuous.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Validation: IEC 61508 (Draft)
The following should be considered in an overall safety validation plan:
\slideitem{ Details of when the validation should take place.
\slideitem{ Details of who should carry out the validation.
\slideitem{ Identification of the relevant modes of the system operation, including:
\slideitem{ preparation for use, including setting up and adjustment
\slideitem{ start up
\slideitem{ teach
\slideitem{ automatic
\slideitem{ manual
\slideitem{ semi-automatic
\slideitem{ steady-state operation
\slideitem{ resetting
\slideitem{ shutdown
\slideitem{ maintenance
\slideitem{ reasonably foreseeable abnormal conditions
\slideitem{ Identification of the safety-related systems and external risk reduction facilities that need to be validated for each mode of the system before commissioning commences.
\slideitem{ The technical strategy for the validation, for example, whether analytical methods or statistical tests are to be used.
\slideitem{ The measures, techniques and procedures that shall be used to confirm that each safety function conforms with the overall safety requirements documents and the safety integrity requirements.
\slideitem{ The specific reference to the overall safety requirements documents.
\slideitem{ The required environment in which the validation activities are to take place.
\slideitem{ The pass/fail criteria.
\slideitem{ The policies and procedures for evaluating the results of the validation, particularly failures.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Validation: MOD DEF STAN 00-60
D.4.1.6 Validation.
At the earliest opportunity support resource
requirements should be
confirmed and measurements should be made of times for completion of all
software
operation and support tasks. Where such measurements are dependent upon
the system state
or operating conditions, averages should be determined over a range of
conditions. If
measurements are based on non-representative hardware or operating
conditions, appropriate
allowances should be made and representative measurements carried out as
soon as possible.
The frequency of some software support tasks will be dependent upon the
frequency of
software releases and the failure rate exhibited by the software.
Integrated Logistic Support: Part 3, Guidance for Software
Support.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Validation: MOD DEF STAN 00-60
D.4.1.6 Validation. (Cont.)
Measurements of software failure rates and fault densities obtained
during software and
system testing might not be representative of those that will arise
during system operation.
However, such measurements may be used, with caution, in the validation
of models and
assumptions.
For repeatable software engineering activities, such as compilation and
regression testing, the
time and resource requirements that arose during development should be
recorded. Such
information may be used to validate estimates for equivalent elements of
the software
modification process.
For other software engineering activities, such as analysis, design and
coding, the time and
resource requirements that arose during development should be recorded.
However, such
information should only be used with some caution in the validation of
estimates for
equivalent elements of the software modification process.
The preceding clauses might imply the need for a range of metrics ...
Integrated Logistic Support: Part 3, Guidance for Software
Support.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Validation: Summary of Key Issues
\slideitem{ Who validates the validator?
- External agents must be approved.
\slideitem{ Who validates validation?
- Clarify links to certification.
\slideitem{What happens if validation fails?
- Must have feedback mechanisms;
- Links to process improvement?
\slideitem{NOT the same as verification!
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Leveson's Strategies
\slideitem{Show that functional requirements
- are consistent with safety criteria?
\slideitem{Implementation may include hazards
not in safety/functional requirements.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Leveson's Strategies
\slideitem{Show that implementation is
- same as functional requirements?
\slideitem{Too costly and time consuming to put
all safety behaviour in the specification?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Leveson's Strategies
\slideitem{Or show that the implementation
- meets the safety criteria.
\slideitem{Fails if criteria are incomplete...
- but can find specification errors.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Lifecycle View
\slideitem{At several stages in waterfall model.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Lifecycle View
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification
\slideitem{Verification as a catch-all?
"Verification is defined as determining whether or not the products of each phase of the software development process fulfills all the requirements from the previous phase."
\slideitem{So a recurrent cost, don't forget...
- verification post maintenance.
\slideitem{Verification supported by:
- determinism (repeat tests);
- separate safety-critical functions;
- well defined processes;
- simplicity and decoupling.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification
D.5.1 Task 501 Supportability Test, Evaluation and
Verification
D.5.1.1 Test and Evaluation Strategy.
Strategies for the evaluation of
system supportability
should include coverage of software operation and software support.
Direct measurements
and observations may be used to verify that all operation and support
activities - that do not
involve design change - may be completed using the resources that have
been allocated.
During the design and implementation stage measurements may be conducted
on similar
systems, under representative conditions.
As software modification activity is broadly similar to software
development the same
monitoring mechanism might be used both pre- and post-implementation.
Such a mechanism
is likely to be based on a metrics programme that provides information,
inter alia, on the rate
at which software changes are requested and on software productivity.
Integrated Logistic Support: Part 3, Guidance for Software
Support.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification
D.5.1.3 Objectives and Criteria.
System test and evaluation programme
objectives should
include verification that all operation and support activities may be
carried out successfully - within
skill and time constraints - using the PSE and other resources that have
been defined.
The objectives, and associated criteria, should provide a basis for
assuring that critical
software support issues have been resolved and that requirements have
been met within
acceptable confidence levels. Any specific test resources, procedures or
schedules necessary
to fulfil these objectives should be included in the overall test
programme. Programme
objectives may include the collection of data to verify assumptions,
models or estimates of
software engineering productivity and change traffic.
Integrated Logistic Support: Part 3, Guidance for Software
Support.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification
D.5.1.4 Updates and Corrective Actions.
Evaluation results should be
analyzed and
corrective actions determined as required. Shortfalls might arise from:
\slideitem{ Inadequate resource provision for operation and support tasks.
\slideitem{ Durations of tasks exceeding allowances.
\slideitem{ Software engineering productivity not matching expectations.
\slideitem{ Frequencies of tasks exceeding allowances.
\slideitem{ Software change traffic exceeding allowances.
Corrective actions may include: increases in the resources available;
improvements in
training; additions to the PSE or changes to the software, the support
package or, ultimately,
the system design. Although re-design of the system or its software
might deliver long term
benefits it would almost certainly lead to increased costs and programme
slippage.
Integrated Logistic Support: Part 3, Guidance for Software
Support.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Summary of Key Issues
\slideitem{What can we afford to verify?
\slideitem{ Every product of every process?
- MIL HDBK 338B...
\slideitem{ Or only a few key stages?
\slideitem{If the latter, do we verify:
- specification by safety criteria?
- implementation by safety criteria?
- or both...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Verification: Summary of Key Issues
\slideitem{Above all....
\slideitem{Verification is about proof.
\slideitem{Proof is simply an argument.
\slideitem{Argument must be correct but
- not a mathematical `holy grail'...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Introduction
\slideitem{ Validation and Verification.
\slideitem{ What are the differences?
\slideitem{When, why and who?
\slideitem{UK MOD DEF STAN 00-60
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Testing
\slideitem{ The processes used during:
- validation and verification.
\slideitem{White and black boxes.
\slideitem{Static and Dynamic techniques
\slideitem{Mode confusion case study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Definitions and Distinctions
\slideitem{ Black box tests:
- tester has no access to information
- about the system implementation.
\slideitem{Good for independence of tester.
\slideitem{But not good for formative tests.
\slideitem{Hard to test individual modules...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Definitions and Distinctions
\slideitem{ White box tests:
- tester can access information about
- the system implementation.
\slideitem{Simplifies diagnosis of results.
\slideitem{Can compromise independence?
\slideitem{How much do they need to know?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Definitions and Distinctions
\slideitem{ Module testing:
- tests well-defined subset.
\slideitem{ Systems integration:
- tests collections of modules.
\slideitem{ Acceptance testing:
- system meets requirements?
\slideitem{Results must be documented.
\slideitem{Changes will be costly.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing - Process Issues
\slideitem{ Functional testing:
- test cases examine functionality;
- see comments on verification.
\slideitem{ Structural testing:
- knowledge of design guides tests;
- interaction between modules...
- test every branch (coverage)?
\slideitem{ Random testing:
- choose from possible input space;
- or beyond the "possible"...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Definitions and Distinctions
\slideitem{ Dynamic testing:
- execution of system components;
- is environment being controlled?
\slideitem{ Static testing:
- investigation without operation;
- pencil and paper reviews etc.
\slideitem{Most approaches use both.
\slideitem{Guide the test selection by using:
- functional requirements:
- safety requirements;
- (see previous lecture).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Definitions and Distinctions
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing
\slideitem{ Where do you begin?
\slideitem{Look at the original hazard analysis;
- demonstrate hazard elimination?
- demonstrate hazard reduction?
- demonstrate hazard control?
\slideitem{Must focus both on:
- expected and rare conditions.
\slideitem{PRA can help - but for software?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing - Leveson's Process Issues
\slideitem{ Review test plans.
\slideitem{ Recommend tests based on the hazard analyses, safety standards and checklists, previous accidents and incidents, operator task analyses etc.
\slideitem{ Specify the conditions under which the test will be conducted.
\slideitem{ Review the test results for any safety-related problems that were missed in the analysis or in any other testing.
\slideitem{ Ensure that the testing feedback is integrated into the safety reviews and analyses that will be used in design modifications.
\slideitem{ All of this will cost time and money.
\slideitem{ Must be planned, must be budgeted.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing Techniques
\slideitem{ Partitioning:
- identify groups of input values;
- do they map to similar outputs?
\slideitem{ Boundary analysis:
- extremes of valid/invalid input (sketch on next slide).
\slideitem{ Probabilistic Testing:
- examine reliability of system.
\slideitem{ (State) Transition tests:
- trace states, transitions and events.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
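%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing Techniques: Partitioning and Boundary Sketch
A minimal sketch of equivalence partitioning and boundary analysis for a hypothetical routine that accepts set-point values in the range 0..100. The routine, the partitions and the expected results are invented for illustration.
# System under test (hypothetical): accept a set-point only if 0 <= v <= 100.
def accept_setpoint(v):
    return 0 <= v <= 100
# One representative value per equivalence partition, plus the boundary
# values at the extremes of valid/invalid input, with expected results.
cases = {
    -50: False, 50: True, 150: False,          # partitions: below/inside/above
    -1: False, 0: True, 1: True,               # lower boundary
    99: True, 100: True, 101: False,           # upper boundary
}
for value, want in cases.items():
    assert accept_setpoint(value) == want, value
print("all partition and boundary tests passed")
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}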
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing Techniques
\slideitem{ Simulation:
- assess impact on EUC (IEC 61508).
\slideitem{ Error seeding:
- put errors into the implementation;
- see if tests discover them (dangerous);
- (estimation sketch on next slide).
\slideitem{ Performance monitoring:
- check real-time, memory limits.
\slideitem{ Stress tests:
- abnormally high workloads?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
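%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing Techniques: Error Seeding Sketch
A minimal sketch of the classic fault-seeding estimate (often attributed to Mills): if testing rediscovers a known fraction of deliberately seeded faults, the same fraction of the genuine faults is assumed to have been found. The figures are hypothetical, and the warning above stands: seeding is dangerous if the seeded faults are not all removed afterwards.
# Fault seeding estimate (all figures hypothetical).
seeded = 20              # faults deliberately inserted before testing
seeded_found = 15        # seeded faults that the tests rediscovered
real_found = 30          # genuine (unseeded) faults found by the same tests
# Assume tests find seeded and genuine faults with the same probability.
estimated_real_total = real_found * seeded / seeded_found
estimated_remaining = estimated_real_total - real_found
print(estimated_real_total)    # 40.0 genuine faults estimated in total
print(estimated_remaining)     # 10.0 estimated still undetected
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}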
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Dynamic Testing - Software Issues
\slideitem{ Boundary conditions.
\slideitem{ Incorrect and unexpected input sequences.
\slideitem{ Altered timings - delays and over-loading.
\slideitem{ Environmental stress - faults and failures.
\slideitem{ Critical functions and variables.
\slideitem{ Firewalls, safety kernels and other special safety features.
\slideitem{ Usual suspects...automated tests?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Limitations of Dynamic Testing
\slideitem{ Cannot test all software paths.
\slideitem{Cannot even test all hardware faults.
\slideitem{Not easy to test in final environment:
\slideitem{User interfaces very problematic:
"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
Mode Confusion. In AIAA/IEEE Digital Avionics Systems Conference, October
, 1998.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
1. Opacity (i.e., poor display of automation state),
2. Complexity (i.e., unnecessarily complex automation),
3. Incorrect mental model (i.e., the flight crew misunderstands the behaviour of the automation).
Traditional human factors has concentrated on (1), and significant progress has been made.
However, mitigation of mode confusion will require addressing problem sources (2) and (3) as well.
Towards this end, our approach uses two complementary strategies based upon a formal model:
Visualisation
Create a clear, executable model of the automation that is easily understood by flight crew and use it to drive a flight deck mockup from the formal model
Analysis
Conduct mathematical analysis of the model.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
\slideitem{Problems stemming from modes:
- input has different effect;
- uncommanded mode changes;
- different modes->behaviours;
- different intervention options;
- poor feedback.
\slideitem{ObjectTime visualisation model...
\slideitem{Represent finite state machines (sketch on next slide).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
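%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study - State Exploration Sketch
A minimal, invented finite-state sketch of the kind of check described later in this case study: enumerate every (state, event) pair and flag any mode change that happens without a crew input (an "indirect" mode change). The mode names, events and transition rules are hypothetical stand-ins for the much larger PVS/ObjectTime models.
from itertools import product
# Hypothetical mode logic: two crew inputs and one environment event.
CREW_INPUTS = {"AP_BUTTON", "HDG_BUTTON"}
EVENTS = CREW_INPUTS | {"OVERSPEED"}
STATES = list(product(["FD_OFF", "FD_ON"], ["ROLL", "HDG"]))   # (FD, lateral mode)
def next_state(state, event):
    fd, lateral = state
    if event == "AP_BUTTON":
        return ("FD_ON", lateral)
    if event == "HDG_BUTTON":
        return (fd, "HDG")
    if event == "OVERSPEED" and fd == "FD_OFF":
        return ("FD_ON", "ROLL")            # automation changes mode itself
    return state
# State exploration: report every indirect (non-crew) mode change.
for state, event in product(STATES, EVENTS):
    if next_state(state, event) != state and event not in CREW_INPUTS:
        print("indirect mode change:", state, "on", event, "->", next_state(state, event))
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}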
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
"The state of the Flight Director (FD), Autopilot (AP), and each of the lateral and vertical modes are modeled as ...
In Figure 3 (see previous slide), the FD is On with the guidance cues displayed; the AP is Engaged; lateral Roll,
Heading, and Approach modes are Cleared; lateral NAV mode is Armed; vertical modes Pitch, Approach, and AltHold are
Cleared; and the VS mode is Active. Active modes are those that actually control the aircraft when the AP is engaged.
These are indicated by the heavy dark boxes around the Active, Track, and lateral Armed modes."
Acknowledgement: Butler et al., A Formal Methods Approach to the Analysis of
Mode Confusion (../reports/butler-etal-dasc98.pdf).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
\slideitem{ObjectTime model:
- give pilots better mental model?
- drive simulation (dynamic tests?).
\slideitem{Build more complete FGS model
- prove/test for mode problems.
\slideitem{ Discrete maths:
- theorem proving;
- or model checking?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
The first problem is formally defining
what constitutes an indirect mode change. Let's
begin by defining it as a mode change that occurs
when there has been no crew input:
Indirect_Mode_Change?(s,e): bool =
NOT Crew_input?(e) AND Mode_Change?(s,e)
No_Indirect_Mode_Change: LEMMA
Valid_State?(s) IMPLIES
NOT Indirect_Mode_Change?(s,e)
"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
Mode Confusion.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
We then seek to prove the false lemma above
using GRIND, a brute force proof strategy that
works well on lemmas that do not involve quantification.
The resulting unproved sequents
elaborate the conditions where indirect mode
changes occur. For example,
{-1} Overspeed_Event?(e!1)
{-2} OFF?(mode(FD(s!1)))
{-3} s!1 WITH [FD := FD(s!1) WITH [mode := CUES],
LATERAL := LATERAL(s!1) WITH
[ROLL := (# mode := ACTIVE #)],
VERTICAL := VERTICAL(s!1) WITH
[PITCH := (# mode := ACTIVE #)]]
= NS
{-4} Valid_State(s!1)
|-------
{1} mode(PITCH(VERTICAL(s!1))) =
mode(PITCH(VERTICAL(NS)))
The situations where indirect mode
changes occur are clear from the negatively labeled
formulas in each sequent. We see that an
indirect mode change occurs when the overspeed
event occurs and the Flight Director is off.
This event turns on the Flight Director and
places the system into modes ROLL and
PITCH.
"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
Mode Confusion.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
We define an ignored command as one in
which there is a crew input and there is no mode
change. We seek to prove that this never happens:
No_Ignored_Crew_Inputs: LEMMA
Valid_State(s) AND Crew_Input?(e) IMPLIES
NOT Mode_Change?(s,e)
The result of the failed proof attempt is a set of
sequents similar to the following:
{-1} VS_Pitch_Wheel_Changed?(e!1)
{-2} CUES?(mode(FD(s!1)))
{-3} TRACK?(mode(NAV(LATERAL(s!1))))
{-4} ACTIVE?(mode(VS(VERTICAL(s!1))))
|-------
{1} ACTIVE?(mode(ROLL(LATERAL(s!1))))
{2} ACTIVE?(mode(HDG(LATERAL(s!1))))
The negatively labeled formulas in the
sequent clearly elaborate the case where an input
is ignored, i.e., when the VS/Pitch Wheel is
changed and the Flight Director is displaying
CUES and the active lateral mode is ROLL and
the active vertical mode is PITCH. In this way,
PVS is used to perform a state exploration to
discover all conditions where the lemma is false,
i.e., all situations in which a crew input is ignored.
"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
Mode Confusion.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Formal Methods: Mode Confusion Case Study
\slideitem{Are these significant for the user?
\slideitem{Beware:
- atypical example of formal methods;
- haven't mentioned refinement;
- haven't mentioned implementation;
- much more could be said...
- see courses on formal methods.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Testing
\slideitem{ The processes used during:
- validation and verification.
\slideitem{White and black boxes.
\slideitem{Static and Dynamic techniques
\slideitem{Mode confusion case study.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Individual Human Error
\slideitem{Slips, Lapses and Mistakes.
\slideitem{Rasmussen: Skill, Rules, Knowledge.
\slideitem{Reason: Generic Error Modelling.
\slideitem{Risk Homeostasis.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
What is Error?
\slideitem{Deviation from optimal performance?
- very few achieve the optimal.
\slideitem{Failure to achieve desired outcome?
- desired outcome can be unsafe.
\slideitem{Departure from intended plan?
- but environment may change plan...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
What is Error?
Acknowledgement: J. Reason, Human Error, Cambridge University Press, 1990 (ISBN-0-521-31419-4).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Types of Errors...
\slideitem{Slips:
- correct plan but incorrect action;
- more readily observed.
\slideitem{Lapses:
- correct plan but omitted action;
- failure of memory so more covert?
\slideitem{Mistakes:
- incorrect plan;
- more complex, less understood.
\slideitem{Human error modelling helps to:
- analyse/distinguish error types.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Rasmussen: Skill, Rules and Knowledge
\slideitem{Skill based behaviour:
- sensory-motor performance;
- without conscious control;
- automated, highly integrated.
\slideitem{Rule based behaviour:
- based on stored procedures;
- induced by experience or taught;
- problem solving/planning.
\slideitem{Knowledge based behaviour:
- in unfamiliar situations;
- explicitly think up a goal;
- develop a plan by selection;
- try it and see if it works.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Rasmussen: Skill, Rules and Knowledge
Acknowledgement: J. Rasmussen, Skill, Rules, Knowledge: Signals, Signs and Symbols and Other Distinctions in Human Performance Models. IEEE Transactions on Systems, Man and Cybernetics (SMC-13)3:257-266, 1983.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Rasmussen: Skill, Rules and Knowledge
\slideitem{Signals:
- sensory data from environment;
- continuous variables;
- cf Gibson's direct perception.
\slideitem{Signs:
- indicate state of the environment;
- with conventions for action;
- activate stored pattern or action.
\slideitem{Symbols:
- can be formally processed;
- related by convention to state.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Rasmussen: Skill, Rules and Knowledge
\slideitem{ Skill-based errors:
- variability of human performance.
\slideitem{ Rule-based errors:
- misclassification of situations;
- application of wrong rule;
- incorrect recall of correct rule.
\slideitem{ Knowledge-based errors:
- incomplete/incorrect knowledge;
- workload and external constraints...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Building on Rasmussen's Work
\slideitem{How do we account for:
- slips and lapses in SKR?
\slideitem{Can we distinguish:
- more detailed error forms?
- more diverse error forms?
\slideitem{Before an error is detected:
- operation is, typically, skill based.
\slideitem{After an error is detected:
- operation is rule/knowledge based.
\slideitem{GEMS builds on these ideas...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Monitoring Failures
\slideitem{Normal monitoring:
- typical before error is spotted;
- preprogrammed behaviours plus;
- attentional checks on progress.
\slideitem{Attentional checks:
- are actions according to plan?
- will plan still achieve outcome?
\slideitem{Failure in these checks:
- often leads to a slip or lapse.
\slideitem{Reason also identifies:
- Overattention failures.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Problem Solving Failures
\slideitem{Humans are pattern matchers:
- prefer to use (even wrong) rules;
- before effort of knowledge level.
\slideitem{Local state information:
- indexes stored problem handling;
- schemata, frames, scripts etc.
\slideitem{Misapplication of good rules:
- incorrect situation assessment;
- over-generalisation of rules.
\slideitem{Application of bad rules:
- encoding deficiencies;
- action deficiencies.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Knowledge-Based Failures
\slideitem{Thematic vagabonding:
- superficial analysis/behaviour;
- flit from issue to issue.
\slideitem{Encysting:
- myopic attention to small details;
- meta-level issues may be ignored.
\slideitem{Reason:
- individual fails to recognise failure;
- does not face up to consequences.
\slideitem{Berndt Brehmer & Dietrich Doerner.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Failure Modes and the SKR Levels
Acknowledgement: J. Reason, Human Error, Cambridge University Press, 1990 (ISBN-0-521-31419-4).
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Error Detection
\slideitem{Don't try to eliminate errors:
- but focus on their detection.
\slideitem{Self-monitoring:
- correction of postural deviations;
- correction of motor responses;
- detection of speech errors;
- detection of action slips;
- detection of problem solving error.
\slideitem{How do we support these activities?
- standard checks procedures?
- error hypotheses or suspicion?
- use simulation based training?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Error Detection
\slideitem{Don't try to eliminate errors:
- but focus on their detection.
\slideitem{Environmental error cueing:
- block the user's progress;
- help people discover the error;
- "gag" or prevent input;
- allow input but warn them;
- ignore erroneous input;
- self correct;
- force the user to explain...
\slideitem{Importance of other operators
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Error Detection
\slideitem{Cognitive barriers to error detection.
\slideitem{Relevance bias:
- users cannot consider all evidence;
- "confirmation bias".
\slideitem{ Partial explanations:
- users accept differences between
- "theory about state" and evidence.
\slideitem{Overlaps:
- even incorrect views will receive
- some confirmation from evidence.
\slideitem{"Disguise by familliarity".
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Practical Application
\slideitem{So how do we use GEMS?
\slideitem{Try to design to avoid all error?
\slideitem{Use it to guide employee selection?
\slideitem{Or only use it post hoc:
- to explain incidents and accidents?
\slideitem{No silver bullet, no panacea.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Practical Application
\slideitem{Eliminate error affordances:
- increase visibility of task;
- show users constraints on action.
\slideitem{Decision support systems:
- don't just present events;
- provide trend information;
- "what if" subjunctive displays;
- prostheses/mental crutches?
\slideitem{Memory aids for maintenance:
- often overlooked;
- aviation task cards;
- must maintain maintenance data!
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Practical Application
\slideitem{Improve training:
- procedures or heuristics?
- simulator training (contentious).
\slideitem{Error management:
- avoid high-risk strategies;
- high probability/cost of failure.
\slideitem{Ecological interface design:
- Rasmussen and Vicente;
- 10 guidelines (learning issues).
\slideitem{Self-awareness:
- when might I make an error?
- contentious...
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Practical Application
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Outstanding Issues
\slideitem{Problem of intention:
- is an error a slip or lapse?
- is an error a mistake of intention?
\slideitem{Given observations of error:
- aftermath of accident/incident;
- guilt, insecurity, fear, anger.
\slideitem{Can we expect valid answers?
\slideitem{Can we make valid inferences?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Outstanding Issues
\slideitem{ GEMS focusses on causation:
- built on Rasmussen's SKR model;
- therefore, has explanatory power.
\slideitem{Hollnagel criticises it:
- difficult to apply in the field;
- do observations map to causes?
\slideitem{Glasgow work has analysed:
- GEMS plus active/latent failures;
\slideitem{Results equivocal, GEMS:
- provides excellent vocabulary;
- can be hard to perform mapping.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Outstanding Issues (Risk Homeostasis Theory)
\slideitem{What happens if we introduce the
- decision aids Reason suggests?
"Each road user has a target (or accepted) level of risk which acts as a comparison with actual risk.
Where a difference exists, one may move towards the other.
Thus, when a safety improvement occurs, the target level of risk motivates behaviour to compensate - e.g., drive faster or with less attention.
Risk homeostasis theory (RHT) has not been concerned with the cognitive or behavioural pathways by which homeostasis occurs, only with the consequences of adjustments in terms of accident loss."
Acknowledgement: T.W. Hoyes and A.I. Glendon, Risk Homeostasis: Issues for Further research, Safety Science, 16:19-33, (1993).
\slideitem{Will users accept more safety?
- or trade safety for performance?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Outstanding Issues (Risk Homeostasis Theory)
\slideitem{Very contentious.
\slideitem{Bi-directionality?
- what if safety levels fall?
- will users be more cautious?
\slideitem{Does it affect all tasks?
\slideitem{Does it affect work/leisure?
\slideitem{How do we prove/disprove it?
- unlikely to find it in simulators.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Conclusions: Individual Human Error
\slideitem{Slips, Lapses and Mistakes.
\slideitem{Rasmussen: Skill, Rules, Knowledge.
\slideitem{Reason: Generic Error Modelling.
\slideitem{Risk Homeostasis.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work
\slideitem{Workload.
\slideitem{Situation Awareness.
\slideitem{Crew Resource Management
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Workload
\slideitem{High workload:
- stretches users' resources.
\slideitem{Low workload:
- wastes users' resources;
- can inhibit ability to respond.
\slideitem{Cannot be "seen" directly;
- is inferred from behaviour.
\slideitem{No widely accepted definition?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Workload
"Physical workload is a straightforward concept. It is easy to measure and define in terms of energy expenditure. Traditional human factors texts tell us how to measure human physical work in terms of kilocalories and oxygen consumption..."
Acknowledgement: B.H. Kantowitz and P.A. Casper, Human Workload in aviation. In E.L. Wiener and D.C. Nagel (eds.), Human Factors in Aviation, 157-187, Academic Press, London, 1988.
"The experience of workload is based on the amount of effort, both physical and psycholoigcal, expended in response to system demands (taskload) and also in accordance with the operator's internal standard of performance."
Acknowledgement: E.S. Stein and B. Rosenberg, The Measurement of Pilot Workload, Federal Aviation Authority, Report DOT/FAA/CT82-23, NTIS No. ADA124582, Atlantic City, 1983.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Workload
\slideitem{ Competing views of workload:
- Wickens on perceptual channels;
- Kantowitz on problem solving;
- Hart on overall experience.
\slideitem{Holistic vs atomistic approaches:
- FAA (+ Seven) a gestalt concept;
- cannot measure in isolation;
- (many) experimentalists disagree.
\slideitem{Single-user vs team approaches:
- workload is dynamic;
- shared/distributed between a team;
- many previous studies ignore this.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Workload
\slideitem{ How do we measure workload?
\slideitem{ Subjective ratings?
- NASA TLX, Task Load Index (sketch on next slide);
- consider individual differences.
\slideitem{ Secondary tasks?
- performance on additional task;
- obtrusive & difficult to generalise.
\slideitem{ Physiological measures?
- heart rate, skin temperature etc;
- lots of data but hard to interpret.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
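%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Workload - NASA TLX Sketch
A minimal sketch of how a NASA TLX score is combined: six subscale ratings (0-100) are weighted by the number of times each dimension was chosen in the fifteen pairwise comparisons. The ratings and weights below are invented for illustration.
# NASA Task Load Index: weighted average of six subscale ratings.
# Ratings are 0-100; weights come from 15 pairwise comparisons and sum to 15.
ratings = {"mental": 70, "physical": 20, "temporal": 80,
           "performance": 40, "effort": 65, "frustration": 55}
weights = {"mental": 4, "physical": 0, "temporal": 5,
           "performance": 2, "effort": 3, "frustration": 1}
assert sum(weights.values()) == 15
tlx = sum(ratings[d] * weights[d] for d in ratings) / 15.0
print(round(tlx, 1))    # overall workload score, still on a 0-100 scale
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}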
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Workload
\slideitem{ How to reduce workload?
\slideitem{ Function allocation?
- static or dynamic allocation;
- to crew, systems or others (ATC?).
\slideitem{ Automation?
- but it can increase workload;
Acknowledgement: C.D. Wickens and J.M. Flach, Information Processing. In E.L. Wiener and D.C. Nagel (eds.), Human Factors in Aviation, 111-156, Academic Press, London, 1988.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
"Situation awareness is the perception of the elements of the environment within a volume of time and spcae, the comprehension of their meaning, and the projection of their status in the near future"
Acknowledgement: M. R. Endsley, Design and Evaluation for Situation Awareness Enhancement. In Proceedings of the Human Factors Society 32nd Annual Meeting, 97-101. Human Factors Society, Santa Monica, CA, 1988.
\slideitem{Rather abstract definition.
\slideitem{Most obvious when it is lost.
\slideitem{Difficult to explain behaviour:
- beware SA becoming a "catch all";
- just as "high workload" was.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
[Figure omitted: Endsley's model of situation awareness in dynamic decision making.]
Acknowledgement: M.R. Endsley, Toward a Theory of Situation Awareness in Dynamic Systems, Human Factors, 37(1):32-64, 1995.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
\slideitem{Level 1: Perception of the environment.
- how much can be attended to?
- clearly not everything...
\slideitem{Level 2: Comprehension of the situation.
- synthesise the elements at Level 1;
- significance determined by goals.
\slideitem{Level 3: Projection of the future.
- knowledge of status and dynamics;
- may only be possible in the short term;
- enables strategy, not just reaction.
\slideitem{Novice perceives everything at Level 1:
- but fails at Levels 2 and 3.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
Acknowledgement: D.G. Jones and M.R. Endsley, Sources of Situation Awareness Errors in Aviation. Aviation, Space, and Environmental Medicine, 67(6):507-512, 1996.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
\slideitem{Hmm, subjective classification.
\slideitem{33 incidents with Air Traffic Control.
\slideitem{NASA (ASRS) reporting system:
- how typical are reported events?
\slideitem{I worry about group work:
- colleagues help you maintain SA?
- prompting, reminding, informing?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
"Investigators were able to trace a series of errors that initiated with the flight crews acceptance of the controller's offer to land on runway 19.
The flightcrew expressed concern about possible delays and accepted an offer to expedite their approach into Cali...
One of the AA965 pilots selected a direct course to the Romeo NDB believing it was the Rozo NDB, and upon executing the selection in the FMS permitted a turn of the airplane towards Romeo, without having verified that it was the correct selection and without having first obtained approval of the other pilot, contrary to AA procedures...
The flightcrew had insufficient time to prepare for the approach to Runway 19."
American Airlines Flight 965
Boeing 757-223, N651AA
Near Cali, Colombia, December 20, 1995.
Acknowledgement: Aeronautica Civil of the Republic of Colombia.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
"...Among the results were that captains of more effective crews (who made fewer operational or precedural errors) verbalised a greater number of plans than those of lower performing crews and requested and used more information in making their decisions.
This raises interesting questions about whether situation awareness can be improved by teaching specific communication skills or even proceduralising certain communications that would otherwise remain in the realm of unregulated CRM (crew resource management behaviour)."
Acknowledgement: S. Dekker and J. Orasanu, Automation and Situation Awareness. In S. Dekker and E. Hollnagel (eds.), Coping with Computers in the Cockpit. 69-85, Ashgate, Aldershot, 1999. ISBN-0-7546-1147-7.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
\slideitem{Cockpit Resource Management:
- crew coordination;
- decision making;
- situation awareness...
\slideitem{More review activities inserted
into standard operating procedures.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
Acknowledgement: C.A. Bowers, E.L. Blickensderfer and B.B. Morgan, Air Traffic Control Team Coordination. In M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, 215-237, Academic Press, London, 1998.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
\slideitem{Cockpit Resource Management:
- based on Foushee and Helmreich.
\slideitem{Group performance determined by:
- process variables - communication;
- input variables - group size/skill.
\slideitem{Goes against image of:
- pilot as "rugged individual";
- showing "the right stuff".
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
\slideitem{Key objectives...
\slideitem{ alter individual attitudes to groups;
\slideitem{ improve coordination within crew;
\slideitem{ increase team member effort;
\slideitem{ optimise team composition.
\slideitem{Can we change group norms?
\slideitem{Does it apply beyond aviation?
- with fewer rugged individuals?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
FAA Advisory Circular 120-51A 1993
\slideitem{ Briefings are interactive and emphasize the importance of questions, critique, and the offering of
information.
\slideitem{ Crew members speak up and state their information with appropriate persistence until there is
some clear resolution.
\slideitem{ Critique is accepted objectively and non-defensively.
\slideitem{ The effects of stress and fatigue on performance are recognised.
NASA/UT LOS Checklist
\slideitem{When conflicts arise, the crew remain focused on the problem or situation at hand. Crew
members listen actively to ideas and opinions and admit mistakes when wrong; conflict issues are
identified and resolved.
\slideitem{ Crew members verbalize and acknowledge entries to automated systems parameters.
\slideitem{ Cabin crew are included as part of the team in briefings, as appropriate, and guidelines are
established for coordination between flight deck and cabin.
Human Factors Group of the Royal Aeronautical Society.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
CRM TRAINING METHODS AND PROCESSES
Phase One - Awareness training - 2 days classroom (residential or
non-residential).
Objectives:
\slideitem{Knowledge:
\slideitem{ Relevance of CRM to flight safety and the efficient operation of an aircraft
\slideitem{ How CRM reduces stress and improves working environment
\slideitem{ Human information processing
\slideitem{ Theory of human error
\slideitem{ Physiological effects of stress and fatigue
\slideitem{ Visual \& aural limitations
\slideitem{ Motivation
\slideitem{ Cultural differences
\slideitem{ CRM language and jargon.
\slideitem{ The CRM development process
\slideitem{ Roles such as leadership and followership
\slideitem{ Systems approach to safety, the man-machine interface and the SHEL model
\slideitem{ Self awareness
\slideitem{ Personality types
\slideitem{ Evaluation of CRM
\slideitem{Skills:
\slideitem{Nil
\slideitem{Attitudes:
\slideitem{ Motivated to observe situations, others' and own behaviour in future.
\slideitem{ Belief in the value of developing CRM skills.
\slideitem{ Activities:
\slideitem{ Presentations
\slideitem{ Analysis of incidents and accidents by case study or video
\slideitem{ Discussion groups
\slideitem{ Self disclosure
\slideitem{ Personality profiling and processing
\slideitem{ Physiological experience exercises
\slideitem{ Self study
Human Factors Group of the Royal Aeronautical Society.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
CRM TRAINING METHODS AND PROCESSES
Phase Two - Basic Skills training - 3 to 4 days classroom, residential.
Objectives:
\slideitem{ Knowledge:
\slideitem{ Perceptions
\slideitem{ How teams develop
\slideitem{ Problem solving \& decision making processes
\slideitem{ Behaviours and their differences
\slideitem{ Thought processes
\slideitem{ Respect and individual rights
\slideitem{ Development of attitudes
\slideitem{ Communications toolkits
\slideitem{ Skills:
\slideitem{ See Appendix B
\slideitem{ Attitudes
\slideitem{ See Appendix B
\slideitem{ Activities:
\slideitem{ Presentations
\slideitem{ Experiential learning - (Recreating situations and experiences, using feelings to log in
learning, experimenting in safe environments with cause and effect behaviour exercises)
\slideitem{ Role play
\slideitem{ Videoed exercises
\slideitem{ Team exercises
\slideitem{ Giving \& receiving positive and negative criticism
\slideitem{ Counselling
\slideitem{ Case studies
\slideitem{ Discussion groups
\slideitem{ Social and leisure activities
Human Factors Group of the Royal Aeronautical Society.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
CRM TRAINING METHODS AND PROCESSES
Phase Three - Classroom, CPT or simulator
Objectives:
\slideitem{ Development of knowledge, skills and attitudes to required competency standards.
\slideitem{ Activities:
Practicing one or more skills on a regular basis under instruction in either the classroom, mock up/
CPT facility or full simulator LOFT sessions. Also considered valuable would be coaching by
experienced crews during actual flying operations.
Human Factors Group of the Royal Aeronautical Society.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management
"Under normal conditions, aircraft flying is not a very interdependent task. In many cases, pilots are able to fly their aircraft successfully with relatively little coordination with other crew members, and communication between crew members is rquired primarily during nonroutine situations."
Acknowledgement: C.A. Bowers, E.L. Blickensderfer and B.B. Morgan, Air Traffic Control Team Coordination. In M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, 215-237, Academic Press, London, 1998.
\slideitem{Does it work in abnormal events?
\slideitem{Additional requirements ignored?
\slideitem{Can it hinder performance?
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work
\slideitem{Workload.
\slideitem{Situation Awareness.
\slideitem{Crew Resource Management.
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}