\input{/users/staff/johnson/teaching/hoskyns/slidedefs.tex}
\title{Safety Critical Systems Development}
 
\author{Prof. Chris Johnson,\\
Department of Computing Science,\\
University of Glasgow,\\
Glasgow,\\
Scotland.\\
G12 8QJ.\\ \\
URL: http://www.dcs.gla.ac.uk/$\sim$johnson\\
E-mail: johnson@dcs.glasgow.ac.uk\\
Telephone: +41 330 6053}
 
\date{October 1999.}
 
 
\begin{document}
\maketitle
 
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Developmen
t}.}
\pagehead{Terminology and the Arian 5 Case Study}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Introduction}

Safety Critical Systems Development


Hazard Analysis

\slideitem{ Hazard Analysis.
\slideitem{ FMECA/FMEA.
\slideitem{ Case Study.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hazard Analysis

\slideitem{ Safety case:
- why proposed system is safe.
\slideitem{ Must identify potential hazards.
\slideitem{ Assess liklihood and severity.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}


Hazard Analysis

\slideitem{Lots of variant features:
- checklists...
- hazard indices...
\slideitem{ Lots of techniques:
- fault tress (see later);
- cause consequence analysis;
- HAZOPS;
- FMECA/FHA/FMEA...


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA - Failure Modes, Effect and Criticality Analysis

\slideitem{MIL STD 1629A (1977!).
\slideitem{Analyse each potential failure.
\slideitem{Determine impact of system(s).
\slideitem{Assess its criticality.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA - Failure Modes, Effect and Criticality Analysis
1. Construct functional block diagram.
2. Use diagram to identify any associated failure modes.
3. Identify effects of failure and assess criticality.
4. Repeat 2 and 3 for potential consequences.
5. Identify causes and occurence rates.
6. Determine detection factors.
7. Calculate Risk Priority Numbers.
8. Finalise hazard assessment.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 1: Functional Block Diagram

\slideitem{Establish scope of the analysis.
\slideitem{Break system into subcomponents.
\slideitem{Different levels of detail?
\slideitem{Some unknowns early in design?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 1: Functional Block Diagram

<img width=450 src="../images/fmeca.gif">

Acknowledgement: taken from J.D. Andrews and T.R. Moss, Reliability and Risk
                Assessment, Longman, Harlow, 1993
                (ISBN-0-582-09615-4). 


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}


FMECA - Step 2: Identfy Failure Modes

\slideitem{ Many different failure modes:
- complete failure;
- partial failure;
- intermittant failure;
- gradual failure;
- etc.

\slideitem{Not all will apply?


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}


FMECA - Step 3: Assess Criticality
<table BORDER="1" CELLPADDING="3" width="90%">
<td WIDTH="90" V\slideitem{N="middle" HEIGHT="42">Hazardous without warning</td>
<td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">
Very high severity ranking when a potential failure mode affects safe operation or involves non-compliance with a government regulation without warning.
</td>
<td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="center">10</td>
  </tr>
  <tr>
<td WIDTH="90" V\slideitem{N="middle" HEIGHT="42">Hazardous with warning</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Failure affects safe
product operation or involves noncompliance with government regulation with warning.</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">9</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle" HEIGHT="42">Very
High</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Product is inoperable
with loss of primary
    Function.</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">8</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle"
HEIGHT="42">High</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Product is operable, but
at reduced level of
    performance. </td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">7</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle"
HEIGHT="42">Moderate</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Product is operable, but
comfort or
    convenience item(s) are inoperable.</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">6</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle" HEIGHT="42">Low</td>
 
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Product is operable, but
comfort or
    convenience item(s) operate at a reduced level of performance. </td>
 
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">5</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle" HEIGHT="42">Very
Low</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Fit &amp; finish or
squeak &amp; rattle item
    does not conform. Most customers notice defect.</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">4</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle"
HEIGHT="42">Minor</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Fit &amp; finish or
squeak &amp; rattle item
    does not conform. Average customers notice defect.</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">3</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle" HEIGHT="42">Very
Minor</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">Fit &amp; finish or
squeak &amp; rattle item
    does not conform. Discriminating customers notice defect.</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">2</td>
  </tr>
  <tr>
    <td WIDTH="90" V\slideitem{N="middle"
HEIGHT="42">None</td>
    <td WIDTH="343" V\slideitem{N="middle" HEIGHT="42">No effect</td>
    <td WIDTH="43" V\slideitem{N="middle" HEIGHT="42" align="center"><p
\slideitem{N="CENTER">1</td>

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 4: Repeat for potential consequences

\slideitem{ Can have knock-on effects.
\slideitem{Additional failure modes.
\slideitem{Or additional contexts of failure.
\slideitem{Iterate on the analysis.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA - Step 5: Identify Cause and Occurence Rates

\slideitem{Modes with most severe effects first.
\slideitem{What causes the failure mode?
\slideitem{How likely is that cause?
\slideitem{risk = frequency x cost

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA - Step 5: Identify Cause and Occurence Rates

<table BORDER="1" CELLSPACING="4" CELLPADDING="7" width="90%">
    <td WIDTH="195" V\slideitem{N="middle"><font face="Symbol"> 1 in 2</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">10</td>
  </tr>
  <tr>
    <td WIDTH="195" V\slideitem{N="middle">1 in 3</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">9</td>
  </tr>
  <tr>
    <td WIDTH="350" V\slideitem{N="middle" ROWSPAN="2"><b>High: Repeated failures</b></td>
    <td WIDTH="195" V\slideitem{N="middle">1 in 8</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">8</td>
  </tr>
  <tr>
    <td WIDTH="195" V\slideitem{N="middle">1 in 20</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">7</td>
  </tr>
  <tr>
    <td WIDTH="350" V\slideitem{N="middle" ROWSPAN="3"><b>Moderate: Occasional failures</b></td>
    <td WIDTH="195" V\slideitem{N="middle">1 in 80</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">6</td>
  </tr>
  <tr>
    <td WIDTH="195" V\slideitem{N="middle">1 in 400</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">5</td>
  </tr>
  <tr>
    <td WIDTH="195" V\slideitem{N="middle">1 in 2000</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">4</td>
  </tr>
  <tr>
    <td WIDTH="350" V\slideitem{N="middle" ROWSPAN="2"><b>Low: Relatively few failures</b></td>
    <td WIDTH="195" V\slideitem{N="middle">1 in 15,000</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">3</td>
  </tr>
  <tr>
    <td WIDTH="195" V\slideitem{N="middle">1 in 150,000</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">2</td>
  </tr>
  <tr>
    <td WIDTH="350" V\slideitem{N="middle"><b>Remote: Failure is unlikely</b></td>
    <td WIDTH="195" V\slideitem{N="middle">1 in 1,500,000</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">1</td>
 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA - Step 6: Determine detection factors

Type (1):
These controls prevent the Cause or Failure Mode from
occurring, or reduce their rate of occurrence.

Type (2):
These controls detect the Cause of the Failure Mode and lead
to corrective action.

Type (3):
These Controls detect the Failure Mode before the product
operation, subsequent operations, or the end user.

\slideitem{Can we detect/control failure mode?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA - Step 6: Determine detection factors

    <td WIDTH="88" V\slideitem{N="middle" height="42"><i>Detection</i></font></td>
    <td WIDTH="449" V\slideitem{N="middle" height="42"><i>Criteria: Likelihood of Detection by Design Control</i></font></td>
    <td WIDTH="50" V\slideitem{N="middle" height="42" align="center"><p ALIGN="CENTER"><i>Rank</i></font></td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Absolute Uncertainty</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Design Control does not detect a potential Cause of
    failure or subsequent Failure Mode; or there is no Design Control</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">10</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Very Remote</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Very remote chance the Design Control will detect a
    potential Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">9</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Remote</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Remote chance the Design Control will detect a potential
    Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">8</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Very Low</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Very low chance the Design Control will detect a potential
    Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">7</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Low</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Low chance the Design Control will detect a potential
    Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">6</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Moderate</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Moderate chance the Design Control will detect a potential
    Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">5</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Moderately High</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Moderately high chance the Design Control will detect a
    potential Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">4</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>High</b></td>
    <td WIDTH="449" V\slideitem{N="middle">High chance the Design Control will detect a potential
    Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">3</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Very High</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Very high chance the Design Control will detect a
    potential Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">2</td>
  </tr>
  <tr>
    <td WIDTH="88" V\slideitem{N="middle"><b>Almost Certain</b></td>
    <td WIDTH="449" V\slideitem{N="middle">Design Control will almost certainly detect a potential
    Cause of failure or subsequent Failure Mode</td>
    <td WIDTH="50" V\slideitem{N="middle" align="center"><p ALIGN="CENTER">1</td>
  </tr>
</table>

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA - Step 7: Calculate Risk Priority Numbers
\slideitem{Risk Priority Numbers (RPN)
\slideitem{RPN = S x O x D, where
- S - severity index;
- O - occurence index;
- D - detection index.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA: Step 8 - Finalise Hazard Analysis

\slideitem{ Must document the analysis...
\slideitem{ ...and response to analysis.
\slideitem{Use FMECA forms.
\slideitem{Several formats and tools.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

FMECA: Step 8 - Finalise Hazard Analysis

<img width=450 src="../images/fmeca2.gif">


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Tools

<img width=450 src="../images/relex.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FMECA: Tools

<img width=450 src="../images/item_fmeca.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Conclusions

\slideitem{ Hazard analysi.
\slideitem{ FMECA/FMEA.
\slideitem{ Qualitative->quantitative approaches.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Probabilistic Risk Assessment (PRA)

     The use of PRA technology should be increased in all regulatory
matters to the extent supported by the state of the art in PRA
     methods and data and in a manner that complements the NRC's
deterministic approach and supports the NRC's traditional
     defense-in-depth philosophy.
 
     PRA and associated analyses (e.g., sensitivity studies, uncertainty
analyses, and importance measures) should be used in
     regulatory matters, where practical within the bounds of the state
of the art, to reduce unnecessary conservatism associated
     with current regulatory requirements, regulatory guides, license
commitments, and staff practices. 
 
<I>An Approach for Plant-Apecific, Risk-Informed Decisionmaking: Technical Specifications</I>

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hazard Analysis vs PRA

\slideitem{ FMECA - hazard analysis.
\slideitem{ PRA part of hazard analysis.
\slideitem{ Wider links to decision theory.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Decision Theory
\slideitem{ Risk = frequency x cost.
\slideitem{Which risk do we guard against?
DecisionA = (option_1; option_2;...; option_n)
DecisionB = (option_1; option_2;...;option_m)
Val(Decision) = 
sum^{i = 1 to limit} utility(option_n) x freq(option_n)
\slideitem{Are decision makers rational?
\slideitem{Can you trust the numbers?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA - Meta-Issues

\slideitem{ Decision theory counter intuitive?
\slideitem{But just a formalisation of FMECA?
\slideitem{What is the scope of this approach?
- hardware failure rates (here)?
- human error rates (here)?
- software failure rates?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA 

<img width=470 src="../images/pra1.gif">
Acknowledgement: J. D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman, New York, 1993.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA
\slideitem{ Failure rate assumed to be constant.
\slideitem{Electronic systems approximate this.
\slideitem{Mechanical systems:
- bed-down failure rates;
- degrade failure rates;

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA  - Mean Time To Failure

\slideitem{ MTTF:
reciprocal of constant failure rate.
MTTF =  1 / lambda.
lambda - base failure rate 
\slideitem{ 0.2 failures per hour:
\slideitem{See Andrews and Moss for proof.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA - Or Put Another Way...

Probability that product will work for T without failure:
     R(T) = exp(-T/MTTF)

\slideitem{ If MTTF = 250,000 hours.
\slideitem{ Over life of 10 years (87,600 hours).
\slideitem{ R = exp(-87,600/250000) = 0.70441
\slideitem{ 70.4% prob of no failure in 10 years.
\slideitem{ 70.4% of systems working in 10 year.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA 

\slideitem{ For each failure mode.
Criticality_m = a x b x lambda_p x time
lambda_p - base failure rate with environmental/stress data
a - proportion of total failures in specified failure mode m
b - conditional prob. that expected failure effect will result

\slideitem{ If no failure data use:


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA - Sources of Data

\slideitem{ MIL-HDBK-217: 
Reliability Prediction of Electronic Equipment
\slideitem{Failure rate models for:
 - ICs, transistors, diodes, resistors,
 - relays, switches, connectors etc.

\slideitem{ Field data + simplifying assumptions.
\slideitem{Latest version F being revised.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA 

\slideitem{ 217 too pessimistic for companies...
\slideitem{Bellcore (Telcordia):
- reliability prediction procedure..

During 1997, AT&T's Defects-Per-Million performance was
173, which means that of every one million calls placed on the
to a network failure. That equals a network reliability rate of
99.98 percent for 1997. 
\slideitem{ Business critical not safety critical

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA 


\slideitem{But MTTF doesnt consider repair!
\slideitem{MTTR considers observations.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA and FMECA Mode Probability

\slideitem{FMECA: 
- we used subjective criticality;
- however, MIL-338B calculates it;
- no. of failures per hour per mode.
\slideitem{CR = alpha x beta x lamda:
CR - criticality level,
alpha - failure mode frequency ratio,
beta - loss prob. of item from mode
lambda - base failure rate for item.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA and FMECA Mode Probability


<img height=370 src="../images/mode_reliability.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA 

\slideitem{We focussed on hardware devices.
\slideitem{PRA for human reliability?
\slideitem{Probably not a good idea.
\slideitem{But for completeness...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Technique for Human Error Rate Prediction (THERP)

``The THERP approach uses conventional reliability technology modified to account for greater variability and independence of human performance as compared with that of equipment performance...   The procedures of THERP are similar to those employed in conventional reliability analysis, except that human task activities are substituted for equipment outputs.'' 
(Miller and Swain, 1987 - cited by Hollnagel, 1998).

A.D. Swain and H.E. Guttman, 
Handbook of Human Reliability with Emphasis on Nuclear Power Plant Applications
NUREG-CR-1278, 1985.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Technique for Human Error Rate Prediction (THERP)

\slideitem{Pe =
He * Sum^{k=1 to n} Psf_k * W_k + C
\slideitem{ Where:
Pe - probability of error;
He - raw human error probability;
C - numerical constant;
Psf_k - performance shaping factor;
W_k - weight associated with PSF_k;
n - total number of PSFs.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Technique for Human Error Rate Prediction (THERP)

\slideitem{"Psychological vaccuous" (Hollnagel).
\slideitem{No model of cognition etc.
\slideitem{Calculate effect of PSF on HEP
- ignores WHY they affect performance.
\slideitem{Succeeds or fails on PSFs.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

THERP - External PSFs

<img  width=430 SRC = "../images/swain1.gif">
Acknowledgement: A.D. Swain, <I>Comparative Evaluation of Methods for Human Reliability Analysis</I>, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

THERP - Stressor PSFs

<img  width=430 SRC = "../images/swain2.gif">
Acknowledgement: A.D. Swain, <I>Comparative Evaluation of Methods for Human Reliability Analysis</I>, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

THERP - Internal PSFs

<img  SRC = "../images/swain3.gif">
<I>Hint: use yor browser to open this image.</I>
Acknowledgement: A.D. Swain, <I>Comparative Evaluation of Methods for Human Reliability Analysis</I>, (GRS-71), Garching FRG: Gesellschaft fur Reaktorsicherheit.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

CREAM

E. Hollnagel,
Cognitive Reliability and Error Analysis Method,
Elsevier, Holland, 1998.
\slideitem{HRA + theoretical basis.
\slideitem{Simple model of control:
- scrambled - unpredictable actions;
- opportunistic - react dont plan;
- tactical - procedures and rules;
- strategic - consider full context.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

CREAM - Simple Model of Control

<img  height=340 SRC = "../images/cream.gif">
<I>Hint: use yor browser to open this image.</I>
Acknowledgement: 
E. Hollnagel,
Cognitive Reliability and Error Analysis Method,
Elsevier, Holland, 1998.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

CREAM - Simple Model of Control

<img  height=340 SRC = "../images/cream2.gif">
Acknowledgement: 
E. Hollnagel,
Cognitive Reliability and Error Analysis Method,
Elsevier, Holland, 1998.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

CREAM 

\slideitem{Much more to the technique...
\slideitem{But in the end:
Strategic = 0.000005 < p < 0.01
Tactic = 0.001< p < 0.1
Opportunistic = 0.01 < p < 0.5
Scrambled = 0.1 < p < 1.0 
\slideitem{Common performance conditions to 
- probable control mode then to
- reliability estimate from literature.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Conclusions 

\slideitem{PRA for hardware:
- widely accepted with good data;
\slideitem{PRA for human performance:
- many are skeptical;
- THERP -> CREAM ->
\slideitem{PRA for software?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA and Fault Tree Analysis

\slideitem{Fault Trees (recap)
\slideitem{Software Fault Trees.
\slideitem{Software PRA.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Trees (Recap)

<img width=450 src="../images/basic_ft.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis

\slideitem{Each tree considers 1 failure.
\slideitem{Carefully choose top event.
\slideitem{Carefully choose system boundaries.
\slideitem{Assign probabilities to basic events.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
\slideitem{Assign probabilities to basic events.
\slideitem{Stop if you have the data.
\slideitem{Circles denote basic events.
\slideitem{Even so, tool support is critical.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis

<img  width=470 SRC = "../images/fta1.gif">

\slideitem{Usually applied to hardware...
\slideitem{Can be used for software (later).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis
<img  width=470 SRC = "../images/fta2.gif">
\slideitem{House events; "switch" true or false.
\slideitem{OR gates - multiple fault paths.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis

<img  width=470 SRC = "../images/fta3.gif">

\slideitem{Probabilistic inhibit gates.
\slideitem{Used with Monte Carlo techniques
- True if random number < prob.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis

<img  height=270 SRC = "../images/john_fta1.gif">
\slideitem{Usually applied to hardware...


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis

<img  height=310 SRC = "../images/john_ft2.gif">
Acknowledgement: J.D. Andrews and T.R. Moss, Reliability and Risk Assessment, Longman Scientific and Technical, Harlow, 1993. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Fault Tree Analysis - Cut Sets

\slideitem{Each failure has several modes
- `different routes to top event'.
\slideitem{Cut set:
basic events that lead to top event.
\slideitem{Minimal cut set:
removing a basic event avoids failure.
\slideitem{Path set:
basic events that avoid top event;
list of components that ensure safety.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis - Cut Sets

<img  SRC = "../images/cut_set.gif">

\slideitem{Top_Event = K1 + K2 + ... K_n 
K_i minimal cut sets, + is logical OR. 
\slideitem{K_i = X_1 . X_2 . X_n
MCS are conjuncts of basic events.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis - Cut Sets

\slideitem{Top-down approach:
- replace event by expression below;
- simply if possible (C.C = C).
\slideitem{ Can use Karnaugh map techniques;
- cf logic circuit design;
- recruit tool support in practice.
\slideitem{Notice there is no negation.
\slideitem{Notice there is no XOR.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis - MOCUS Cut Set Algorithm
1. Assign unique label to each gate.
2. Label each basic event.
3. Create a two dimensional array A.
4. Initialise A(1,1) to top event.
5. Scan array to find an OR/AND gate:<br>
If current position in A is OR gate...
- replace current position with a column;
- put gate's input events in new row of that column.
- replace current position with a row;
- put gate's input events in new column of that row.
6. Repeat 5 until no gates remain in array.
7. Remove any non-minimal cut sets.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis - MOCUS

<img width=470  SRC = "../images/mocus.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis - Probabilistic Analysis

<img  SRC = "../images/fault_prob.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Tree Analysis - Probabilistic Analysis

\slideitem{Beware: independence assumption.

"If the same event occurs multiple times/places in a tree, any quantitative calculation must correctly reduce the boolean equation to account for these multiple occurrences.
Independence merely means that the event is not caused due to the failure of another event or component, which then moves into the realm of conditional probabilities."

\slideitem{Inclusion-exclusion expansion (Andrews & Moss).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees 

\slideitem{As you'd expect.
\slideitem{Starts with top-level failure
\slideitem{Trace events leading to failure.
\slideitem{But:
dont use probabilistic assessments;
\slideitem{If you find software fault path REMOVE IT!

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

Leveson, N.G., Cha, S.S., Shimeall, T.J. ``Safety Verification of Ada Programs using Software Fault Trees,'' IEEE Software, July 1991.

\slideitem{Backwards reasoning.
\slideitem{Weakest pre-condition approach.
\slideitem{Similar to theorem proving.
\slideitem{Uses language dependent templates.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

<img  SRC = "../images/sft_assign.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

<img  width=470 SRC = "../images/software_ft.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

<img  width=470 SRC = "../images/sft_while.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

<img  SRC = "../images/sft_call.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

<img  height=320 SRC = "../images/sft_example.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Fault Trees 

<img  SRC = "../images/sft_ada83.gif">
\slideitem{Exception template for Ada83.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Fault Trees 

<img  width=350 SRC = "../images/sft_ada95.gif">
See: S.-Y. Min, Y-K. Jang, A-D Cha, Y-R Kwon and D.-H. Bae, <I>Safety Verification of Ada95 Programs Using Software Fault Trees.</I>   In M. Felici, K. Kanoun and A. Pasquini (eds.) Computer Safety, Reliability and Security, Springer Verlag, LNCS 1698, 1999.

\slideitem{Exception template for Ada95.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
PRA for Software 

\slideitem{John Musa's work at Bell Labs.
\slideitem{Failure rate of software before tests.
\slideitem{Faults per unot of time (lambda_0):
- function of faults over infinite time.
\slideitem{ Based on execution time: 
- not calendar time as in hardware;
- so no overall system predictions.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Musa's PRA for Software 

lambda_0 = K x P x W_0

<td width=100> Symbol </td>
<td width=220> Represents </td>
<td width=100> Sample value </td>

k </td>
Constant that accounts for the dynamic
structure of the program and the varying
machines

k = 4.2E-7

p 
Estimate of the number of executions per
time unit
p = r/SLOC/ER

r 
Average instruction execution rate,
determined from the manufacturer or
benchmarking
Constant

SLOC 
Source lines of code (not including reused code).
.

ER 
Expansion ratio, a constant dependent upon
programming language
Assembler, 1.0; Macro Assembler, 1.5;
C, 2.5; COBAL, FORTRAN, 3; Ada,
4.5

W_0 
Estimate of the initial number of faults in
the program
Can be calculated using: w0 = N x B
or a default of 6 faults/1000 SLOC can
be assumed

N 
Total number of inherent faults 
Estimated based upon judgment or past
experience

B 
Fault to failure conversion rate; proportion
of faults that become failures. Proportion
of faults not corrected before the product is
delivered.
Assume B = .95; i.e., 95% of the faults
undetected at delivery become failures
after delivery


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

PRA for Software 

\slideitem{Considerable debate about this.
\slideitem{Many variants on the theme.
\slideitem{Metrics are crude...
\slideitem{In meantime, be skeptical  


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
FTA - Conclusions

\slideitem{Fault Trees:
- cut sets, cut paths;
- quantitative analysis.
\slideitem{Software Fault Trees:
- language dependent templates;
- if you see faults, remove them!
\slideitem{Software PRA.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Safety-Critical Software
\slideitem{Why is software different?
\slideitem{ Software requirements:
- Leveson's completeness criteria.
\slideitem{ Software design (summary): 
MIL-338B preliminary design; 
MIL-338B detailed design. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Why is Software Different?

\slideitem{ Software is an abstract concept in that it is a set of instructions on a piece of paper or in computer
      memory. It can be torn apart and analyzed in piece parts like hardware, yet unlike hardware it is not a
      physical entity with physical characteristics which must comply with the laws of nature (i.e., physics
      and chemistry). 
\slideitem{ Since software is not a physical entity it does not wear out or degrade over time. This means that
      software does not have any failure modes per se. Once developed it always works the same without
      variation 
\slideitem{ Unlike hardware, once a software program is developed it can be duplicated or manufactured into many
      copies without any manufacturing variations. 
\slideitem{ Software is much easier to change than is hardware. For this reason many system fixes are made by
      modifying the software rather than the hardware. 
\slideitem{ There are no standard parts in software as there are with hardware. Therefore there are no high
      reliability software modules, and no industry alerts on poor quality software items. 
\slideitem{ If software has anything which even resembles a failure mode, it is in the area of hardware induced
      failures. 
\slideitem{ Hardware reliability prediction is based upon random failures, whereas software reliability prediction is
      based upon the theory that predestined errors exist in the software program. 
\slideitem{ Hardware reliability modeling is well established, however, there is no uniform, accurate or practical
      approach to predicting and measuring software reliability. 
\slideitem{ Since software does not have any failure modes, a software problem is referred to as a software error.
      A software error is defined as a situation when the software does not perform to specifications or as
      reasonably expected, that is when it performs unintended functions. This definition is fairly consistent
      with that of a hardware failure, except that the mechanisms or causes of failure are very different. 
\slideitem{ Hardware primarily fails due to physical or chemical mechanisms and seldom fails due to human failure
      mechanisms (e.g., documentation errors, coding errors, specification oversights), whereas just the
      opposite is true with software. 
\slideitem{ Software has many more failure paths than hardware, making it difficult to test all paths. 
\slideitem{ By itself software can do nothing and is not hazardous. Software must be combined with hardware in
      order to do anything.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Defects (Initial Views)

A software defect is either a fault or
discrepancy between code
and documentation that compromises testing or produces adverse effects
in installation,
modification, maintenance, or testing.
\slideitem{ Requirements Defects: Failure of software requirements to specify
the
environment in which the software will be used, or
requirements documentation that does not reflect the
design of the system in which the software will be
employed.
\slideitem{ Design Defects: Failure of designs to satisfy requirements, or
failure of
design documentation to correctly describe the design.
\slideitem{ Code Defects: Failure of code to conform to software designs.
Robert Dunn, Software Defect Removal, McGraw-Hill, 1984.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Defects (Initial Views)

A software fault that causes a deviation from the required output by
more than a specified
tolerance. Moreover, the software need produce correct outputs only for
inputs within the limits
that have been specified. It needs to produce correct outputs only
within a specified exposure
period. Since these definitions differ, a count of the number of
defects will yield different
results, and, hence, a different defect rate, depending on the counters
definition.
\slideitem{ Requirements Defects 
\slideitem{ Design Defects 
\slideitem{ Algorithmic Defects 
\slideitem{ Interface Defects
\slideitem{ Performance Defects
\slideitem{ Documentation Defects
Lawrence Putnam and Ware Myers, Measures for Excellence:
Reliable Software on
Time, Within Budget, Prentice-Hill, 1992.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Hazard Analysis

\slideitem{Already seen software fault trees.

\slideitem{ Trace identified software hazards to the software-hardware interface.
Translate the identified software related hazards into requirements and constraints on software behaviour.
\slideitem{ Show the consistency of the software safety constraints with the software requirements specification.   
Demonstrate the completeness of the software requirements, including the human-computer interface requirements, with respect to system safety properties.
Acknowledgement: Nancy Leveson, Safeware: System Safety and Computers, Addison Wesley, Reading Massachusetts, 1995.

\slideitem{Point 2 links to safety case slides?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Requirements Analysis

\slideitem{Leveson identifies 3 components.
\slideitem{Basic function or objective.
\slideitem{Constraints on operating conditions.
\slideitem{Prioritised quality goals;
- to help make tradeoff decisions.
\slideitem{Same as general hazard analysis?


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Kernel Requirements and Intent Specifications

\slideitem{ Kernel or core set of requirements.
\slideitem{Determined by current knowledge of:
- intended application functionality;
- environment & constraints.
\slideitem{Analytically independent.
\slideitem{Only know they are complete if 
- we know specification intent...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

<img  height=300 SRC = "../images/black_box.gif" >

\slideitem{Remember - `Black Box' architecture.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{Human Computer Interface Criteria.
\slideitem{State Completeness.
\slideitem{Input/Output Variable Completeness.
\slideitem{Trigger Event Completeness.
\slideitem{Output Specification Completeness.
\slideitem{Output to Trigger Relationships.
\slideitem{State Transitions.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{Human Computer Interface Criteria.
\slideitem{Criteria depend on task context. 
\slideitem{Eg in monitoring situation:
- what must be observed/displayed?
- how often is it sampled/updated?
- what is message priority?
\slideitem{Not just when to present but also
- when to remove information...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{State Completeness Criteria.
\slideitem{Consider input effect when state is:
- normal, abnormal, indeterminate.
\slideitem{Start-up, close-down are concerns. 
\slideitem{Process will change even during
- intervals in which software is `idle'.
\slideitem{Checkpoints, timeouts etc.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{Input/Output Variable Completeness.
\slideitem{Input from sensors to software.
\slideitem{Output from software to actuators.
\slideitem{Specification may be incomplete if:
- sensor isnt refered to in spec;
- legal value isnt used in spec.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{Trigger Event Completeness.

Robustness:
every state has a transition defined for every possible input.
Non-determinism:
only 1 transition is possible from a state for each input.
Value and Timing assumptions:
- what triggers can be produced from the environment?
- what ranges must trigger variables fall within?
- what are the real-time requirements...
- specify bounds for responses to input (timeouts)

\slideitem{And much, much more....

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{Output Specification Completeness.
- from software to process actuators.
\slideitem{Check for hazardous values.
\slideitem{Check for hazardous timings;
- how fast do actuators take events?
- what if this rate is exceeded?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{Output to Trigger Relationships. 
\slideitem{Links between input & output events.
\slideitem{For any output to actuators:
- can effect on process be detected?
- if output fails can this be seen?
\slideitem{ What if response is:
- missing, too early or too late?
\slideitem{If response recieved without trigger
- then erroneous state.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Completeness Criteria

\slideitem{State Transitions.

Reachability:
all specified states can be reached from initial state.

Recurrent behaviour:
desired recurrent behaviour must execute for at least one cycle and be bounded by exit condition.

Reversibility:
output commands should wherever possible be reversible and those which are not must be carefully controlled. <br>

Preemption:
all possible preemption events must be considered for any non-atomic transactions.

\slideitem{Again more complexity here...


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Reality Check...

<img  height=300 SRC = "../images/software_reliability.gif" >

\slideitem{Completeness criteria change.  
\slideitem{Environment and functions change.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

From Requirements to Design

Once the requirements have been detailed and accepted, the design will
process of allocating and arranging the functions of the system so that
the aggregate meets all
customer needs. Since several different designs may meet the
requirements, alternatives must be
assessed based on technical risks, costs, schedule, and other
considerations. A design developed
before there is a clear and concise analysis of the systems objectives
can result in a product that
does not satisfy the requirements of its customers and users. In
addition, an inferior design can
make it very difficult for those who must later code, test, or maintain
the software. During the
course of a software development effort, analysts may offer and explore
many possible design
alternatives before choosing the best design.

US Department of Defence: Electronic Reliability Design Handbook

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Preliminary Design
Preliminary or high-level design is the phase of a software project in
which the major software
system alternatives, functions, and requirements are analyzed. From the
alternatives, the
software system architecture is chosen and all primary functions of the
system are allocated to the
computer hardware, to the software, or to the portions of the system
that will continue to be
accomplished manually.

US Department of Defence: Electronic Reliability Design Handbook

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Preliminary Design

\slideitem{Develop the architecture:
\slideitem{ system architecture - an overall view of system components
\slideitem{ hardware architecture - the systems hardware components and their
interrelations
\slideitem{ software architecture - the systems software components and their
interrelations

\slideitem{ Investigate and analyze the physical alternatives for the system and
choose solutions

\slideitem{ Define the external characteristics of the system

\slideitem{ Refine the internal structure of the system by decomposing the
high-level software
architecture

\slideitem{ Develop a logical view or model of the systems data
 
US Department of Defence: Electronic Reliability Design Handbook

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Detailed Design

Detailed design or low-level design determines the specific steps
required for each component or
process of a software system. Responsibility for detailed design may
belong to either the system
designers (as a continuation of preliminary design activities) or to the
system programmers.
Information needed to begin detailed design includes: the software
system requirements, the
system models, the data models, and previously determined functional
decompositions. The
specific design details developed during the detailed design period are
categories: for the system as a whole (system specifics), for individual
processes within the
system (process specifics), and for the data within the system (data
specifics). 

US Department of Defence: Electronic Reliability Design Handbook

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Detailed Design (Example concerns)

System specifics:
\slideitem{ Physical file system structure
\slideitem{ Interconnection records or protocols between software and hardware
components
\slideitem{ Packaging of units as functions, modules or subroutines
\slideitem{ Interconnections among software functions and processes
\slideitem{ Control processing
\slideitem{ Memory addressing and allocation
\slideitem{ Structure of compilation units and load modules
 
Process specifics:
\slideitem{ Required algorithmic details
\slideitem{ Procedural process logic
\slideitem{ Function and subroutine calls
\slideitem{ Error and exception handling logic

Data specifics:
\slideitem{ Global data handling and access
\slideitem{ Physical database structure
\slideitem{ Internal record layouts
\slideitem{ Data translation tables
\slideitem{ Data edit rules
\slideitem{ Data storage needs

US Department of Defence: Electronic Reliability Design Handbook


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Don't Forget the Impact of Standards 


UK Defense software standard

Sean Matthews <sean@aipna.edinburgh.ac.uk> 
Fri, 30 Jun 89 13:49:12 BST


I have just seen a copy of the UK department of defence draft standard
for safety critical software  (00-55).

Here are a few high (and low) points.

1. There should be no dynamic memory allocation (This rules out explicit
recursion - though a bounded stack is allowed).

2. There should be no interupts except for a regular clock interupt.

3. There should not be any distributed processing (i.e. only a single
processor).

4. There should not be any multiprocessing.

5. NO ASSEMBLER.

6. All code should be at least rigourously checked using mathematical
methods.

7. Any formally verified code should have the proof submitted as well, in
machine readable form, so that an independent check can be performed.

8. All code will be formally specified.

9. There are very strict requirements for static analysis (no unreachable
code, no unused variables, no unintialised variables etc.).

10. No optimising compilers will be used.

11. A language with a formally defined syntax and a well defined semantics,
or a suitable subset thereof will be used.

Comments.

1. means that all storage can be statically allocated.  In fact somewhere it
says that this should be the case.

2-4 seem to leave no option but polling.  This is impractical, especially in
embedded systems.  No one is going to build a fly by wire system with those
sorts of restrictions. (maybe people should therefore not build fly by wire
systems, but that is another matter that has been discussed at length here
already).  it also ignores the fact that there are proof methods for dealing
with distributed systems.

5. This is interesting, I seem to remember reading somewhere that Nasa used
to have the opposite rule: no high level languages, since they actually read
the delivered binary to check that the software did what it was supposed to
do.

methods' is *invoked* in a general way without going into very much detail
about what is involved.  I am not sure that the people who wrote the report
were sure (Could someone from Praxis - which I believe consulted on drawing
it up - enlarge on this?).

8. this is an excellent thing, though it does not say what sort of language
should be used.  Is a description in terms of a Turing machine suitable?
After all that is a well understood formal system.

10. Interestingly, there is no requirement that the compiler be formally
verified, just that it should conform to international standards (though
strictly), and not have any gross hacks (i.e. optimisation) installed.
There is also no demand that the target processor hardware be verified
(though such a device exists here already: the Royal Signals Research
Establishment's Viper processor).

11. seems to be a dig at Ada and the no subsets rule.  It also rules out C.

Conclusions.

I find the idea of the wholesale mayhem and killing merchants being forced
to try so much harder to ensure that their products maim and kill only the
people they are supposed to maim and kill, rather amusing.

The standard seems to be naive in its expectations of what can be achieved
at the moment with formal methods (That is apparently the general opinion
around here, and there is a *lot* of active research in program verification
in Edinburgh), and impossibly restrictive.

An interesting move in the right direction but too fast and too soon.  And
they might blow the idea of Formal verification by tring to force it too
soon.  And I would very much like to see these ideas trickle down into the
civil sector.

I might follow this up with a larger (and more coherent) description if
there is interest (this was typed from memory after seeing it yesterday)
there is quite a bit more in it.

Sean Matthews
Dept. of Artificial Intelligence JANET: sean@uk.ac.ed.aipna
University of Edinburgh           ARPA: sean%uk.ac.ed.aipna@nsfnet-relay.ac.uk
80 South Bridge                   UUCP: ...!mcvax!ukc!aipna!sean
Edinburgh, EH1 1HN, Scotland


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Conclusion
\slideitem{Why is software different?
\slideitem{ Software requirements:
- Leveson's completeness criteria.
\slideitem{ Software design (summary):
MIL-338B preliminary design; 
MIL-338B detailed design. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Safety-Critical Software Development
\slideitem{Software design by:
- hazard elimination;
- hazard reduction;
- hazard control.
\slideitem{Software implementation issues:
- dangerous practices;
- choice of `safe' languages.
\slideitem{The DO-178B Case Study.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Leveson's Taxonomy of Design Techniques
\slideitem{Hazard elimination/avoidance
\slideitem{Hazard reduction (see 4?)
\slideitem{Hazard control
\slideitem{Hazard minimization (see 2?)

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Design and Hazard Elimination
\slideitem{Substitution
hardware interlocks before software.
\slideitem{Simplification
new software features add complexity.
\slideitem{Decoupling 
computers add common failure point.
\slideitem{Human Error `Removal' 
readability of instruments etc.
\slideitem{Removal of hazardous materials 
eliminate UNUSED code (Ariane 5).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hazard Elimination: Datalink Example
<img  width=480 SRC = "../images/ATC.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
\slideitem{Project compared:
- faults found in statistical tests;
- faults found in 178B development.
\slideitem{Main conclusions:
- such comparisons very difficult;
- DO-178B hard to implement;
- lack of materials/examples.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B Practitioners' View

       The difficulties that have been identified are the DO-178 requirements for evidence and
       rigorous verification... 
       Systematic records of accomplishing each of the
       objectives and guidance are necessary. A documentation trail must exist
       demonstrating that the development processes not only were carried out, but also
       were corrected and updated as necessary during the program life cycle. Each
       document, review, analysis, and test must have evidence of critique for accuracy and
       completeness, with criteria that establishes consistency and expected results. This is
       usually accomplished by a checklist which is archived as part of the program
       certification records. The degree of this evidence varies only by the safety criticality of
       the system and its software. 
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' View

...Engineering has not been schooled or trained to
                meticulously keep proof of the processes, product, and
                verification real-time. The engineers have focused on the
                development of the product, not the delivery. In addition,
                program durations can be from 10 to 15 years resulting in
                the software engineers moving on by the time of system
                delivery. This means that most management and engineers
have never been on a project from "cradle-to-grave." 
Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' Views 

The weakness of commercial practice with DO-178B is the lack of consistent,
       comprehensive training of the FAA engineers/DERs/foreign agencies affecti
ng:
 
\slideitem{ the effectiveness of the individual(s) making findings; and,
\slideitem{ the consistency of the interpretations in the findings.
Training programs may be the answer for both the military and commercial
environments to avoid the problem of inconsistent interpretation and the results
 of
literal interpretation.

Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Safety-Critical Software Development - Conclusions
\slideitem{Software design by:
- hazard elimination;
- hazard reduction;
- hazard control.
\slideitem{Software implementation issues:
- dangerous practices;
- choice of `safe' languages.
\slideitem{The DO-178B Case Study.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Design and Hazard Reduction

\slideitem{Design for control:
- incremental control;
- intermediate states;
- decision aids;
- monitoring.
\slideitem{Add barriers:
- hard/software locks;
\slideitem{Minimise single point failures: 
- increase safety margins;
- exploit redundancy;
- allow for recovery.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hazard Reduction: Interlock Example

This heavy duty solenoid controlled tongue switch controls access to hazardous
   machines with rundown times.

   Olympus withstands the arduous environments
   associated with the frequent operation of heavy
   duty access guards. The unit also self adjusts to
   tolerate a high degree of guard misalignment.
   The stainless steel tongue actuator is
   self-locking and can only be released after the
   solenoid receives a signal from the machine
   control circuit. This ensures that the machine has
   completed it's cycle and come to rest before the
   tongue can be disengaged and machine access
   obtained.


Software Design and Hazard Control

\slideitem{Limit exposure.
back to `normal' fast (exceptions).
\slideitem{Isolate and contain.
dont let things get worse...
\slideitem{Fail-safe. 
panic shut-downs, watchdog code.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hazard Control: Watchdog Example

\slideitem{Hardware or software (beware).
\slideitem{Check for processor activity:
- 1. load value into a timer;
- 2. decrement timer every interval;
- 3. if value is zero then reboot.
\slideitem{Processor performs 1 at a frequency
- great enough to stop 3 being true;
- unless it has crashed.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance

\slideitem{Avoid common mode failures.
\slideitem{Need for design diversity.
\slideitem{Same requirements:
- different programmers?
- different contractors?
- homogenous parallel redundancy?
- microcomputer vs PLC solutions?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Design Techniques: Fault Tolerance

\slideitem{Redundant hardware may duplicate
- any faults if software is the same.
\slideitem{N-version programming:
- shared requirements;
- different implementations;
- voting ensures agreement.
\slideitem{ What about timing differences?
- comparison of "continuous" values?
- what if requirements wrong?
- costs make N>2 very uncommon;
- performance costs of voting.
\slideitem{A340 primary flight controls.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Design Techniques: Fault Tolerance

\slideitem{Exception handling mechanisms.
\slideitem{Use run-time system to detect faults:
- raise an exception;
- pass control to appropriate handler;
- could be on another processor.
\slideitem{Propagate to outmost scope then fail.
\slideitem{Ada...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance

\slideitem{Recovery blocks:
- write acceptance tests for modules;
- if it fails then execute alternative.
\slideitem{Must be able to restore the state:
- take a snapshot/checkpoint;
- if failure restore snapshot.
\slideitem{But:
- if failed module have side-effects?
- eg effects on equip under control?
- recovery block will be complicated.
\slideitem{Different from execptions:
- dont rely on run-time system.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Design Techniques: Fault Tolerance

\slideitem{Control redundancy includes:
- N-version programming;
- recovery blocks;
- exception handling.
\slideitem{But data redundancy uses extra data
- to check the validity of results.
\slideitem{Error correcting/detecting codes.
\slideitem{Checksum agreements etc.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Implementation Issues

\slideitem{Restrict language subsets.
\slideitem{Alsys CSMART Ada kernel etc.
\slideitem{Or just avoid high level languages?
\slideitem{No task scheduler - bare machine.
\slideitem{Less scheduling/protection risks
- more maintenance risks;
- less isolation (no modularity?).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues

\slideitem{Memory jumps:
- control jumps to arbitrary location?
\slideitem{Overwrites:
- arbitrary address written to?
\slideitem{Semantics:
- established on target processor?
\slideitem{Precision:
- integer, floating point, operations...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues
\slideitem{Data typing issues:<Br>
- strong typing prevents misuse?
\slideitem{Exception handling:
- runtime recovery supported?
\slideitem{Memory monitoring:
-guard against memory depletion?
\slideitem{Separate compilation:
- type checking across modules etc?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues: Language Wars
<img  SRC = "../images/language_wars.gif">
<I>Acknowledgement: W.J. Cullyer, S.J. Goodenough, B.A. Wichmann, The choice of a Computer Language for Use in Safety-Critical Systems, Software Engineering Journal, (6)2:51-58, 1991.</I>

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Implementation Issues: Language Wars

\slideitem{CORAL subset:<Br>
staff training issues?
\slideitem{SPADE Pascal:
Praxis version of ISO Pascal.
\slideitem{Modula 2 subset:
SACEM trains in Paris.
\slideitem{Ada subset:
attempts at formal verification.
\slideitem{Meta question:
programmer more important than language?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Implementation Issues: Language Wars

\slideitem{Lots of new proposals:
- some more pragmatic than others...
\slideitem{Continuing attempts at standards:
\slideitem{An ENGINEERING approach:
\slideitem{Meta question 2:
language depends on application!

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Implementation Issues: Language Wars
Safety-Critical Systems Computer Language Survey
- Results

Newsgroups: comp.lang.ada,comp.lang.c++,comp.lang.misc,comp.software-eng
From: cpp@netcom.com (Robin Rowe)
Subject: Safety-Critical Survey (Results)
Message-ID: 
Organization: NETCOM On-line Communication Services (408 261-4700 guest)
Date: Sun, 13 Nov 1994 22:34:10 GMT
Lines: 180

========================================================

Here are the results of my recent informal survey of computer 
languages used in safety-critical embedded systems and other 
interesting systems. In responses, Ada was by far the most popular 
language for these systems followed by assembler. There is a list
describing 722 Ada projects that is available via ftp from the Ada
Information Clearinghouse. The current version is 213K in size 
(contact adainfo@ajpo.sei.cmu.edu). I did not attempt to integrate 
that data into this report.

No assertion is intended here that any language is necessarily 
superior to any other.

Aerospace:
---------

 Allied Signal: ?
 Boeing: Mostly Ada with assembler. Also: Fortran, Jovial, C, C++.
         Onboard fire extinguishers in PLM.
         777 seatback entertainment system in C++ with MFC (in 
         development by Microsoft).
         757/767: approximately 144 languages used.  
         747-400: approximately 75 languages used.
         777: approximately 35 languages used.
 Boeing Defense & Space Group: (777 cabin mgmt. system in Ada?)
 DAINA/Air Force: Aircraft mission manager in Ada.
 Chandler Evans: Engine Control System in Ada (386 DOS).
 Draper Labs/Army/NASA: Fault tolerant architecture in Ada/VHDL.
 DuPont: ?
 European Space Agency: mandates Ada for mission critical systems. 
        ISO (Infrared Space Observatory)
        SOHO (Solar and Heliospheric Observatory)
        Huygens/Cassini (a joint ESA/NASA mission to Saturn)
        Companies involved:
             British Aerospace (Space Systems) - Bristol, UK
             Fokker Space Systems - Amsterdam, Holland
             Matra-Marconi Espace - Toulouse, France
             Saab - Sweden
             Logica - UK
             DASA - Germany
             MBB - Germany
 Ford Aerospace: Spacecraft in Ada with assembler.
                 GEOS and INSAT spacecraft in FORTRAN.
                 (Ford Aerospace is now Space Systems/Loral.)
 Hamilton-Standard: (777 air cowling icing protection system in Ada?).
 Honeywell: Aircraft navigation data loader in C.
            (777 airplane information mgmt. system in Ada?)
 Intermetrics/Houston: space shuttle cockpit real-time executive 
                       in Ada '83 with 80386 assembly
 Lockheed Fort Worth: F-22 Advanced Tactical Fighter program in Ada 83
      (planning to move to Ada 94) with a very small amount in
      MIL-STD-1750A assembly. 
      Maintain older safety-critical systems for the F-111 and 
      F-16/F-16 variant airframes primarily done in JOVIAL.  
 NASA: Space station in Ada. (Sources differed on whether it was
       Ada only, or Ada with some C and assembler.)
 NASA Lewis: March 1994 space shuttle experiment in C++ on 386.
 Rockwell Space Systems Div.: Space shuttle in Hal/s and Ada.
                              Defense Initiative in Ada.
                              Other systems in Ada and C.
 Space Systems/Loral: Spacecraft in Ada with assembler.
 Teledyne: Aircraft flight data recorder in C.
 TRW/Air Force: Realtime avionics OS in Ada.
 Wilcox Electric: Navigation aids in C prior to 1990, Ada after.
                  VOR-DME in Ada.
                  Microwave landing system in Ada.
                  Wide Area GPS in C and C++.

Air Traffic Control:
-------------------

 Hughes: Canadian ATC system in Ada.
 Loral FSD: U.S. ATC system in Ada.
 Thomson-CSF SDC: French ATC system in Ada.

Land Vehicles:
-------------

 Bosch: Diesel engine controls in C. (Other systems generally in C?)
 Delco: Engine controls and ABS in 68C series (Motorola) assembler.
        C++ used for data acquisition in GM research center.
        '93+ GM trucks vehicle controllers mostly in Modula-GM
        (Modula-GM is a variant of Modula-2. A typical 32-bit 
         integrated vehicle controller may control the engine, the
         transmission, the ABS system, the Heating/AC system, as 
         well as the associated integrated diagnostics and off-board 
         communications systems.)
   Ford: Assembler.
 General Dynamic Land Systems: M1A2 tank tank software in Ada with
        time-critical routines in 68xxx assembler.
        Tank software simulators in C.
 Honda: ?
 Lucas: Many systems in Lucol (Lucas control language).
        Diesel engine controls in C++.
        ABS in 68xxx assembler.
 SAE: ? (Despite considerable effort on my part, I was unable to 
        gather any information on languages or language standards 
        from the Society of Automotive Engineers.)

Ships:
-----

 Vosper Thornycroft Ltd (UK): navigation control in Ada.
 
Trains:
------
 
 AMTRAK: ? 
 BART: ? (One rumor said Ada migrating to C. Can anyone confirm?)
 CSEE Transports (France): TGV Braking system in Ada (68K).
 Denver Airport baggage system: This well publicized problem 
    system is written in C++. (A source familiar with the system
    said the problems were political and managerial, not directly
    related to C++.)
 European Rail: Switching system in Ada.
 EuroTunnel: in Ada.
 Extension to the London Underground: in Ada.
 GEC Alsthom (France): Railway and signal control systems for trains 
      and the TGV (north lines and Chunnel) in Ada.
      Subway network control systems (Paris, Calcutta, and Cairo). 
 TGV France: Switching system in Ada.
 Union Switch & Signal, Pittsburgh: (Switching system in ?)
 Westinghouse Signals Ltd (UK):  Railway signalling systems in Ada.   
 Westinghouse Brake & Signal UK: Automatic Train Protection (ATP) systems
                                 for Westrace project in PASCAL.
 Westinghouse Australia: ATP systems in PASCAL and ADA.

Medical:
-------

 Baxter: Left Ventricular Heart Assist in C with 6811 assembler.
 Coulter Corp.: ONYX hematology analyzer in Ada.
 
Nuclear Reactors:
----------------

 Core and shutdown systems in assembler, migrating to Ada.

SURVEY METHODOLOGY
==================

I operated under the theory that, with regard to what languages 
are really in use, the recollections of the engineers themselves 
are probably the most accurate and open source. In general, I did 
not have enough sources that I could cross check the information. 
In cases where I could, the most interesting discrepancy was that 
companies that thought they had adopted one language as the total 
solution for all their software designs often had something in 
assembler or some other language somewhere. 

Every response to the survey was positive except one. An individual 
at Rockwell Collins said: "The language(s) we do/don't use is a 
matter best left to us, our customers, and the appropriate 
regulatory agencies governing our businesses and markets. All of 
these parties also look out for the public's interests in safety, 
cost, etc. as well." This individual took me to task for not 
contacting the PR department of his company, but was unwilling to 
help me do so. Per his request, I have omitted his company.

If you wish to add information or make a correction please send 
mail to cpp@netcom.com. I'd like to fill in the companies that 
have question marks by them. I'm particularly interested in 
systems written in C++. Names of respondents are held confidential. 
If you respond with a public follow-up on the net, please cc via 
e-mail to me so that I don't miss you. 

Thanks to everyone who helped with this. I meant to post this in
August, but got busy with work and relocating to Monterey and
forgot. Sorry for the delay.

Robin

embedded.svy rev 11-13-94
-- 
-----
Robin Rowe                 cpp@netcom.com  408-375-9449  Monterey, CA 
Rowe Technology            C++ training, consulting, and users groups.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B 

\slideitem{Software Considerations in Airborne Systems and Equipment Certification.
\slideitem{Widely used, cf. IEC-61508.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B 
<img width=450 src="../images/airbus.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B Life Cycle

\slideitem{ Planning Process:
- coordinates development activities.
\slideitem{Software Development Processes:
- requirements process
- design process
- coding process
- integration process
\slideitem{ Software Integral Processes: 
- verification process
- configuration management 
- quality assurance 
- certification liaison

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B Requirements for Design Descriptions

(a) A detailed description of how the software satisfies the specified software high-level requirements, including algorithms, data-structures and how software requirements are allocated to processors and tasks.

(b) The description of the software architecture defining the software structure to implement the requirements.

(c) ???????????

(d) The data flow and control flow of the design.

(e) Resource limitations, the strategy for managing each resource and its limitations, the margins and the method for measuring those margins, for example timing and memory.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}


Software Development: DO-178B Requirements for Design Descriptions
(f) Scheduling procedures and interprocessor/intertask communication mechanisms, including time-rigid sequencing, pre-emptive scheduling, Ada rendez-vous and interrupts.

(g) Design methods and details for their implementation, for example, software data loading, user modifiable software, or multiple-version dissimilar software.

(h) Partitioning methods and means of preventing partitioning breaches.

(i) Descriptions of the software components, whether they are new or previously developed, with reference to the baseline from which they were taken.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B Requirements for Design Descriptions
(j) Derived requirements from the software design process.

(k) If the system contains deactivated code, a description of the means to ensure that the code cannot be enabled in the target computer.

(l) Rationale for those design decisions that are traceable to safety-related system requirements.

\slideitem{ Deactivated code (k) (see Ariane 5).
\slideitem{ Traceability issues interesting (l).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B Key Issues

\slideitem{ Traceability and lifecycle focus.
\slideitem{ Designated engineering reps.
\slideitem{ Recommended practices.
\slideitem{ Design verification:
- formal methods "alternative" only;
- "inadequate maturity";
- limited applicability in aviation.
\slideitem{Design validation:
- use of independent assessors etc.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

DO-178B - NASA GCS Case Study 

<img width=450 src="../images/Kelly.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

DO-178B - NASA GCS Case Study 

<img  SRC = "../images/kelly5.gif">
 NASA Langley Research Centre.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

DO-178B - NASA GCS Case Study 
<img  SRC = "../images/kelly6.gif">
 NASA Langley Research Centre.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
DO-178B - NASA GCS Case Study
\slideitem{Project compared:
- faults found in statistical tests;
- faults found in 178B development.
\slideitem{Main conclusions:
- such comparisons very difficult;
- DO-178B hard to implement;
- lack of materials/examples.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Software Development: DO-178B Practitioners' View

       The difficulties that have been identified are the DO-178 requirements for evidence and
       rigorous verification... 
       Systematic records of accomplishing each of the
       objectives and guidance are necessary. A documentation trail must exist
       demonstrating that the development processes not only were carried out, but also
       were corrected and updated as necessary during the program life cycle. Each
       document, review, analysis, and test must have evidence of critique for accuracy and
       completeness, with criteria that establishes consistency and expected results. This is
       usually accomplished by a checklist which is archived as part of the program
       certification records. The degree of this evidence varies only by the safety criticality of
       the system and its software. 

Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Development: DO-178B Practitioners' View

...Engineering has not been schooled or trained to
                meticulously keep proof of the processes, product, and
                verification real-time. The engineers have focused on the
                development of the product, not the delivery. In addition,
                program durations can be from 10 to 15 years resulting in
                the software engineers moving on by the time of system
                delivery. This means that most management and engineers
have never been on a project from "cradle-to-grave." 

Original source on http://stsc.hill.af.mil/crosstalk/1998/oct/schad.asp

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hardware Design: Fault Tolerant Architectures

\slideitem{The basics of hardware management.
\slideitem{Fault models.
\slideitem{Hardware redundancy.
\slideitem{Space Shuttle GPC Case Study.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Parts Management Plan

\slideitem{MIL-HDBK-965 
- help on hardware acquisition. 
\slideitem{ General dependability requirements.
\slideitem{ Not just about safety.
\slideitem{But often not considered enough...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

The Basics: Hardware Management 

\slideitem{MIL-HDBK-965 
Acquisition Practices for Parts Management
\slideitem{ Preferred Parts List
\slideitem{ Vendor and Device Selection
\slideitem{ Critical Devices, Technologies & Vendors
\slideitem{ Device Specifications
\slideitem{ Screening
\slideitem{ Part Obsolescence
\slideitem{ Failure Reporting, Analysis and Corrective Action (FRACAS)

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

The Basics: Hardware Management 
Some consequences of designing equipment without a PPL are:
\slideitem{ Proliferation of non-preferred parts and materials with identical
functions
\slideitem{ Increased need for development and preparation of engineering
justification for
new parts and materials
\slideitem{ Increased need for monitoring suppliers and inspecting/screening
parts and materials
\slideitem{ Selection of obsolete (or potentially obsolete) and sole-sourced
parts and materials
\slideitem{ Possibility of diminishing sources
\slideitem{ Use of unproven or exotic technology ("beyond" state-of-the-art)
\slideitem{ Incompatibility with the manufacturing process
\slideitem{ Inventory volume expansion and cost increases
\slideitem{ Increasing supplier base and audit requirements
\slideitem{ Loss of "ship-to-stock" or "just-in-time" purchase opportunities
\slideitem{ Limited ability to benefit from volume buys
\slideitem{ Increased cost and schedule delays
\slideitem{ Nonavailability of reliability data
\slideitem{ Additional tooling and assembly methods may be required to account
for the added
variation in part characteristics
\slideitem{ Decreased part reliability due to the uncertainty and lack of
experience with new parts
\slideitem{ Impeded automation efforts due to the added variability of part
types
\slideitem{ Difficulty in monitoring vendor quality due to the added number of
suppliers
\slideitem{ More difficult and expensive logistics support due to the increased
number of part
types that must be spared.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

The Basics: Hardware Management 
Must consider during hardware acquisition:
\slideitem{ Operating Temperature Range - parts should be selected which are rated
for the
operating temperature range to which they will be subjected.
\slideitem{ Electrical Characteristics - parts should be selected to meet EMI,
frequency,
waveform and signal requirements and maximum applied electrical stresses
(singularly
and in combination).
\slideitem{ Stability - parts should be selected to meet parameter stability
requirements based on
changes in temperature, humidity, frequency, age, etc.
\slideitem{ Tolerances - parts should be selected that will meet tolerance
requirements, including
tolerance drift, over the intended life.
\slideitem{ Reliability - parts should be selected with adequate inherent
reliability and properly
derated to achieve the required equipment reliability. Dominant failure
modes should
be understood when a part is used in a specific application.
\slideitem{ Manufacturability - parts should be selected that are compatible with
assembly
manufacturing process conditions.
\slideitem{ Life - parts should be selected that have "useful life" characteristics
(both operating and
storage) equal to or greater than that intended for the life of the
equipment in which they
are used.
\slideitem{ Maintainability - parts should be selected that consider mounting
provisions, ease of
removal and replacement, and the tools and skill levels required for
their removal/
replacement/repair.
\slideitem{ Environment - parts should be selected that can operate successfully in
the
environment in which they will be used (i.e., temperature, humidity,
sand and dust, salt
atmosphere, vibration, shock, acceleration, altitude, fungus, radiation,
contamination,
corrosive materials, magnetic fields, etc.).
\slideitem{ Cost - parts should be selected which are cost effective, yet meet the
required
performance, reliability, and environmental constraints, and life cycle
requirements.
\slideitem{ Availability - parts should be selected which are readily available,
from more than one
source, to meet fabrication schedules, and to ensure their future
availability to support
repairs in the event of failure.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Types of Faults

\slideitem{Design faults:
- erroneous requirements;
- erroneous software;
- erroneous hardware.
\slideitem{These are systemic failures;
- not due to chance but design.
\slideitem{Dont forget management/regulators!

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Types of Faults

\slideitem{Intermittent faults:
- fault occurs and recurrs over time;
- fault connections can recur.
\slideitem{Transient faults:
- fault occurs but may not recurr;
- electromagnetic interference.
\slideitem{Permanent faults:
- fault persists;
- physical damage to processor.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Models

\slideitem{Single stuck-at models.
\slideitem{Hardware seen as `black-box'.
\slideitem{Fault modelled as:
- input or output error;
- stuck at either 1 or 0.
\slideitem{Models permanent faults.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Models - Single Stuck-At...
<img src="../images/hardware1.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Models

<img src="../images/hardware3.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Models

\slideitem{Bridging Model:
- input not `stuck-at' 1 or 0;
- but shorting of inputs to circuit;
- input then is wired-or/wired-and.
\slideitem{Stuck-open model:
- both CMOS output transistors off;
- results is neither high nor low...
\slideitem{Transition and function models.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Software Faults (Aside...)

\slideitem{Much more could be said...
- see Leveson or Storey.
\slideitem{Huge variability:
- specification errors;
- coding errors;
- translation errors;
- run-time errors...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Redundancy 

\slideitem{ Adds:
- cost;
- weight;
- power consumption;
- complexity (most significant).
\slideitem{These can outweigh safety benefits.
\slideitem{Other techniques available:
- improved maintenance;
- better quality materials;
\slideitem{Sometimes no choice (Satellites).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Hardware Redundancy Techniques

<img  SRC = "../images/redundancy.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Active Redundancy 

\slideitem{ When component fails...
\slideitem{ Redundant components do not have:
- to detect component failure;
- to switch to redundant resource.
\slideitem{ Redundant units always operate.
\slideitem{ Automatically pick up load on failure.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Standby Redundancy 

\slideitem{ Must detect failure.
\slideitem{Must decide to replace component.
\slideitem{Standby units can be operating.
\slideitem{Stand-by units may be brought-up.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Example Redundancy Techniques 

Bimodal Parallel/Series Redundancy</a>.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Triple Modular Redundancy (TMR)

\slideitem{ Possibly most widespread.

\slideitem{In simple voting arrangement,
- voting element -> common failure;
- so triplicate it as well.

\slideitem{ Multi-stage TMR architectures.

\slideitem{More cost, more complexity...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Multilevel Triple Modular Redundancy (TMR)

<img  SRC = "../images/multilevel_tmr.gif">

\slideitem{ No protection if 2 fail per level.
\slideitem{No protection from common failure
- eg if hard/software is duplicated.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Fault Detection

\slideitem{Functionality checks:
- routines to check hardware works.
\slideitem{Signal Comparisons:
- compare signal in same units.
\slideitem{Information Redundancy:
- parity checking, M out of N codes...
\slideitem{Watchdog timers:
- reset if system times out.
\slideitem{Bus monitoring:
- check processor is `alive'.
\slideitem{Power monitoring:
- time to respond if power lost.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study

"GPCs running together in the same GN&C (Guidance, Navigation and Control) OPS (Operational Sequence) are part of a redundant set performing identical tasks from the same inputs and
producing identical outputs. Therefore, any data bus assigned to a commanding GN&C GPC is heard by all members of the
redundant set (except the instrumentation buses because each GPC has only one dedicated bus connected to it). These
transmissions include all CRT inputs and mass memory transactions, as well as flight-critical data. Thus, if one or more GPCs in
the redundant set fail, the remaining computers can continue operating in GN&C. Each GPC performs about 325,000 operations
per second during critical phases. "

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study

GPC status information among the primary avionics computers. If a GPC operating in a redundant set fails to meet two redundant
multiplexer interface adapter receiver during two successive reads of response data and does not receive any data while the other
members of the redundant set do not receive the data, they in turn will vote the GPC out of the set. A failed GPC is halted as
soon as possible."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study
"GPC failure votes are annunciated in a number of ways. The GPC status matrix on panel O1 is a 5-by-5 matrix of lights. For
example, if GPC 2 sends out a failure vote against GPC 3, the second white light in the third column is illuminated. The yellow
diagonal lights from upper left to lower right are self-failure votes. Whenever a GPC receives two or more failure votes from
other GPCs, it illuminates its own yellow light and resets any failure votes that it made against other GPCs (any white lights in its
row are extinguished). Any time a yellow matrix light is illuminated, the GPC red caution and warning light on panel F7 is
illuminated, in addition to master alarm illumination, and a GPC fault message is displayed on the CRT. "

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study
"Each GPC power on , off switch is a guarded switch. Positioning a switch to on provides the computer with triply redundant
normally, even if two main or essential buses are lost. "

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study

"(There are) 5 identical general-purpose computers aboard the orbiter control space shuttle vehicle systems. 
Each GPC is composed of two
separate units, a central processor unit and an input/output processor. All five GPCs are IBM AP
-101 computers. Each CPU and
IOP contains a memory area for storing software and data. These memory areas are collectively re
ferred to as the GPC's main
memory.

The IOP of each computer has 24 independent processors, each of which controls 24 data buses use
d to transmit serial digital
data between the GPCs and vehicle systems, and secondary channels between the telemetry system a
nd units that collect
instrumentation data. The 24 data buses are connected to each IOP by multiplexer interface adapt
ers that receive, convert and
validate the serial data in response to discrete signals calling for available data to be transmitted
or received from vehicle hardware."


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study

"A GPC on orbit can also be ''freeze-dried;'' that is, it can be loaded with the software for a particular memory configuration and
then moded to standby. It can then be moded to halt and powered off. Since the GPCs have non-volatile memory, the software
is retained. Before an OPS transition to the loaded memory configuration, the freeze-dried GPC can be moded back to run and
the appropriate OPS requested. 

A failed GPC can be hardware-initiated, stand-alone-memory-dumped by switching the powered computer to terminate and halt
and then selecting the number of the failed GPC on the GPC memory dump rotary switch on panel M042F in the crew

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Space Shuttle General Purpose Computer (GPC) Case Study
"A simplex GPC is one in run and not a member of the redundant set, such as the BFS (Backup Flight System) GPC. Systems management and payload
major functions are always in a simplex GPC."

"Even though the four primary avionics software system GPCs control all GN&C functions during the critical phases of the
mission, there is always a possibility that a generic failure could cause loss of vehicle control. Thus, the fifth GPC is loaded with
different software created by a different company than the PASS developer. This different software is the backup flight system.
To take over control of the vehicle, the BFS monitors the PASS GPCs to keep track of the current state of the vehicle. If
required, the BFS can take over control of the vehicle upon the press of a button. The BFS also performs the systems
management functions during ascent and entry because the PASS GPCs are operating in GN&C. BFS software is always loaded
into GPC 5 before flight, but any of the five GPCs could be made the BFS GPC if necessary."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hardware Design: Fault Tolerant Architectures

\slideitem{The basics of hardware management.
\slideitem{Fault models.
\slideitem{Hardware redundancy.
\slideitem{Space Shuttle GPC Case Study.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hardware Implementation Issues

\slideitem{COTS Microprocessors.
\slideitem{Specialist Microprocessors.
\slideitem{Programmable Logic Controllers
\slideitem{Electromagnetic Compatability

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors
\slideitem{As we have seen:
- safety of software jeopardised
- if flaws in underlying hardware.
\slideitem{Catch-22 problem:
- best tools for COTS processors;
- most experience with COTS;
- least assurance with COTS...
\slideitem{Redundancy techniques help...
- but danger of common failures;
- vs cost of heterogeneity;

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors

\slideitem{Where do the faults arise? 
1. fabrication failures;
2. microcode errors;
3. documentaiton errors.
\slideitem{Can guard against 1:
- using same processing mask;
- tests then apply to all of batch;
- high cost (specialist approach).
\slideitem{Cannot distinguish 2 from 3?
\slideitem{Undocumented instructions...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors

"Steven O. Siegfried" <sos@dial324.skypoint.net> 
Mon, 10 Nov 1997 01:07:34 -0600 (CST)


New Intel Pentium risk: user mode program locks up system

The following program, when compiled and run in __USER__ mode on any Pentium
(reported as MMX or not, don't know about Pentium II yet) will lock-up the
system.

        > char x [5] = { 0xf0, 0x0f, 0xc7, 0xc8 };
        > main ()
        > {
        >   void (*f)() = x;
        >   f();
        > }

Any user can execute this program at the lowest level of security provided by
the following operating systems: OS/2, NT, W95, Linux.

When I tried it, I could _only_ recover by power-cycling my box.

The following perl script, courtesy of Sam Trenholme via the security
mailing list at Redhat Software is reported to find _all_ occurences of this
code sequence on systems running Linux.  (It found my bomb program after I
used it to kill my system as a test.)  It can probably be adapted for use on
other operating systems.

        > #!/usr/bin/perl
        > # Source: Sam Trenholme via linux-security@redhat.com mailing list.
        > # There is no known software fix to the F0 0F C7 C8 bug at this time.
        > # usage: $0 dir
        > # Where dir is the directory you recursively look at all programs in
        > # for instances of the F0 0F C7 C8 sequence.
        > # This script will search for programs with this sequence, which will
        > # help sysadmins take appropriate action against those running such
        > # programs.
        > # This script is written (but has not been tested) in Perl4, to
        > # insure maximum compatibility .
        > sub findit {
        >   local($dir,$file,@files,$data) = @_;
        >   undef $/;
        >   if(!opendir(DIR,$dir)) {
        >     print STDERR "Can not open $dir: $!\n";
        >     return 0;
        >     }
        >   @files=readdir(DIR);
        >   foreach $file (@files) {
        >     if($file ne '.' && $file ne '..') {
        >       if( -f "$dir/$file" && open(FILE,"< $dir/$file")) {
        >         $data=<FILE>;
        >         if($data =~ /\xf0\x0f\xc7\xc8/) {
        >           print "$dir/$file contains F0 0F C7 C8\n";
        >           }
        >         } elsif( -d "$dir/$file") {
        >           &findit("$dir/$file");
        >         }
        >       }
        >     }
        >   }
        > $dir = shift || '/home';
        >
        > &findit($dir);

Basically, there's no protection from this.  Adjust your execution of downline
loaded absolutes accordingly.

Steve Siegfried  sos@skypoint.com sos1@xtl.msc.edu

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors

"Modern microprocessor chips are getting very complex indeed. The current gate
count can exceed 2.5 million. One must therefore expect that new versions of
such chips will contain logical bugs. A common form of bug is in the
microcode, but since the distinction between a microcode fault and another
form of design bug is difficult to define, the distinction is not made here.
We are *not* concerned with fabrication faults."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors

"Attempts to report bugs openly have not been successful.   A consequence of the above is that it is very difficult of users undertaking a
critical application to protect themselves against a potential design bug. One
approach that has been tried with one project is to use identical chips from
the same mask so that rig and development testing will extrapolate to the
final system. In some cases, the suppliers have provided information under a
non-disclosure agreement, be this seems to be restricted to major projects.
In contrast, quite a few software vendors have an open bug reporting scheme
--- and almost all provide a version number to the user. Hence it appears
in this area, software is in `advance' of hardware."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors

"The key issues extracted are as follows:
 
 \slideitem{Early chips are unreliable:
 
There have been some dramatic errors in very early releases of chips.
 
 \slideitem{Rarely used instructions are unreliable:
 
One report sent to me reported that some instructions not generated by
the `C' compiler were completely wrong. Another report noted that
special instructions for 64-bit integers did not work, and when this was
reported, the supplier merely removed them from the documentation!

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

COTS Microprocessors

"The key issues extracted are as follows (continued):
 \slideitem{Undocumented instructions are unreliable:
 
Obviously, such instructions must be regarded with suspicion.
 
\slideitem{Exceptional case handling is unreliable:
 
A classic instance of this problem is an error which has been reported
to me several times of the jump instructions on the 6502. When such an
instruction straddled a page boundary, it did not work correctly. This
issue potentially gives the user most cause for concern, since it may
be very difficult to avoid the issue. For instance, with machine
generated code form a compiler, the above problem with the 6502 would
be impossible to avoid."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors 

\slideitem{ Commercial microprocessor flaws.
\slideitem{ What happens if illegal opcode? 
- or result may be undefined?
\slideitem{Motorola 6801 test instruction
- fetches infinite bytes from memory;
- good to test for faults on bus;
- but could be executed erroneously;
- see Storey or comp.risks for more.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - AAMP2

\slideitem{ Collins Avionics/Rockwell group.
\slideitem{ AAMP2 
- 30+ in every Boeing 747-400.
\slideitem{High criticality implies cost
- can you sell enough to cover input?
\slideitem{What is money spent on?
- extra time spent on design?
- bench testing (see later);
- formal verification...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - AAMP5

"The AAMP5 verification was a project conducted to explore how formal
techniques for specification and verification could be introduced into an
industrial process. Sponsored by the Systems Validation Branch of NASA
Langley and Collins Commercial Avionics, a division of Rockwell
International, it was conducted by Collins and the Computer Science Research
Lab at SRI International. The project consisted of specifying in the PVS
language developed by SRI a portion of a Rockwell proprietary
microprocessor, the AAMP5, at both the instruction set and register-transfer
levels and using the PVS theorem prover to show the microcode correctly
implemented the specified behavior for a representative subset of
instructions. The formal verification was performed in parallel with the
development of the AAMP5 and did not replace any production verification
activities."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - AAMP5
"
This methodology was used to formally verify a core set of eleven AAMP5
instructions representative of several instruction classes.  The core set
did not include floating point instructions.  Although the number of
instructions verified is small, the methodology and the formal machinery
developed are adequate to cover most of the remaining AAMP5 microcode.  The
success of this project has lead to a sequel in which the same methodology
is being reused to verify another member of the AAMP family of processors"

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - AAMP5

"Another key result was the discovery of both actual and seeded errors. Two
actual microcode errors were discovered during development of the formal
specification, illustrating the value of simply creating a precise
specification. Both errors were specific to the AAMP5 and corrected prior to
first fabrication. Two additional errors seeded by Collins in the microcode
were systematically uncovered by SRI while doing correctness proofs. One of
these was an actual error that had been discovered by Collins during testing
of an early prototype but left in the microcode provided to SRI. The other
or simulation."

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - AAMP5 and Formal Verification

\slideitem{For more details see:
\slideitem{In principle, it can be done,
- but still very expensive;
- need techniques and tools;
- reduce costs and increase subsets.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - Verifiable Integrated Processor for Enhanced Reliability (VIPER)

\slideitem{This is an old story...
- but still very controversial.
\slideitem{Royal Signals & Radar Establishment.
\slideitem{LCF-LSM and Ella.
\slideitem{Big claims about confidence levels:
- did MOD claim "fully proven"?
- proof from spec to production?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - Verifiable Integrated Processor for Enhanced Reliability (VIPER)

\slideitem{Charter technologies market the chip.
\slideitem{Sue MOD over "ungrounded" claims.
\slideitem{Charter into liquidation as costs rise.
\slideitem{Key lesson:
- general ignorance about proof;
- argument not absolute guarantee.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - 1750A, 1750B

- introduced in 1979;
- revised in 1982;
- deactivated in 1996;
- well documented/understood.
- but dont forget safety of language;
- not just processor reliability.
- started but never completed?
- 1750A remains a de facto standard.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Specialist Microprocessors - ERC32
\slideitem{Reliable not safety-critical?
\slideitem{Space and radiation tolerant.
\slideitem{Ada development tools.
- integer unit (IU);
- floating-point unit (FPU);
- memory controller (MEC).
\slideitem{Single-chip (TSC695E) June 1999.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Microprocessors 
\slideitem{MIL-STD1750 - expensive.
\slideitem{Select processor for application.
- Storey cites widespread use;
- range of less critical areas.
- specifically for airbag applications;
- "general purpose in-system programmable microcontrollers".

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Programmable Logic Controllers (PLCs)

\slideitem{Self contained:
- power supply;
- interface circuitry;
- 1+ processors.
\slideitem{Different from GPCs (eg Shuttle):
- replace electromechanical relays;
- perform simple logic functions.
\slideitem{Designed for high MTBFs:
- kernels provide trusted functions;
- proprietary source for firmware.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Programmable Logic Controllers (PLCs)

\slideitem{Widely used, well tested
\slideitem{But hard/software proprietary.
\slideitem{Certification by trusted bodies.
\slideitem{However...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Programmable Logic Controllers (PLCs)

Problem with PLC Software

"Lin Zucconi" <lin_zucconi@lccmail.ocf.llnl.gov> 
3 Mar 1993 16:50:50 U


People using Modicon 984 Series programmable controllers with Graysoft
Programmable Logic Controller (PLC) software Version 3.21 are advised to
contact Graysoft (414) 357-7500 to receive the latest version (3.50) of the
software.  A bug in Version 3.21 can corrupt a controller's logic and cause
equipment to operate erratically.  PLCs are frequently used in safety-related
applications.  Users often assume that if their "logic" is correct then they
are ok and forget that the underlying logic is implemented with software which
may not be correct.

Lin Zucconi  zucconi@llnl.gov

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Programmable Logic Controllers (PLCs)

\slideitem{PC emulators to develop software:
- download to target PLC;
- volaile store is dangerous;
- wide use of EEPROMS.
\slideitem{Fail safe PLC's:
- two or more independent CPU's;
- voting forms of redundancy;
- if conflict close down in safe state
\slideitem{Several graphical design techniques:
ladder & function block diagrams...
\slideitem{See Storey for more detail.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Electromagnetic Compatability (EMC)
\slideitem{Work in presence of interference.
\slideitem{AND not create interference.
\slideitem{Interference from external noise
\slideitem{Interference from external source
2 radio signals on same frequency.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Electromagnetic Compatability (EMC)
\slideitem{Difficult to predict.
\slideitem{Intensity changes over time;
- eg with work patterns;
\slideitem{Sources may also be mobile;
- or you may be mobile!
\slideitem{Mobile telephones, car ignitions...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Electromagnetic Compatability (EMC)

\slideitem{Protection.
\slideitem{Screening:
- use conductive cage/enclosure.
\slideitem{Check design of PCBs if possible:
- (power) loops form antennas;
- check use of ground planes.
\slideitem{Check CMOS output capacitance;
- can buy chips (Philips 8051);
- help discriminate signal edges.
\slideitem{Seek help from a specialist...


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Hardware Implementation Issues

\slideitem{COTS Microprocessors.
\slideitem{Specialist Microprocessors.
\slideitem{Programmable Logic Controllers
\slideitem{Electromagnetic Compatability

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Introduction

\slideitem{ Validation and Verification.
\slideitem{ What are the differences?
\slideitem{When, why and who? 
\slideitem{UK MOD DEF STAN 00-66

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

\slideitem{ Verification:
- does it meet the requirements?
\slideitem{ Validation:
- are the requirements any good?
\slideitem{ Testing:
- process used to support V&V.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

B.5.3.6 Verification and Validation. This sub-process evaluates the
products of other
Software Modification sub-processes to determine their compliance and
consistency with both
contractual and local standards and higher level products and
requirements. Verification and
validation consists of software testing, traceability, coverage analysis
and confirmation that
required changes to software documentation are made. Testing subdivides
into unit testing,
integration testing, regression testing, system testing and acceptance
testing.

Acknowledgement:
Integrated Logistic Support: Part 3, Guidance for Software Support.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

\slideitem{Misuse of terms?

A. The certification/validation process should confirm that hazards identified by hazard analysis, (HA),
 failure mode effect analysis (FMEA), and other system analyses have been eliminated by design or
 devices, or special procedures. The certification/validation process should also confirm that residual
 hazards identified by operational analysis are addressed by warning, labeling safety instructions or other
 appropriate means.
Acknowledgement:
Nonmandatory guidelines for certification/validation of safety systems for presence sensing device initiation of mechanical power presses - 1910.217 App B
\slideitem{More like verification?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation

\slideitem{ During design
- external review before commission; 
- external review for certification.
\slideitem{ During implementation:
- additional constraints discovered;
- additional requirements emmerge. 
\slideitem{ During operation:
- were the assumptions valid?
- especially environmental factors.
\slideitem{Validate:
- PRA's; development processes etc.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation: Waterfall Model

<img width=270 src="../images/waterfall.gif">
\slideitem{Validation at start and end.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation: Spiral Model

<img width=420 src="../images/spiral.gif">
\slideitem{Validation more continuous.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation: IEC 61508 (Draft)
The following should be considered in an overall safety validation plan:
\slideitem{ Details of when the validation should take place.
\slideitem{ Details of who should carry out the validation.

\slideitem{ Identification of the relevant modes of the system operation, including:
\slideitem{ preparation for use, including setting up and adjustment
\slideitem{ start up
\slideitem{ teach
\slideitem{ automatic
\slideitem{ manual
\slideitem{ semi-automatic
\slideitem{ steady-state operation
\slideitem{ resetting
\slideitem{ shutdown
\slideitem{ maintenance
\slideitem{ reasonably foreseeable abnormal conditions

\slideitem{ Identification of the safety-related systems and external risk reduction facilities that need to be validated for each mode of the system before commissioning commences.

\slideitem{ The technical strategy for the validation, for example, whether analytical methods or statistical tests are to be used.

\slideitem{ The measures, techniques and procedures that shall be used to confirm that each safety function conforms with the overall safety requirements documents and the safety integrity requirements.

\slideitem{ The specific reference to the overall safety requirements documents.

\slideitem{ The required environment in which the validation activities are to take place.

\slideitem{ The pass/fail criteria.

\slideitem{ The policies and procedures for evaluating the results of the validation, particularly failures.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation: MOD DEF STAN 00-60

D.4.1.6 Validation. 
At the earliest opportunity support resource
requirements should be
confirmed and measurements should be made of times for completion of all
software
operation and support tasks. Where such measurements are dependent upon
the system state
or operating conditions, averages should be determined over a range of
conditions. If
measurements are based on non-representative hardware or operating
conditions, appropriate
allowances should be made and representative measurements carried out as
soon as possible.
The frequency of some software support tasks will be dependent upon the
frequency of
software releases and the failure rate exhibited by the software.
                Integrated Logistic Support: Part 3, Guidance for Software
                Support. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation: MOD DEF STAN 00-60

D.4.1.6 Validation. (Cont.)
Measurements of software failure rates and fault densities obtained
during software and
system testing might not be representative of those that will arise
during system operation.
However, such measurements may be used, with caution, in the validation
of models and
assumptions.
For repeatable software engineering activities, such as compilation and
regression testing, the
time and resource requirements that arose during development should be
recorded. Such
information may be used to validate estimates for equivalent elements of
the software
modification process.
For other software engineering activities, such as analysis, design and
coding, the time and
resource requirements that arose during development should be recorded.
However, such
information should only be used with some caution in the validation of
estimates for
equivalent elements of the software modification process.
The preceding clauses might imply the need for a range of metrics 
                Integrated Logistic Support: Part 3, Guidance for Software
                Support. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Validation: Summary of Key Issues

\slideitem{ Who validates validator?
- External agents must be approved.
\slideitem{ Who validates validation?
- Clarify links to certification.
\slideitem{What happens if validation fails?
- Must have feedback mechanisms;
- Links to process improvement?
\slideitem{NOT the same as verification!


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Leveson's Strategies

<img width=480 src="../images/verify1.gif">

\slideitem{Show that functional requirements
- are consistent with safety criteria ?
\slideitem{Implementation may include hazards
not in safety/functional requirements.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Leveson's Strategies

<img width=490 src="../images/verify2.gif">

\slideitem{Show that implementation is
- same as functional requirements?
\slideitem{Too costly and time consuming
all safety behaviour in specification?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Leveson's Strategies

<img width=490 src="../images/verify3.gif">

\slideitem{Or show that the implementation
- meets the safety criteria.
\slideitem{Fails if criteria are incomplete...
- but can find specification errors.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Lifecycle View

<img width=260 src="../images/waterfall.gif">
\slideitem{At several stages in waterfall model.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Lifecycle View

<img width=420 src="../images/spiral.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification

\slideitem{Verification as a catch-all?

"Verification is defined as determining whether or not the products of each phase of the software development process fulfills all the requirements from the previous phase."

\slideitem{So a recurrent cost, dont forget...
- verification post maintenance.
\slideitem{Verification supported by:
- determinism (repeat tests);
- separate safety-critical functions;
- well defined processes;
- simplicity and decoupling.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification

D.5.1 Task 501 Supportability Test, Evaluation and
                Verification 
D.5.1.1 Test and Evaluation Strategy. 
Strategies for the evaluation of
system supportability
should include coverage of software operation and software support.
Direct measurements
and observations may be used to verify that all operation and support
activities - that do not
involve design change - may be completed using the resources that have
been allocated.
During the design and implementation stage measurements may be conducted
on similar
systems, under representative conditions.
As software modification activity is broadly similar to software
development the same
monitoring mechanism might be used both pre- and post-implementation.
Such a mechanism
is likely to be based on a metrics programme that provides information,
inter alia, on the rate
at which software changes are requested and on software productivity.
                Integrated Logistic Support: Part 3, Guidance for Software
                Support. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification
D.5.1.3 Objectives and Criteria.
 System test and evaluation programme
objectives should
include verification that all operation and support activities may be
carried out successfully -within
skill and time constraints - using the PSE and other resources that have
been defined.
The objectives, and associated criteria, should provide a basis for
assuring that critical
software support issues have been resolved and that requirements have
been met within
acceptable confidence levels. Any specific test resources, procedures or
schedules necessary
to fulfil these objectives should be included in the overall test
programme. Programme
objectives may include the collection of data to verify assumptions,
models or estimates of
software engineering productivity and change traffic.
                Integrated Logistic Support: Part 3, Guidance for Software
                Support. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification

D.5.1.4 Updates and Corrective Actions. 
Evaluation results should be
analyzed and
corrective actions determined as required. Shortfalls might arise from:
\slideitem{ Inadequate resource provision for operation and support tasks.
\slideitem{ Durations of tasks exceeding allowances.
\slideitem{ Software engineering productivity not matching expectations.
\slideitem{ Frequencies of tasks exceeding allowances.
\slideitem{ Software change traffic exceeding allowances.
Corrective actions may include: increases in the resources available;
improvements in
training; additions to the PSE or changes to the software, the support
package or, ultimately,
the system design. Although re-design of the system or its software
might deliver long term
benefits it would almost certainly lead to increased costs and programme
slippage.
                Integrated Logistic Support: Part 3, Guidance for Software
                Support. 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Summary of Key Issues

\slideitem{What can we affoard to verify?
\slideitem{ Every product of every process?
- MIL HDBK 338B...
\slideitem{ Or only a few key stages?
\slideitem{If the latter, do we verify :
- specification by safety criteria?
- implementation by safety criteria?
- or both...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Verification: Summary of Key Issues

\slideitem{Above all....
\slideitem{Verification is about proof.
\slideitem{Proof is simply an argument.
\slideitem{Argument must be correct but
- not a mathematical `holy grail'... 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Introduction

\slideitem{ Validation and Verification.
\slideitem{ What are the differences?
\slideitem{When, why and who? 
\slideitem{UK MOD DEF STAN 00-66

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Testing

\slideitem{ The processes used during:
- validation and verification.
\slideitem{White and black boxes.
\slideitem{Static and Dynamic techniques
\slideitem{Mode confusion case study.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

\slideitem{ Black box tests:
- tester has no access to information
- about the system implementation.
\slideitem{Good for independence of tester.
\slideitem{But not good for formative tests.
\slideitem{Hard to test individual modules...


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

\slideitem{ White box tests:
- tester can access information about
- the system implementation.
\slideitem{Simplifies diagnosis of results.
\slideitem{Can compromise independence?
\slideitem{How much do they need to know?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

\slideitem{ Module testing:
- tests well-defined subset.
\slideitem{ Systems integration:
- tests collections of modules.
\slideitem{ Acceptance testing:
- system meets requirements?
\slideitem{Results must be documented.
\slideitem{Changes will be costly.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Dynamic Testing - Process Issues

\slideitem{ Functional testing:
- test cases examine functionality;
- see comments on verification.
\slideitem{ Structural testing:
- knowledge of design guides tests;
- interaction between modules...
- test every branch (coverage)?
\slideitem{ Random testing:
- choose from possible input space;
- or beyond the "possible"...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

\slideitem{ Dynamic testing:
- execution of system components;
- is environment being controlled?
\slideitem{ Static testing:
- investigation without operation;
- pencil and paper reviews etc.
\slideitem{Most approaches use both.
\slideitem{Guide the test selection by using:
- functional requirements:
- safety requirements;
- (see previous lecture). 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Definitions and Distinctions

<img src="../images/test_lifecycle.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Dynamic Testing

\slideitem{ Where do you begin?
\slideitem{Look at the original hazard analysis;
- demonstrate hazard elimination?
- demonstrate hazard reduction?
- demonstrate hazard control?
\slideitem{Must focus both on:
- expected and rare conditions.
\slideitem{PRA can help - but for software?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Dynamic Testing - Leveson's Process Issues

\slideitem{ Review test plans.
\slideitem{ recommend tests based on the hazard analyses, safety standards and checklists, previous accident and incidents, operator task analyses etc.
\slideitem{ Specify the conditions under which the test will be conducted.
\slideitem{ Review the test results for any safety-related problems that were missed in the analysis or in any other testing.
\slideitem{ Ensure that the testing feedback is integrated into the safety reviews and analyses that will be used in design modifications.

\slideitem{ All of this will cost time and money.
\slideitem{ Must be planned, must be budgeted.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Dynamic Testing Techniques 

\slideitem{ Partitioning:
- identify groups of input values;
- do they map to similar outputs?
\slideitem{ Boundary analysis:
- extremes of valid/invalid input.
\slideitem{ Probabilistic Testing:
- examine reliability of system.
\slideitem{ (State) Transition tests:
- trace states, transitions and events.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Dynamic Testing Techniques 

\slideitem{ Simulation:
- assess impact on EUC (IEC61508).
\slideitem{ Error seeding:
- put error into implementation;
- see is test discover it (dangerous).
\slideitem{ Performance monitoring:
- check real-time, memory limits.
\slideitem{ Stress tests:
- abnormally high workloads?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Dynamic Testing - Software Issues

\slideitem{ Boundary conditions.
\slideitem{ Incorrent and unexpected inputs sequences.
\slideitem{ Altered timings - delays and over-loading.
\slideitem{ Environmental stress - faults and failures.
\slideitem{ Critical functions and variables.
\slideitem{ Firewalls, safety kernels and other special safety features.
\slideitem{ Usual suspects...automated tests?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Limitations of Dynamic Testing 

\slideitem{ Cannot test all software paths.
\slideitem{Cannot even text all hardware faults.
\slideitem{Not easy to test in final environment:
\slideitem{User interfaces very problematic:<Br>
- effects of fatigue/longitudinal use?
- see section on human factors.
\slideitem{Systems CHANGE the environment!
\slideitem{How can we test for rare events?
- may have to wait 10^{9} years?<Br>

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Static Testing 

\slideitem{Dont test the system itself.
\slideitem{ Test an abstraction of the system
\slideitem{Perform checks on requirements?
\slideitem{Perform checks on static code.
\slideitem{Scope depends on representation...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Static Testing Techniques

- peer review by other engineers.
\slideitem{ Fagan inspections:
- review of design documents.
\slideitem{ Symbolic execution:
- use term-rewriting on code;
- does code match specification?
\slideitem{ Metrics:
- lots (eg cyclomatic complexity);
- most very debatable... 


Static Testing Techniques

\slideitem{Sneak Circuit Analysis:
- find weak patterns in topologies;
- for hardware not software.
\slideitem{Software animation:
- trace behaviour of software model;
- Petri Net animation tools.
\slideitem{Performance/scheduling theory:
- even if CPU scheduling is static;
- model other resource allocations.
\slideitem{Formal methods:
- considerable argument even now;
- compare 00-60 with DO-178B...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: The Mode Confusion Case Study

\slideitem{Recent, novel use formal analysis.
\slideitem{To guide/direct other testing.
\slideitem{The mode confusion problem...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

The Flight Guidance System (FGS) 
compares the
measured state of an aircraft (position, speed,
and attitude) to the desired state and generates
pitch and roll guidance commands to minimize
the difference between the measured and desired
state. When engaged, the Autopilot (AP) translates
these commands into movement of the aircrafts
control surfaces necessary to achieve the
commanded changes about the lateral and vertical
axes. An FGS can be further broken down
into the mode logic and the flight control laws.
The mode logic accepts commands from the
flight crew, the Flight Management System
(FMS), and information about the current state
of the aircraft to determine which system modes
are active. The active modes in turn determine
which flight control laws are used to generate
the pitch and roll guidance commands. The active
lateral and vertical modes are displayed (an 
Instrumentation System (EFIS)). 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

<img  SRC = "../images/mode.gif">
"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
 Mode Confusion</a>.   In AIAA/IEEE Digital Avionics Systems Conference, October
, 1998.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

1. Opacity (i.e., poor display of automation state),
2. Complexity (i.e., unnecessarily complex automation), 
3. incorrect mental model (i.e., the flight crew misunderstands the behaviourr of the automation).
<P>
Traditional human factors has concentrated on (1), and made significant progress has been made.
However, mitigation of mode confusion will require addressing problem sources (2) and (3) as well. 
Towards this end, our approach uses two complementary strategies based upon a formal model:

<b>Visualisation</b>
Create a clear, executable model of the automation that is easily understood by flight crew and use it to drive a flight deck mockup from the formal model

Analysis
Conduct mathematical analysis of the model.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

\slideitem{Problems stemming from modes:
- input has different effect;
- uncommanded mode changes;
- different modes->behaviours;
- different intervention options;
- poor feedback.
\slideitem{ObjectTime visualisation model...
\slideitem{Represent finite state machines.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

<img  width=450 SRC = "../images/mode2.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

 The state of the
Flight Director (FD), Autopilot (AP), and each
of the lateral and vertical modes are modeled as
In Figure 3 (see previous slide), the FD is On with the guidance cues
displayed; the AP is Engaged; lateral Roll,
Heading, and Approach modes are Cleared; lat-eral
NAV mode is Armed; vertical modes Pitch,
Approach, and AltHold are Cleared; and the VS
mode is Active. Active modes are those that
actually control the aircraft when the AP is en-gaged.
These are indicated by the heavy dark
boxes around the Active, Track, and lateral
Armed modes.
"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
 Mode Confusion</a>.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

\slideitem{ObjectTime model:
- give pilots better mental model?
- drive simulation (dynamic tests?).
\slideitem{Build more complete FGS model
- prove/test for mode problems.
\slideitem{ Discrete maths:
- theorem proving;
- or model checking?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

The first problem is formally defining
what constitutes an indirect mode change. Lets
begin by defining it as a mode change that occurs
when there has been no crew input:

Indirect_Mode_Change?(s,e): bool = 
NOT Crew_input?(e) AND Mode_Change?(s,e)

No_Indirect_Mode_Change: LEMMA
Valid_State?(s) IM\slideitem{S 
NOT Indirect_Mode_Change?(s,e)

"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
 Mode Confusion</a>.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

We then seek to prove the false lemma above
using GRIND, a brute force proof strategy that
works well on lemmas that do not involve quantification.
The resulting unproved sequents
elaborate the conditions where indirect mode
changes occur. For example,
<PRE>
{-1} Overspeed_Event?(e!1)
{-2} OFF?(mode(FD(s!1)))
{-3} s!1 WITH [FD := FD(s!1) WITH [mode := CUES],
LATERAL := LATERAL(s!1) WITH
[ROLL := (# mode := ACTIVE #)],
VERTICAL := VERTICAL(s!1) WITH
[PITCH := (# mode := ACTIVE #)]]
= NS
{-4} Valid_State(s!1)
|-------
{1} mode(PITCH(VERTICAL(s!1))) =
mode(PITCH(VERTICAL(NS)))
</PRE>
The situations where indirect mode
changes occur are clear from the negatively labeled
formulas in each sequent. We see that an
indirect mode change occurs when the overspeed
event occurs and the Flight Director is off.
This event turns on the Flight Director and
places the system into modes ROLL and
PITCH.

"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
 Mode Confusion</a>.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

We define an ignored command as one in
which there is a crew input and there is no mode
change. We seek to prove that this never happens:

No_Ignored_Crew_Inputs: LEMMA
Valid_State(s) AND Crew_Input?(e) IM\slideitem{S
NOT Mode_Change?(s,e)

The result of the failed proof attempt is a set of
sequents similar to the following:

{-1} VS_Pitch_Wheel_Changed?(e!1)
{-2} CUES?(mode(FD(s!1)))
{-3} TRACK?(mode(NAV(LATERAL(s!1))))
{-4} ACTIVE?(mode(VS(VERTICAL(s!1))))
|-------
{1} ACTIVE?(mode(ROLL(LATERAL(s!1))))
{2} ACTIVE?(mode(HDG(LATERAL(s!1))))

The negatively labeled formulas in the
sequent clearly elaborate the case where an input
is ignored, i.e., when the VS/Pitch Wheel is
changed and the Flight Director is displaying
CUES and the active lateral mode is ROLL and
the active vertical mode is PITCH. In this way,
PVS is used to perform a state exploration to
discover all conditions where the lemma is false,
i.e., all situations in which a crew input is ignored.

"../reports/butler-etal-dasc98.pdf">A Formal Methods Approach to the Analysis of
 Mode Confusion</a>.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Formal Methods: Mode Confusion Case Study

\slideitem{Are these significant for user?

\slideitem{Beware:
- atypical example of formal methods;
- havent mentioned refinement;
- havent mentioned implementation;
- much more could be said...
- see courses on formal methods.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Testing

\slideitem{ The processes used during:
- validation and verification.
\slideitem{White and black boxes.
\slideitem{Static and Dynamic techniques
\slideitem{Mode confusion case study.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Individual Human Error 
\slideitem{Slips, Lapses and Mistakes.
\slideitem{Rasmussen: Skill, Rules, Knowledge.
\slideitem{Reason: Generic Error Modelling.
\slideitem{Risk Homeostasis.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

What is Error?
\slideitem{Deviation from optimal performance?
- very few achieve the optimal.
\slideitem{Failure to achive desired outcome?
- desired outcome can be unsafe.
\slideitem{Departure from intended plan?
- but environment may change plan...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

What is Error?

<img  SRC = "../images/error.gif">
Acknowledgement:J. Reason, <i>Human Error</i>, Cambridge University Press, 1990 (ISBN-0-521-31419-4).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Types of Errors...
\slideitem{Slips:
- correct plan but incorrect action;
- more readily observed.
\slideitem{Lapses:
- correct plan but incorrect action;
- failure of memory so more covert?
\slideitem{Mistakes:
- incorrect plan;
- more complex, less understood.
\slideitem{Human error modelling helps to:
- analyse/distinguish error types.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Rasmussen: Skill, Rules and Knowledge

\slideitem{Skill based behaviour:
- sensory-motor performance;
- without conscious control;
- automated, high-integrated.
\slideitem{Rule based behaviour:
- based on stored procedures;
- induced by experience or taught;
- problem solving/planning.
\slideitem{Knowledge based behaviour:
- in unfamilliar situations;
- explicitly think up a goal;
- develop a plan by selection;
- try it and see if it works.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Rasmussen: Skill, Rules and Knowledge

<img  SRC = "../images/rasmussen.gif">

Acknowledgement: J. Rasmussen, <I>Skill, Rules, Knowledge: Signals, Signs and Symbols and Other Distinctions in Human Performance Models.   IEEE Transactions on Systems, Man and Cybernetics (SMC-13)3:257-266, 1983.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Rasmussen: Skill, Rules and Knowledge

\slideitem{Signals:
- sensory data from environment;
- continuous variables;
- cf Gibson's direct perception.

\slideitem{Signs:
- indicate state of the environment;
- with conventions for action;
- activate stored pattern or action.

\slideitem{Symbols:
- can be formally processed;
- related by convention to state.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Rasmussen: Skill, Rules and Knowledge

\slideitem{ Skill-based errors:
- variability of human performance.

\slideitem{ Rule-based errors:
- misclassification of situations;
- application of wrong rule;
- incorrect recall of correct rule.

\slideitem{ Knowledge-based errors:
- incomplete/incorrect knowledge;
- workload and external constraints...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Building on Rasmussen's Work

\slideitem{How do we account for:
- slips and lapses in SKR?
\slideitem{Can we distinguish:
- more detailed error forms?
- more diverse error forms?
\slideitem{Before an error is detected:
- operation is, typically, skill based.
\slideitem{After an error is detected:
- operation is rule/knowledge based.
\slideitem{GEMS builds on these ideas...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Monitoring Failures

\slideitem{Normal monitoring:
- typical before error is spotted;
- preprogrammed behaviours plus;
- attentional checks on progress.
\slideitem{Attentional checks:
- are actions according to plan?
- will plan still achieve outcome?
\slideitem{Failure in these checks:
- often leads to a slip or lapse.
\slideitem{Reason also identifies:
- <I>Overattention</i> failures.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
GEMS: Problem Solving Failures
\slideitem{Humans are pattern matchers:
- prefer to use (even wrong) rules;
- before effort of knowledge level.
\slideitem{Local state information:
- indexes stored problem handling;
- schemata, frames, scripts etc.
\slideitem{Misapplication of good rules:
- incorrect situation assessment;
- over-generalisation of rules.
\slideitem{Application of bad rules:
- encoding deficiencies;
- action deficiencies.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Knowledge-Based Failures

\slideitem{Thematic vagabonding:
- superficial analysis/behaviour;
- flit from issue to issue.
\slideitem{Encysting:
- myopic attention to small details;
- meta-level issues may be ignored.
\slideitem{Reason:
- individual fails to recognise failure;
- does not face up to consequences.
\slideitem{Berndt Brehmer & Dietrich Doerner.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}


GEMS: Failure Modes and the SKR Levels

<img  SRC = "../images/gems.gif">
Acknowledgement:J. Reason, <i>Human Error</i>, Cambridge University Press, 1990 (ISBN-0-521-31419-4).

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Error Detection
\slideitem{Dont try to eliminate errors:
- but focus on their detection.
\slideitem{Self-monitoring:
- correction of postural deviations;
- correction of motor responses;
- detection of speech errors;
- detection of action slips;
- detection of problem solving error.
\slideitem{How do we support these activities?
- standard checks procedures?
- error hypotheses or suspicion?
- use simulation based training?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Error Detection
\slideitem{Dont try to eliminate errors:
- but focus on their detection.
\slideitem{Environmental error cueing:
- block users progress;
- help people discover error;
- "gag" or prevent input;
- allow input but warn them;
- ignore erroneous input;
- self correct;
- force user to explain..
\slideitem{Importance of other operators

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Error Detection
\slideitem{Cognitive barriers to error detection.
\slideitem{Relevance bias:
- users cannot consider all evidence;
- "confirmation bias".
\slideitem{ Partial explanations:
- users accept differences between
- "theory about state" and evidence.
\slideitem{Overlaps:
- even incorrect views will receive
- some confirmation from evidence.
\slideitem{"Disguise by familliarity".

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Practical Application
\slideitem{So how do we use GEMS?
\slideitem{Try to design to avoid all error?
\slideitem{Use it to guide employee selection?
\slideitem{Or only use it post hoc:
- to explain incidents and accidents?
\slideitem{No silver bullet, no panacea.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Practical Application
\slideitem{Eliminate error affoardances:
- increase visibility of task;
- show users constraints on action.
\slideitem{Decision support systems:
- dont just present events;
- provide trend information;
- "what if" subjunctive displays; 
- prostheses/mental crutches?
\slideitem{Memory aids for maintenance:
- often overlooked;
- aviation task cards;
- must maintain maintenance data! 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Practical Application
\slideitem{Improve training:
- procedures or heuristics?
- simulator training (contentious).
\slideitem{Error management:
- avoid high-risk strategies;
- high probability/cost of failure.
\slideitem{Ecological interface design:
- Rasmussen and Vincente;
- 10 guidelines (learning issues).
\slideitem{Self-awareness:
- when might I make an error?
- contentious...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Practical Application

<img src="../images/reason2.gif">

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Outstanding Issues

\slideitem{Problem of intention:
- is an error a slip or lapse?
- is an error a mistake of intention?
\slideitem{Given an observations of error:
- aftermath of accident/incident;
- guilt, insecurity, fear, anger.
\slideitem{Can we expect valid answers?
\slideitem{Can we make valid inferences?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Outstanding Issues

\slideitem{ GEMS focusses on causation:
- built on Rasmussens SKR model;
- therefore, has explanatory power
\slideitem{Hollnagel criticises it:
- difficult to apply in the field;
- do observations map to causes?
\slideitem{Glasgow work has analysed:
- GEMS plus active/latent failures;
\slideitem{Results equivocal, GEMS:
- provides excellent vocabulary;
- can be hard to perform mapping.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Outstanding Issues (Risk Homeostasis Theory)

\slideitem{What happens if we introduce the
- decision aids Reason suggests?
"Each road user has a target (or accepted) level of risk which acts as a comparison with actual risk.
Where a difference exists, one may move towards the other.
Thus, when a safety improvement occurs, the target level of risk motivates behaviour to compensate - e.g., drive faster or with less attention.
Risk homeostasis theory (RHT) has not beenconcerned with the cognitive or behavioural pathways by which homeostasis occurs, only with the consequences of adjustments in terms of accident loss."
Acknowledgement: T.W. Hoyes and A.I. Glendon, <I>Risk Homeostasis: Issues for Further research</I>, Safety Science, 16:19-33, (1993).

\slideitem{Will users accept more safety?
- or trade safety for performance?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

GEMS: Outstanding Issues (Risk Homeostasis Theory)

\slideitem{Very contentions.
\slideitem{Bi-directionality?
- what if safety levels fall?
- will users be more cautious?
\slideitem{Does it affect all tasks?
\slideitem{Does it affect work/leisure?
\slideitem{How do we prove/disprove it?
- unlikely to find it in simulators.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Conclusions: Individual Human Error 
\slideitem{Slips, Lapses and Mistakes.
\slideitem{Rasmussen: Skill, Rules, Knowledge.
\slideitem{Reason: Generic Error Modelling.
\slideitem{Risk Homeostasis.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work
\slideitem{Workload.
\slideitem{Situation Awareness.
\slideitem{Crew Resource Management

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Workload
\slideitem{High workload:
- stretches users resources.
\slideitem{Low workload:
- wastes users resources;
- can inhibit ability to respond.
\slideitem{Cannot be "seen" directly;
- is inferred from behaviour.
\slideitem{No widely accepted definition?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Workload
"Physical workload is a straightforward concept.   It is easy to measure and define in terms of energy expenditure.    Traditional human factors texts tell us how to measure human physical work in terms of kilocalories and oxygen consumption..."
Acknowledgement: B.H. Kantowitz and P.A. Casper, Human Workload in aviation.   In E.L. Wiener and D.C. Nagel (eds.), Human Factors in Aviation, 157-187, Academic Press, London, 1988.
"The experience of workload is based on the amount of effort, both physical and psycholoigcal, expended in response to system demands (taskload) and also in accordance with the operator's internal standard of performance."
Acknowledgement: E.S. Stein and B. Rosenberg, The Measurement of Pilot Workload, Federal Aviation Authority, Report DOT/FAA/CT82-23, NTIS No. ADA124582, Atlantic City, 1983.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Workload
- Wickens on perceptual channels;
- Kantowitz on problem solving;
- Hart on overall experience.
\slideitem{Holistic vs atomistic approaches:
- FAA (+ Seven) a gestalt concept;
- cannot measure in isolation;
- (many) experimentalists disagree.
\slideitem{Single-user vs team approaches:
- workload is dynamic;
- shared/distributed between a team;
- many prvious studies ignore this.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Workload
\slideitem{ How do we measure workload?
\slideitem{ Subjective ratings?
- NASA TLX, task load index;
- consider individual differences.
\slideitem{ Secondary tasks?
- performance on additional task;
- obtrusive & difficult to generalise.
\slideitem{ Physiological measures?
- heart rate, skin temperature etc;
- lots of data but hard to interpret.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Workload
\slideitem{ How to reduce workload?
\slideitem{ Function allocation?
- static or dynamic allocation;
- to crew, systems or others (ATC?).
\slideitem{ Automation?
- but it can increase workload;<BR>
- or change nature (monitoring).
\slideitem{ Crew resource management?
- coordination, decision making etc;
- see later in this section...

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Workload and Situation Awareness
<img src="../images/working_memory.gif">
Acknowledgement: C.D. Wickens and J.M. Flach, Information Processing.   In E.L. Wiener and D.C. Nagel (eds.), Human Factors in Aviation, 111-156, Academic Press, London, 1988.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Situation Awareness
"Situation awareness is the perception of the elements of the environment within a volume of time and spcae, the comprehension of their meaning, and the projection of their status in the near future"

Acknowledgement: M. R. Endsley, Design and Evaluation for Situation Awareness Enhancement.   In Proceedings of the Human Factors Society 32nd Annual Meeting, 97-101.   Human Factors Society, Santa Monica, CA, 1988.

\slideitem{Rather abstract definition.
\slideitem{Most obvious when it is lost.
\slideitem{Difficult to explain behaviour:
- beware SA becoming a "catch all";
- just as "high workload" was.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Situation Awareness

<img  width=400 SRC = "../images/situation_awareness.gif">
<I>Hint: use your browser to open this image</I>
Acknowledgement: M.R. Endsley, <I>Towards a Theory of Situation Awareness</I>, Human Factors, (37)1:32-64, 1995.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Situation Awareness

\slideitem{Level 1: perception of environment
- how much can be attended to?
- clearly not everything...
\slideitem{Level 2: Comprehension of situation
- synthesise the elements at level 1;
- significance determined by goals.
\slideitem{Level 3: Projection of future.
- knowledge of status and dynamics;
- may only be possible in short term;
- enables strategy not just reaction.
\slideitem{Novice perceives everything at L1;
- but fails at levels 2 and 3.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Situation Awareness

<img src="../images/situation_awareness2.gif">
Acknowledgement: D.G. Jones and M.R. Endsley, Sources of Situation Awareness Errors in Aviation. Aviation, Space and Evironmental Medecine, 67(6):507-512, 1996. 
\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Situation Awareness
\slideitem{Hmm, subjective classification.
\slideitem{33 incidents with Air Traffic Control.
\slideitem{NASA (ASRS) reporting system:
- how typical are reported events?
\slideitem{I worry about group work:
- colleagues help you maintain SA?
- prompting, reminding, informing? 

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Situation Awareness
"Investigators were able to trace a series of errors that initiated with the flight crews acceptance of the controller's offer to land on runway 19.
The flightcrew expressed concern about possible delays and accepted an offer to expedite their approach into Cali...
One of the AA965 pilots selected a direct course to the Romeo NDB believing it was the Rozo NDB, and upon executing the selection in the FMS permitted a turn of the airplane towards Romeo, without having verified that it was the correct selection and without having first obtained approval of the other pilot, contrary to AA procedures...
The flightcrew had insufficient time to prepare for the approach to Runway 19."
American Airlines Flight 965
Boeing 757-223, N651AA
Near C\slideitem{ Colombia
December 20, 1995 </a>Aeronautica Civil of the Republic of Colombia</a>

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management
"...Among the results were that captains of more effective crews (who made fewer operational or precedural errors) verbalised a greater number of plans than those of lower performing crews and requested and used more information in making their decisions.
This raises interesting questions about whether situation awareness can be improved by teaching specific communication skills or even proceduralising certain communications that would otherwise remain in the realm of unregulated CRM (crew resource management behaviour)."
Acknowledgement: S. Dekker and J. Orasanu, Automation and Situation Awareness.   In S. Dekker and E. Hollnagel (eds.), Coping with Computers in the Cockpit.  69-85,  Ashgate, Aldershot, 1999. ISBN-0-7546-1147-7.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management

\slideitem{Cockpit Resource Management:
\slideitem{Cockpit Resource Management:
- crew coordination;
- decision making;
- situation awareness...
\slideitem{More review activities inserted
into standard operating procedures.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}
Human Error and Group Work: Crew Resource Management

<img  width=550 SRC = "../images/crm.gif">
Acknowledgement: C.A. Bowers, E.L. Blickensderfer and B.B. Morgan, Air Traffic Control Team Coordination.  In M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, 215-237, Academic Press, London, 1998.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management

\slideitem{Cockpit Resource Management:
- based on Foushee and Helmreich.
\slideitem{Group performance determined by:
- process variables - communication;
- input variables - group size/skill.
\slideitem{Goes against image of:
- pilot as "rugged individual";
- showing "the right stuff".

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management

\slideitem{Key objectives...
\slideitem{ alter individual attitudes to groups;
\slideitem{ improve coordination within crew;
\slideitem{ increase team member effort;
\slideitem{ optimise team composition.
\slideitem{Can we change group norms?
\slideitem{Does it apply beyond aviation?
- with fewer rugged individuals?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management

FAA Advisory Circular 120-51A 1993
\slideitem{ Briefings are interactive and emphasize the importance of questions, critique, and the offering of
information.

\slideitem{ Crew members speak up and state their information with appropriate persistence until there is
some clear resolution.

\slideitem{ Critique is accepted objectively and non-defensively.

\slideitem{ The effects of stress and fatigue on performance are recognised.
NASA /UT LOS Checklist


\slideitem{When conflicts arise, the crew remain focused on the problem or situation at hand. Crew
members listen actively to ideas and opinions and admit mistakes when wrong, conflict issues are
identified and resolved.

\slideitem{ Crew members verbalize and acknowledge entries to automated systems parameters.

\slideitem{ Cabin crew are included as part of team in briefings, as appropriate, and guidelines are
established for coordination between flight deck and cabin.

Human Factors Group Of The Royal Aeronautical Society.


\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management

CRM TRAINING METHODS AND PROCESSES

Phase One - Awareness training - 2 days classroom (residential or
non-residential).

Objectives: 
\slideitem{Knowledge: 
\slideitem{ Relevance of CRM to flight safety and the efficient operation of an aircraft 
\slideitem{ How CRM reduces stress and improves working environment 
\slideitem{ Human information processing 
\slideitem{ Theory of human error 
\slideitem{ Physiological effects of stress and fatigue 
\slideitem{ Visual & aural limitations 
\slideitem{ Motivation 
\slideitem{ Cultural differences 
\slideitem{ CRM language and jargon. 
\slideitem{ The CRM development process 
\slideitem{ Roles such as leadership and followership 
\slideitem{ Systems approach to safety and man machine interface and SHEL model 
\slideitem{ Self awareness 
\slideitem{ Personality types 
\slideitem{ Evaluation of CRM 

\slideitem{Skills: 
\slideitem{Nil 

\slideitem{Attitudes: 
\slideitem{ Motivated to observe situations, others' and own behaviour in future. 
\slideitem{ Belief in the value of developing CRM skills. 

\slideitem{ Activities: 
      \slideitem{ Presentations 
      \slideitem{ Analysis of incidents and accidents by case study or video 
      \slideitem{ Discussion groups 
      \slideitem{ Self disclosure 
      \slideitem{ Personality profiling and processing 
      \slideitem{ Physiological experience exercises 
      \slideitem{ Self study 

Human Factors Group Of The Royal Aeronautical Society.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}


Human Error and Group Work: Crew Resource Management

CRM TRAINING METHODS AND PROCESSES

Phase Two - Basic Skills training - 3/4 days classroom residential
Objectives: 
\slideitem{ Knowledge: 
\slideitem{     Perceptions 
\slideitem{     How teams develop 
\slideitem{     Problem solving & decision making processes 
\slideitem{     Behaviours and their differences 
\slideitem{     Thought processes 
\slideitem{     Respect and individual rights 
\slideitem{     Development of attitudes 
\slideitem{     Communications toolkits 
\slideitem{ Skills: 
\slideitem{     See Appendix B 
\slideitem{ Attitudes 
\slideitem{     See Appendix B 
\slideitem{ Activities: 
\slideitem{     Presentations 
\slideitem{     Experiential learning - (Recreating situations and experiences, using feelings to log in
      learning, experimenting in safe environments with cause and effect behaviour exercises) 
\slideitem{     Role play 
\slideitem{     Videod exercises 
\slideitem{     Team exercises 
\slideitem{     Giving & receiving positive and negative criticism 
\slideitem{     Counselling 
\slideitem{     Case studies 
\slideitem{     Discussion groups 
\slideitem{     Social and leisure activities 

Human Factors Group Of The Royal Aeronautical Society.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management

CRM TRAINING METHODS AND PROCESSES

Classroom, CPT or simulator

Objectives: 
\slideitem{ Development of knowledge, skills and attitudes to required competency standards.

\slideitem{ Activities: 

Practicing one or more skills on a regular basis under instruction in either the classroom, mock up/
CPT facility or full simulator LOFT sessions. Also considered valuable would be coaching by
experienced crews during actual flying operations.

Human Factors Group Of The Royal Aeronautical Society.

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work: Crew Resource Management
"Under normal conditions, aircraft flying is not a very interdependent task.   In many cases, pilots are able to fly their aircraft successfully with relatively little coordination with other crew members, and communication between crew members is rquired primarily during nonroutine situations."
Acknowledgement: C.A. Bowers, E.L. Blickensderfer and B.B. Morgan, Air Traffic Control Team Coordination.  In M.W. Smolensky and E.S. Stein, Human Factors in Air Traffic Control, 215-237, Academic Press, London, 1998.
\slideitem{Does it work in abnormal events?
\slideitem{Additional requirements ignored?
\slideitem{Can it hinder performance?

\slideend{\it{\copyright C.W. Johnson, 1999 - Safety Critical Systems Development}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\slidestart{Terminology}

Human Error and Group Work
\slideitem{Workload.
\slideitem{Situation Awareness.
\slideitem{Crew Resource Management

\slideend{\it{\copyright C.W. Johnson, 1997 - Human Computer Interaction}.}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\end{document}
Hazardous without warning	Very high severity ranking when a potential failure mode affects safe operation or involves non-compliance with a government regulation without warning.	10
Hazardous with warning	Failure affects safe product operation or involves noncompliance with government regulation with warning.	9
Very High	Product is inoperable with loss of primary Function.	8
High	Product is operable, but at reduced level of performance.	7
Moderate	Product is operable, but comfort or convenience item(s) are inoperable.	6
Low	Product is operable, but comfort or convenience item(s) operate at a reduced level of performance.	5
Very Low	Fit & finish or squeak & rattle item does not conform. Most customers notice defect.	4
Minor	Fit & finish or squeak & rattle item does not conform. Average customers notice defect.	3
Very Minor	Fit & finish or squeak & rattle item does not conform. Discriminating customers notice defect.	2
None	No effect	1
1 in 2	10
1 in 3	9
High: Repeated failures	1 in 8	8
High: Repeated failures	1 in 20	7
Moderate: Occasional failures	1 in 80	6
	1 in 400	5
	1 in 2000	4
Low: Relatively few failures	1 in 15,000	3
Low: Relatively few failures	1 in 150,000	2
Remote: Failure is unlikely	1 in 1,500,000	1	Detection	Criteria: Likelihood of Detection by Design Control	Rank
Absolute Uncertainty	Design Control does not detect a potential Cause of failure or subsequent Failure Mode; or there is no Design Control	10
Very Remote	Very remote chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	9
Remote	Remote chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	8
Very Low	Very low chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	7
Low	Low chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	6
Moderate	Moderate chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	5
Moderately High	Moderately high chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	4
High	High chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	3
Very High	Very high chance the Design Control will detect a potential Cause of failure or subsequent Failure Mode	2
Almost Certain	Design Control will almost certainly detect a potential Cause of failure or subsequent Failure Mode	1