Xday, XX May 2000.

9.30 am - 11.15 am

### University of Glasgow

#### SAFETY-CRITICAL SYSTEMS DEVELOPMENT

Answer 3 of the 4 questions.

1.

(a) Failure Modes, Effects and Criticality Analysis (FMECA) defines a risk priority number to be the product of a severity index, an occurrence index and a detection index. Briefly explain why it is important to consider each of these terms during any assessment of failure modes.

[3 marks]
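For illustration only, the risk priority number defined above can be sketched as a simple product. The 1 to 10 range for each index and the example failure modes are assumptions made for this sketch; they are not given in the question.

```python
def risk_priority_number(severity: int, occurrence: int, detection: int) -> int:
    """Return the product of the three FMECA indices.

    Assumption: each index is scored on a 1-10 scale.
    """
    indices = {"severity": severity, "occurrence": occurrence, "detection": detection}
    for name, value in indices.items():
        if not 1 <= value <= 10:
            raise ValueError(f"{name} index must be between 1 and 10, got {value}")
    return severity * occurrence * detection


# Hypothetical failure modes, scored as (severity, occurrence, detection).
failure_modes = {
    "sensor drift": (7, 4, 8),
    "connector corrosion": (9, 2, 3),
    "relay sticks closed": (4, 6, 2),
}

# Rank the failure modes from highest to lowest risk priority number.
ranked = sorted(
    failure_modes,
    key=lambda mode: risk_priority_number(*failure_modes[mode]),
    reverse=True,
)
```

In this made-up data the most severe mode is not ranked first, because its occurrence and detection indices are low.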

(b) Identify three major problems that can prevent risk priority numbers from being applied to failure modes that relate to software.

[5 marks]

(c) John Musa’s work at Bell Labs led to the definition of the following equation:

$$\lambda_0 = k \times p \times w_0$$

where:

• $\lambda_0$ is a failure rate for software systems.
• $k$ is a constant that accounts for the dynamic structure of the program and the varying machines (e.g., $k = 4.2 \times 10^{-7}$).
• $p$ is an estimate of the number of executions per time unit (i.e., $p = r / \mathrm{SLOC} / \mathrm{ER}$).
• $r$ is an average instruction execution rate, determined from the manufacturer or from benchmarking, and is a constant value.
• SLOC is the number of source lines of code (not including reused code).
• ER is an expansion ratio. It is a constant that reflects properties of particular programming languages (e.g., Assembler, 1.0; Macro Assembler, 1.5; C, 2.5; COBOL, FORTRAN, 3; Ada, 4.5).
• $w_0$ is an estimate of the initial number of faults in the program. This can be calculated using $w_0 = N \times B$, or a default of 6 faults/1000 SLOC can be assumed.
• $N$ is the total number of inherent faults. This is an estimate based on judgement or past experience.
• $B$ is the fault-to-failure conversion rate, that is, the proportion of faults not corrected before the product is delivered that become failures. Assume $B = 0.95$; i.e., 95% of the faults undetected at delivery become failures after delivery.
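As a worked illustration of the definitions above, the following sketch evaluates the formula numerically. The function name, the instruction rate of 2,000,000 per second, the program size of 10,000 SLOC and the fault count N = 100 are hypothetical values chosen for this example; the constants k, ER, B and the default fault density come from the list above.

```python
def musa_initial_failure_rate(r, sloc, er, k=4.2e-7, w0=None, faults_per_ksloc=6.0):
    """Evaluate lambda_0 = k * p * w_0 using the definitions in the question.

    r    : average instruction execution rate (instructions per time unit)
    sloc : source lines of code, excluding reused code
    er   : expansion ratio for the programming language
    w0   : initial number of faults; if omitted, the default density of
           6 faults per 1000 SLOC is assumed
    """
    p = r / sloc / er  # estimated number of executions per time unit
    if w0 is None:
        w0 = faults_per_ksloc * sloc / 1000.0
    return k * p * w0


# Hypothetical C program: 10,000 SLOC on a machine executing 2e6 instructions/second.
# p = 2e6 / 10000 / 2.5 = 80 and w0 = 60, so the result is roughly 2.0e-3 failures/second.
rate_default = musa_initial_failure_rate(r=2e6, sloc=10_000, er=2.5)

# Same program, with w0 = N * B for an assumed N = 100 inherent faults and B = 0.95.
rate_estimated = musa_initial_failure_rate(r=2e6, sloc=10_000, er=2.5, w0=100 * 0.95)
```

The result is expressed in failures per whatever time unit is used for r.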

Briefly explain why the terms in the Musa formula were originally thought to provide a good indication of software reliability.

[12 marks]

2.

(a) Briefly define what is meant by the term "situation awareness".

[3 marks]

(b) Why is situation awareness likely to be a significant problem for safety-critical applications such as Air Traffic Control?

[5 marks]

(c) The following diagram presents Wickens and Flach’s model of information processing.

Use this model to explain the following excerpt from the FAA’s Aviation Safety Reporting System.

"Late night training flight...We were going out to make a 180 degree turn and land. The aircraft is equipped with an Enhanced Ground Proximity Warning System. I vectored the student on a modified procedure turn. I put my head down to get some reference data and heard the ground proximity warning, "Caution, terrain." I took over the controls and performed our escape manoeuvre and gave the jet back to the student. The student allowed the jet to descend again while my head was down. Again the ground proximity [warning] went off. I did our escape manoeuvre again and flew the aeroplane to the final approach course and let the student land. There were only 3 of us on board. Another student was in the jump seat. I asked them if they saw the terrain on the enhanced display and they said yes. They thought I would tell them when to turn… I should not have looked away while in that phase of flight with new students unfamiliar with the area." (ASRS Callback, Issue 237 March 1999)

[12 marks]

3.

(a) Briefly distinguish between permanent, transient and intermittent faults.

[3 marks]

(b) Draw a diagram to illustrate the main features of triple modular redundancy. Use this diagram to explain how a multilevel, triple modular redundancy architecture can be used to improve the reliability of safety-critical hardware.

[5 marks]
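As a rough software analogy for the voting idea behind triple modular redundancy, the sketch below runs three modules on the same input and takes the majority output, then chains two such stages. All names are hypothetical, and the single voter per stage is a simplification made for this example; it is not a model of any particular hardware design.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the modules."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among module outputs")
    return value


def tmr_stage(modules, x):
    """Feed the same input to every module and vote on the results."""
    return majority_vote([module(x) for module in modules])


# Hypothetical modules: two healthy copies and one faulty copy per stage.
def increment(x):
    return x + 1


def double(x):
    return 2 * x


def stuck_at_zero(x):
    return 0  # simulates a permanently failed module


stage_one = [increment, increment, stuck_at_zero]
stage_two = [double, stuck_at_zero, double]

# Each stage masks its own single faulty module before passing the result on.
result = tmr_stage(stage_two, tmr_stage(stage_one, 4))
```

Here each stage tolerates one failed module, so the chained result is still correct even though both stages contain a fault.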

(c) Use the following excerpt to briefly explain the main features that contribute to the reliability of the US Space Shuttle General Purpose Computer (GPC) architecture:

"Each computer in a redundant set operates in synchronized steps and cross-checks results of processing about 440 times per second. Synchronization refers to the software scheme used to ensure simultaneous inter-computer communications of necessary GPC status information among the primary avionics computers. If a GPC operating in a redundant set fails to meet two redundant synchronization codes in a row, the remaining computers will vote it out of the redundant set. Or if a GPC has a problem with its multiplexer interface adapter receiver during two successive reads of response data and does not receive any data while the other members of the redundant set do not receive the data, they in turn will vote the GPC out of the set. A failed GPC is halted as soon as possible."

(NASA Shuttle Public Technical Reference Material)

[12 marks]

4. With reference to IEC 61508 or DO-178B, argue for or against the statement that standards are both a necessary and a sufficient prerequisite for the development of safety-critical computer systems.

[20 marks]

[end]