Copyright Chris Johnson, 2002. I will provide sample solutions for this paper if you send in an attempted solution.
Xday, XX May 2002.

9.30am - 11.15am

University of Glasgow

DEGREES OF BEng, BSc, MA, MA (SOCIAL SCIENCES)

COMPUTING SCIENCE - SINGLE AND COMBINED HONOURS
ELECTRONIC AND SOFTWARE ENGINEERING - HONOURS
SOFTWARE ENGINEERING - HONOURS

SAFETY-CRITICAL SYSTEMS DEVELOPMENT

Answer 3 of the 4 questions.

1.

a) Briefly explain why UK Defence Standard 00-55 requires that the Worst Case Execution Time (WCET) and the memory requirements of safety-critical software be determined statically.

[3 marks]

b) Most modern processors provide cache memory that a memory manager can exploit to maximise throughput. Briefly explain why this creates particular problems for the safety cases associated with software projects, and describe two possible solutions to this problem.

[7 marks]

c) The Boeing 777 Primary Flight Control System uses different types of processor in each of its three computing channels, with cross-lane monitoring between the channels. Explain why this architecture can be used to mitigate processor design errors and to help ensure liveness.

[10 marks]

2.

a) NASA uses Failure Modes, Effects and Criticality Analysis (FMECA) to guide its risk assessments. Level 1 hazards are associated with the highest level of criticality; they are sufficient to cause overall Shuttle failure, defined as loss of vehicle and crew (LOVC). Early Shuttle risk assessments focussed on the probability of a level 1 failure occurring in individual component subsystems. Briefly explain why this systematically under-estimates the likelihood of LOVC incidents.

[3 marks]

b) Since the loss of the Challenger mission, Shuttle risk assessments have assumed that all level 1 failures will lead to LOVC incidents. Describe the problems that this might create for the subsequent engineering of Shuttle subsystems.

[5 marks]

c) In 1995, a NASA FMECA study found that the probability of LOVC was between 1 in 76 and 1 in 230 missions. These estimates considered a range of hazards, including the loss of a tile from the thermal protection system on the outer skin of the Shuttle. The analysis showed that 15% of the tiles are the source of 85% of the risk to the heat shield. Possible failure modes include failing to centre a tile in its cavity during maintenance operations and allowing the bond to dry before pressure is applied. Explain the problems of calculating Risk Priority Numbers (RPNs) using these data. Comment on the accuracy of any risk assessment for LOVC incidents that relies on likelihood estimates for these contributory factors.

[12 marks]

3.

a) The Dornier/CMIL Surgical Programmable Urological Device is a robotic system that supports a range of surgical procedures, including laser-based incisions and the insertion of radioactive sources (seeds) into a patient. The software architecture of this system is composed of four layers. These can be summarised as follows:

  1. The graphical user interface;
  2. Ultrasound image detection, boundary detection and 3D modelling of the patient;
  3. Treatment planning and robot controller interface;
  4. Drivers for communication between the processor and a programmable multi-axis controller.

Explain how risk assessment techniques might be used to guide white box testing to help ensure the safety of these software components.

[3 marks]

b) White box and black box testing are both examples of dynamic verification techniques. What are the limitations of this approach compared with static analysis techniques? What additional issues must be considered when using these techniques to support the development of a safety-critical system rather than mass-market software?

[5 marks]

c) The software for the Dornier/CMIL Surgical Programmable Urological Device is implemented in ANSI C++ and runs under Linux (Red Hat 6.1). Write a brief technical report that explains the benefits of these implementation platforms and then summarises any concerns you might have about developing such a safety-critical application in these environments.

[12 marks]

4. Perrow and Sagan have both identified tight coupling as a key strength and a major weakness of modern safety-critical systems. How can this concept be applied to explain the software failures that occurred during either NASA's Mars Surveyor '98 missions or the loss of Ariane 5 flight 501?

[20 marks]

[end]