a) Dijkstra argued that 'software testing can be used to demonstrate the presence of bugs, but never to show their absence'. Briefly explain the problems that this observation creates for the development of safety-critical systems.
[Bookwork/seen problem] This is a fairly standard question that we have covered in class. The use of iteration and conditional statements within programming languages creates an exponential growth in the number of possible execution paths. Even automated testing cannot guarantee to evaluate every possible branch for all possible assignments to process variables. Unlike hardware, the fact that software has performed 'correctly' for a previous period of time provides few guarantees about future behavior. More recent work on, for example, the psychology of programming has also shown systematic weaknesses in software testing strategies. These make it unlikely that programmers and developers will be able to anticipate many of the eventual operating conditions that software will meet in the field.
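The exponential growth mentioned above can be made concrete with a small sketch (not part of the original answer; the function and its parameters are purely illustrative). Each independent two-way branch doubles the number of execution paths, and a bounded loop multiplies the count again, which is why exhaustive path coverage quickly becomes infeasible:

```python
def path_count(n_branches: int, loop_bound: int = 1) -> int:
    """Illustrative count of execution paths through code containing
    n_branches independent two-way conditionals; if that code sits
    inside a loop executed loop_bound times, every pass multiplies
    the path count again."""
    return (2 ** n_branches) ** loop_bound

# Ten branches give 1,024 paths; thirty give over a billion.
for n in (10, 20, 30):
    print(n, path_count(n))
```

Even at modest program sizes the path count exceeds what any test campaign, automated or not, could enumerate, which is the substance of Dijkstra's observation.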
b) The Federal Aviation Administration recently released Notice 8110.89 'Guidelines for the Approval of Software Changes in Legacy Systems'. This document considers some of the problems that frustrate the use of Black Box testing to establish the reliability of modifications that are made to legacy systems. Give a number of reasons that might have motivated the FAA to issue this guidance.
[Unseen/seen problem] Black box testing is appropriate for legacy systems where subsequent developers may know little about the detailed implementation of previous systems. However, the modification of legacy systems implies that developers must make certain assumptions about the behavior of these systems. These assumptions can be invalid unless programmers have access to original documentation. This documentation is often lost over time or is drafted to development standards that would no longer meet current requirements. The FAA issued this guidance to alert programmers to the dangers of black box testing without access to original design documentation. Even if it is difficult or impossible to reverse engineer the coding details, it is still necessary to access documentation about the operating assumptions that were made during the initial software development.
c) In August 2001, the German Air Traffic Control Authorities (DFS) decided to migrate their existing graphical display software onto a new hardware platform. These display devices provide resolutions that are beyond most commercially available hardware and that are far better than was available to the original programmers of the Controllers' user interface. Briefly explain why it can be difficult to use white box testing techniques in the verification of such safety-critical legacy systems.
[Unseen problem] The transition to a new hardware platform complicates the white-box testing of safety-critical software. The move to new hardware may require programmers to write emulators that support the old code on the new system, and it can be difficult to show that these emulators faithfully replicate the systems they replace. There are, typically, economic motivations to exploit the additional resources, such as improved resolution, provided by the new hardware. The original test cases are unlikely to stretch this additional functionality, so new test cases will have to be identified; otherwise there is little point in making the additional investment. Hence, test cases must be refined from the initial version of the software to demonstrate that the legacy code, together with any necessary modifications, will run on a combination of emulated and revised hardware. In user interface development in particular, synchronization requirements emerge from the need to match display updates and processor speeds to human performance characteristics. Changing any one of these constraints can lead to new test requirements. For example, new hardware may enable updates that far exceed the operators' ability to assimilate the displayed information. Alternatively, increasing the resolution of a monitor can encourage designers to reduce the point size and revise the font selection for critical information; more data can be displayed. Hence, white-box testing of the code functionality is unlikely to be sufficient. Further problems stem from the reliability of the documentation that may be available to support the white-box testing of legacy systems. As mentioned in part b), this is unlikely to be of the level of detail required by current development standards. Leveson has argued that this material often describes the method and results of testing without revealing the intentions that motivated them.

2. a) The German car manufacturer AUDI AG operates a Linux cluster of 52 Pentium III dual-processor nodes and 24 Pentium 4 single-processor nodes to drive car accident simulation software. This architecture is currently being upgraded to include an additional 64 dual nodes based on the Intel Xeon processor. The system includes 57GB RAM, 10 terabytes of hard-disk storage and Fast Ethernet switches (100 Mbit/s). What problems might you expect in the development of accident simulation software to run on this platform?
[Unseen problem] This is a relatively sophisticated, expensive and complex architecture. Software development will require specialist tools and expert programming. In particular, it is important that managers consider the difficulty both of verifying the correctness of results and of validating any proposed changes based on simulated insights. There will also be particular communication problems between experts in road traffic simulation and programmers who may lack the necessary domain expertise. This is particularly significant when the results of physical tests have to be compared to the predictions generated by software on specialist platforms.
b) Most recent cars rely upon Controller Area Network (CAN) architectures for power train functions, chassis control and passive safety devices, such as airbags. These systems typically rely upon a single dedicated wiring loom for each application. CAN is not, however, used for primary safety functions, including steering and braking. Suggest reasons why it is not widely used for primary safety functions.
[Unseen problem] The main reason that CAN is not widely used for primary safety functions is that it relies upon a relatively simple protocol that provides little redundancy. The lack of redundancy starts at the physical level with the single dedicated wiring loom: if this fails then there is a total failure of the associated application function. It can be argued that this physical vulnerability could be overcome by multiple dedicated wiring looms; however, this would increase costs to a prohibitive level. Further problems stem from the complexity of existing CAN systems even without the addition of multiple redundant looms. It might be better to have a form of shared bus with redundancy and more complex arbitration techniques. Some attempts have been made to improve on CAN by building fault-tolerant variants; however, these begin to resemble the more complex bus protocols discussed in part c).
c) The FlexRay system is being developed by BMW, DaimlerChrysler, Motorola and Philips Semiconductors. FlexRay is a standard for the development of safety-critical buses to support drive-by-wire, adaptive cruise control, collision avoidance and active suspension. FlexRay is based on what is known as a Time-Triggered Protocol: critical applications are guaranteed access to the bus at predefined intervals, which is intended to support determinism for safety-critical functions. In addition to these static messages there are also ad hoc dynamic message segments. A global clock is used to synchronise access to the shared bus. A "bus guardian" is used to prevent contention or flooding by pathological processes. Two channels are available and a scheduler is used to ensure that important messages are never blocked by less important signals. Use these components of the FlexRay architecture to devise a fault-tolerant 2-out-of-3 voting scheme for automotive applications.
[Seen/Unseen problem] There are many different approaches. Assume that we have three units, 1, 2 and 3, engaged in the voting. Initially, they provide inputs S1, S2 and S3 to the application process. These are placed onto both of the redundant channels, so each unit can observe the inputs being used by each of its peers. The results of the computation, A1, A2 and A3, are similarly placed onto the channels. These can be broadcast or may be addressed, for example to units 1, 2 and 3, depending on application requirements. The exact mechanisms used to resolve discrepancies between A1-A3 and S1-S3 depend on the nature of the application. Notice that we use an input confirmation stage to ensure a form of checkpoint consistency prior to computation. This is not essential in an answer but provides further discussion in an ideal solution.
Another key feature of an ideal solution would be to base the voting protocol around statically scheduled timeslots. Ideally, these would occur within a single frame or segment. If the result transmission were not broadcast within a specified timeout then liveness could not be guaranteed.
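The discrepancy-resolution step of the scheme above can be sketched as a simple majority vote over the broadcast results A1-A3. This is an illustrative sketch only (the function name, return convention and error handling are my own, not part of any FlexRay standard); in a real system the vote would run inside each node against the values observed in the statically scheduled slots:

```python
from collections import Counter

def vote_2oo3(results):
    """2-out-of-3 majority vote over the results [A1, A2, A3] observed
    on the bus. Returns (agreed_value, suspect_units): the value at
    least two units agree on, plus the 1-based indices of dissenting
    units. Total disagreement is raised as an error, which the wider
    system would have to treat as a channel- or system-level failure."""
    counts = Counter(results)
    value, n = counts.most_common(1)[0]
    if n < 2:
        raise RuntimeError("no 2-out-of-3 majority: total disagreement")
    suspects = [i + 1 for i, r in enumerate(results) if r != value]
    return value, suspects

print(vote_2oo3([5, 5, 7]))  # (5, [3]) -- unit 3 is out-voted
```

The same routine could be applied to the inputs S1-S3 during the input confirmation stage, giving the checkpoint consistency discussed in the model answer.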
3. a) In 1997, Driver Reminder Appliances were introduced across the UK railways. These are intended to ensure that drivers do not start their trains when the signals are at 'red'. Drivers must remember to manually set passive DRA systems by pushing down a button on top of their control system after the train comes to a halt. When set, the button illuminates red and disables the traction power. Pulling out the button resets the device and allows the driver to proceed. The idea is that the light and the action remind the driver to check that the signal allows them to continue. Briefly assess the effectiveness of this technique as a means of ensuring that drivers do not proceed beyond signals at 'red'.
[Bookwork/Unseen problem] The key problem with passive DRA systems is that the driver must remember to set the device whenever they come to a stop in front of a signal. The same problems of fatigue, distraction and inattention that can prevent drivers from observing the aspect of a signal can also prevent them from remembering to use the DRA. There is no mechanism that will actually prevent a driver from moving the train should the signal be at red and the DRA not be applied.

b) There are a number of situations in which it is dangerous to use passive DRA systems. For example, every driver is supposed to set the system when they enter the cab. This is intended to ensure that they check the signal before they leave the station. However, some drivers have set the DRA as they leave the cab. Hence, new drivers entering the cab may lack the intended reminder of setting the system. Further hazards can arise if drivers attempt to set the DRA while the train is in motion; the system can be abused as a form of braking mechanism. Procedures are specified in the drivers' rule book to guard against these problems. Briefly explain why rule violations and errors can result in drivers breaking these guidelines.
[Unseen problem] A violation occurs when operators deliberately break a rule or procedure. The use of the DRA while the train is in motion can provide an example of the deliberate misuse of this system. However, it may be that the drivers who used the device in this way were unaware of the prohibition. In this case, the same behavior might be reclassified as a more genuine form of error. This illustrates a key problem in assessing human involvement in adverse events: one needs to infer intention in order to distinguish errors from violations. The setting of the system when drivers exit the cab might be seen as a violation of operational instructions. However, from the drivers' point of view it is reasonable to argue that they did this as an additional safety measure; the next driver would deliberately have to switch off the DRA before continuing with the journey. In this case it can be argued that the procedure was itself flawed. Hence, further problems arise in criticizing users who violate rules that can themselves be questioned.

c) QinetiQ were recently commissioned to examine the reasons why DRA systems fail to prevent drivers from starting when signals are at 'red'. In an initial questionnaire, 99% of drivers stated that they used the DRA according to the specified guidelines. However, only 30% of the forms were returned and doubts have been expressed about the reliability of the 99% response. Briefly describe how you would go about gaining additional insights into the reasons why DRA devices are not having their intended effect.
[Seen/Unseen problem] This is an open-ended question. The QinetiQ study used a combination of focus groups, questionnaires, observational studies and interviews. I'm interested in whether the solution illustrates some understanding of the term 'safety culture'. It's a nebulous concept that we've addressed in the course. It refers to a willingness to implement new working practices and to exploit new technologies to improve safety. It also reflects a willingness to report on and criticize existing safety performance. A key idea is that a poor safety culture can undermine or complicate the elicitation techniques mentioned above. This might be one explanation of the relative response rates for the questionnaire. Observational studies will not succeed if operators significantly alter their behavior during the observations. Similarly, focus groups and interviews will founder under the hostility or opposition of operators and managers. Poor safety culture not only frustrates the adoption of innovations but can also prevent developers from understanding precisely why innovations fail to deliver their intended benefits. We've also spent some time in the course looking at risk homeostasis. I can see that some students might argue that the operator behaviors resemble those of car drivers extracting performance improvements from safety devices in the Gerald Wilde studies. This is not the focus of the question but I would give credit for such answers.
4. Most recent Safety-Critical Software standards, including IEC 61508, exploit a risk equation, which is defined in terms of the product of the consequence and likelihood of hazards. What are the strengths and weaknesses of this approach to risk assessment?
[Essay] This is an essay-style question that is intended to give plenty of scope for first-class answers. There are many different approaches. For instance, there are huge problems in assessing both the likelihood and the consequence of particular hazards. In particular, it can be difficult to assess the probability of both software failure and operator error. In the lectures we have covered John Musa's formulae for the computation of software failure rates. We have also analyzed the difficulties of placing confidence estimates on the results of these formulae. The lectures have also addressed a series of technical issues surrounding the computation of reliability rates in the face of dependent systemic failures, for example those induced by poor safety culture or by inadequate maintenance regimes. Standard reliability calculations, for instance using the minimal cutsets of fault trees, cannot be sustained in the face of dependent failures.
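One of the formulae referred to above, Musa's basic execution-time model, can be sketched briefly. The model assumes failure intensity falls linearly with the number of failures experienced and repaired; the parameter values below are invented purely for illustration:

```python
def failure_intensity(mu: float, lambda0: float, nu0: float) -> float:
    """Musa's basic execution-time model of software failure intensity:

        lambda(mu) = lambda0 * (1 - mu / nu0)

    mu      -- failures experienced (and repaired) so far
    lambda0 -- initial failure intensity (failures per CPU-hour)
    nu0     -- total failures expected over the software's lifetime
    """
    return lambda0 * (1.0 - mu / nu0)

# Illustrative (invented) parameters: 10 failures/CPU-hr initially,
# 100 total expected failures. Halfway through, intensity has halved.
print(failure_intensity(50, lambda0=10.0, nu0=100.0))  # 5.0
```

The difficulty flagged in the answer is visible even here: lambda0 and nu0 must themselves be estimated from sparse failure data, so the confidence that can be placed in the resulting intensity is hard to establish.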
In terms of consequence assessment, it is always hard to identify the plausible worst consequences of any adverse event. This task is complicated by the lack of any easily defined metrics. Monetary loss provides one yardstick but this often fails when considering the value of human life or environmental damage. There are a series of well-known paradoxes, including the Allais paradox, that demonstrate our fallibility in making judgements based on standard utility curves. Hence even if we could agree on a suitable metric it is unlikely that the value associated with any metric would be monotonic. Such complexities are not, typically, considered within standard forms of the risk equation. These are not simply theoretical objections. Psychometric studies by Fischhoff and Slovic have pointed out that some risks have a 'dread' factor that goes well beyond any rational calculation of their actual consequences. This might be the case, for instance, with an incident involving a radioactive release.
In the lectures we have covered the difficulties of validating numeric estimates of risk. For low probability failures, it may not be feasible to use exhaustive testing. For high consequence failures, there are ethical objections to any form of simulation that might lead to an actual incident. Accident and incident statistics are subject to reporting biases etc.
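The infeasibility of validating low-probability estimates by testing can be quantified with a standard back-of-the-envelope calculation (my addition, assuming a constant failure rate and a simple exponential model; the function name is my own):

```python
import math

def hours_required(target_rate: float, confidence: float) -> float:
    """Failure-free test hours needed to claim, with the stated
    confidence, that a constant failure rate does not exceed
    target_rate, under an exponential model: t = -ln(1 - C) / lambda."""
    return -math.log(1.0 - confidence) / target_rate

# Demonstrating a 1e-9 per-hour failure rate at 99% confidence needs
# roughly 4.6 billion failure-free test hours.
print(hours_required(1e-9, 0.99))
```

Figures of this order underpin the answer's point: for the failure rates demanded of safety-critical software, exhaustive or statistical testing alone cannot validate the numeric risk estimate.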
Further lines of analysis might pursue the difficulty of establishing inter-analyst agreement using the simplified risk equation. Even when broad categories of likelihood and consequence are used rather than precise numeric judgements, it is rare to establish consensus at any level of detail. The engineering process often depends more on ad hoc arbitration and subjective judgement than it does upon a well-defined scientific method.
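The categorical form of the risk equation discussed above can be sketched as follows. The bands, scores and class thresholds here are invented for illustration and are not taken from IEC 61508 or any other standard; the coarseness of such bands is precisely what makes inter-analyst agreement hard to achieve:

```python
# Illustrative ordinal bands: risk class = f(likelihood * consequence).
LIKELIHOOD = {"improbable": 1, "remote": 2, "occasional": 3, "frequent": 4}
CONSEQUENCE = {"negligible": 1, "marginal": 2, "critical": 3, "catastrophic": 4}

def risk_class(likelihood: str, consequence: str) -> str:
    """Combine ordinal likelihood and consequence bands into a risk
    class via the product form of the risk equation (thresholds invented)."""
    score = LIKELIHOOD[likelihood] * CONSEQUENCE[consequence]
    if score >= 12:
        return "intolerable"
    if score >= 6:
        return "undesirable"
    if score >= 3:
        return "tolerable"
    return "acceptable"

print(risk_class("remote", "catastrophic"))  # undesirable (score 2*4 = 8)
```

Note that two analysts who place the same hazard one band apart on either axis can land in different risk classes, which is one concrete source of the disagreement described above.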