Software Safety in a Nutshell

Clifton A. Ericson II

The Boeing Company; Seattle, Washington

 

The purpose of this article is to present a quick overview of software safety, or software safety in a nutshell. The intent here is not to directly solve your software problems, but to raise your level of awareness and understanding of software safety. This article expounds upon the salient points of software safety and might also be called the A, B, C's of software safety.

A. Software is influencing and controlling more and more of our daily lives.

The technology race (or rage), combined with the small, inexpensive microprocessor, has led our society to give computer control to everything possible: everything from toasters to medical equipment, from commercial aircraft lavatories to spacecraft, and from kids' toys to nuclear weapon systems. And, as more control is given to computers, the software driving these computers becomes more prevalent and more controlling as well.

B. Software is posing increasing safety risk.

The increased level of risk in software is due to many interrelated and complex factors, such as:

  1. Increased usage of computers/software.
  2. Increased dependency on computers/software.
  3. Increased application to safety-critical uses.
  4. Significant difficulty developing software.
  5. Significant difficulty preventing software bugs and errors.

 

C. Software has many unique characteristics that make it difficult to work with.

The unique nature of software makes it difficult to fully understand, and even more difficult to visualize, all the possible ways software can perform or fail to perform.

Some of the unique characteristics of software include:

  1. Software is an abstract concept in that it is a set of instructions on a piece of paper or in computer memory. It can be torn apart and analyzed in piece parts like hardware, yet unlike hardware it is not a physical entity with physical characteristics which must comply with the laws of nature (i.e., physics and chemistry).
  2. Since software is not a physical entity, it does not wear out or degrade over time. This means that software does not have any failure modes per se. Once developed, it always works the same way, without variation.
  3. Unlike hardware, once a software program is developed it can be duplicated or manufactured into many copies without any manufacturing variations.
  4. Software is much easier to change than is hardware. For this reason many system fixes are made by modifying the software rather than the hardware.
  5. There are no standard parts in software as there are with hardware. Therefore, there are no high-reliability software modules and no industry alerts on poor-quality software items.
  6. If software has anything that even resembles a failure mode, it is in the area of hardware-induced failures.
  7. Hardware reliability prediction is based upon random failures, whereas software reliability prediction is based upon the theory that predestined errors exist in the software program.
  8. Hardware reliability modeling is well established; however, there is no uniform, accurate, or practical approach to predicting and measuring software reliability.
  9. Since software does not have any failure modes, a software problem is referred to as a software error. A software error is defined as a situation in which the software does not perform to specification or as reasonably expected, that is, when it performs unintended functions. This definition is fairly consistent with that of a hardware failure, except that the mechanisms or causes of failure are very different.
  10. Hardware primarily fails due to physical or chemical mechanisms and seldom fails due to human failure mechanisms (e.g., documentation errors, coding errors, specification oversights), whereas just the opposite is true with software.
  11. Software has many more failure paths than hardware, making it difficult to test all paths (see the sketch following this list).
  12. By itself software can do nothing and is not hazardous. Software must be combined with hardware in order to do anything.
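
To make item 11 concrete, the following sketch (in C; illustrative only, not from the original paper) shows why exhaustive path testing quickly becomes impractical: a routine with n independent two-way branches can have up to 2^n distinct execution paths.

    #include <stdio.h>

    /* Each independent two-way branch doubles the number of possible
       execution paths, so the path count grows as 2^n. */
    int main(void)
    {
        for (int branches = 5; branches <= 30; branches += 5)
            printf("%2d independent branches -> up to %10lu paths\n",
                   branches, 1UL << branches);
        return 0;
    }

At 30 branches the count already exceeds one billion paths, and real programs contain far more branches than that.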

 

D. Designed software is the weak link.

The weak link in software is the final designed product, not the design process. Regardless of the design process, hazards will always be unintentionally built into the design (experience bears this out). When implemented, software almost always works as intended, or close to it. The problem is that it also has unforeseen functions built in that can perform in unintended, undesired, and hazardous ways.

As a system is designed with intended functions, it is also inadvertently designed with built-in unintended functions, many of which may be hazardous. You are probably familiar with the famous drawing of an old woman that is very easy to see in the picture. Within the same drawing, however, is the picture of a beautiful young lady, which takes more time and effort to find. So within the same drawing exist both an intentional picture and an unintentional picture. This is perhaps an oversimplified view of software design.

Finding and eliminating the built-in unintended and undesired hazardous functions is the ultimate goal of software safety. This means attacking the designed product. Many built-in unintended hazardous functions (BUHFs) can be avoided through the design process, but no matter what the process, a few BUHFs will always be created and survive to live within the final product. History has shown this to be true with hardware designs.

E. The first key to software safety is the hazards, or unintentional design.

Since safety problems are caused by the BUHFs in systems, it only makes sense to focus on hazards to make a system safe. Of course, this is not a new concept; it was discovered some 40 years ago with hardware-controlled systems.

Focusing on hazards, however, is easier said than done. It is not easy to visualize or foresee hazards within a software design, particularly when the hazards involve subtle interactions of the combined hardware, software, man-machine interface, and environment.

It should be noted that hazard analysis is not an exact science and still needs considerable improvement.

There are many ways to identify hazards; some of the most current approaches include:

  1. Hazard analysis.
  2. Specification and code correctness.
  3. Software models.
  4. Software testing.
  5. System testing.
  6. Design tools.

Software has a subtle nature which can make the safety analysis task more difficult than normal. For example, software can have errors and still function reasonably well, particularly without causing a safety problem. Software errors may not always be readily apparent; they may be lurking in the woodwork, or slowly causing a hardware element to build up to an unacceptable tolerance level. Software errors are usually application and input dependent; that is, when software is used for applications and inputs for which it has not been tested, errors begin to occur more frequently. Not all software errors cause safety problems, and not all software that functions according to specification is safe.
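
As a minimal sketch of such a lurking error (illustrative only; the numbers and names are assumptions, not from the original paper), consider a clock that accumulates time in 0.1-second ticks using single-precision floating point. Because 0.1 has no exact binary representation, every tick adds a tiny error; short tests reveal nothing, but the drift grows steadily with running time. An accumulation error of essentially this kind is widely cited as a factor in the 1991 Patriot missile timing failure.

    #include <stdio.h>

    int main(void)
    {
        float clock_s = 0.0f;          /* accumulated running time      */
        const long ticks = 36000L;     /* one hour of 0.1 s ticks       */

        for (long i = 0; i < ticks; i++)
            clock_s += 0.1f;           /* each add carries a tiny error */

        double true_s = ticks * 0.1;   /* exact answer: 3600.0 s        */
        printf("computed: %.4f s   true: %.1f s   drift: %.4f s\n",
               clock_s, true_s, clock_s - true_s);
        return 0;
    }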

The following taxonomy briefly describes typical software hazard types and is a very useful aid in performing a software safety analysis. The generalized hazard categories are as follows:

  1. Inherent Hazards - A software controlled function which is inherently hazardous due to the hazardous nature of the equipment or process being controlled, such as hazardous materials or energy sources.
  2. Timing Hazards - Software controlled functions where the timing sequences are safety critical. This is an often overlooked area, because timing sequences are frequently taken for granted to be safe until an accident or incident occurs.
  3. Induced Hazards - A software hazard caused by a computer hardware failure that produces a bit error, resulting in an erroneous instruction. For example, an intended word "1101," meaning "add to register A," may be changed by an induced bit error to "1011," which means "subtract from register A" (see the sketch following this list).
  4. Latent Hazards - A hidden condition in the software design which is not hazardous until a particular unplanned or untested set of circumstances occur.
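
The induced-hazard example above can be sketched in a few lines of C (the 4-bit opcode values come from the text; the interpreter and register values are hypothetical):

    #include <stdio.h>

    #define OP_ADD_A 0x0D   /* binary 1101: add to register A        */
    #define OP_SUB_A 0x0B   /* binary 1011: subtract from register A */

    /* A trivial instruction interpreter for the two opcodes above. */
    static int execute(unsigned op, int reg_a, int operand)
    {
        switch (op) {
        case OP_ADD_A: return reg_a + operand;
        case OP_SUB_A: return reg_a - operand;
        default:       return reg_a;   /* unrecognized opcode: no-op */
        }
    }

    int main(void)
    {
        unsigned op = OP_ADD_A;
        printf("intended: A = %d\n", execute(op, 100, 5));  /* 105 */

        op ^= 0x06;   /* hardware-induced error corrupts the word:
                         1101 becomes 1011                           */
        printf("induced:  A = %d\n", execute(op, 100, 5));  /*  95 */
        return 0;
    }

Note that nothing in the software itself has failed; the computer simply executes the corrupted instruction.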

Common goals of both software design and software safety analysis should include the following:

  1. The software utilizes the full capability of the hardware.
  2. The software properly utilizes the hardware capability.
  3. The software does not over-stress the hardware.
  4. The software anticipates safety-critical hardware failure modes and provides workarounds.
  5. The hardware impact due to erroneous, premature, or no interface signal is considered, where interface signals are defined as data or commands between the computer and functioning hardware elements.
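
As one way to picture goal 5, the following minimal sketch (all names, limits, and the timeout are hypothetical assumptions, not from the original paper) screens an interface signal for the three cases of erroneous, premature, and missing before acting on it:

    #include <stdbool.h>
    #include <stdio.h>

    #define VALVE_CMD_MIN       0    /* assumed valid command range */
    #define VALVE_CMD_MAX     100
    #define SIGNAL_TIMEOUT_MS 500    /* assumed watchdog period     */

    typedef enum { SIG_OK, SIG_ERRONEOUS, SIG_PREMATURE, SIG_MISSING } sig_status;

    /* Screen one interface signal before it is allowed to drive hardware. */
    static sig_status check_signal(bool received, int value,
                                   bool hw_armed, unsigned age_ms)
    {
        if (!received || age_ms > SIGNAL_TIMEOUT_MS)
            return SIG_MISSING;                  /* no or stale signal */
        if (!hw_armed)
            return SIG_PREMATURE;                /* hardware not ready */
        if (value < VALVE_CMD_MIN || value > VALVE_CMD_MAX)
            return SIG_ERRONEOUS;                /* out of range       */
        return SIG_OK;
    }

    int main(void)
    {
        /* A command of 150 is out of range, so it is rejected. */
        sig_status s = check_signal(true, 150, true, 20);
        printf(s == SIG_OK ? "command accepted\n" : "safe state entered\n");
        return 0;
    }

Anything other than SIG_OK drives the system to a predefined safe state rather than passing a suspect command to the hardware.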

 

F. The second key to software safety is the system.

As first identified in reference 1, software safety is a system issue. Software cannot be entirely removed from the system and analyzed in a sterile environment. By and of itself, software is not hazardous [ref. 1]; software is only hazardous when operating or controlling hardware. This provides another key to software safety: focus on the safety-critical hardware and on hazardous system operations and modes.

Software always works as designed, but not necessarily as intended. Software is designed entirely to a specification, and it is eventually made to achieve what the design specification requires. However, the system's nature yields a software design that also performs some unforeseen, unintended, and undesirable functions (i.e., BUHFs). These are the problems of interest to system safety.

Software has a greater capability than intended or specified. Software usually does not stop functioning when an error occurs; it merely continues to operate in an unanticipated manner. Therefore, the known design of software is a subset of the total design, which includes all of the possible outcomes software could achieve as a result of an error or hardware-induced failure. The real exercise in software safety analysis is determining the extent of the total software capability.
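
A minimal sketch of this point (illustrative only; the sensor and control law are hypothetical): the program below does not stop when its sensor fails, it simply computes an output that no specification ever asked for.

    #include <stdio.h>

    /* Hypothetical sensor: on failure it returns -1 as an error code. */
    static int read_temp_sensor(void)
    {
        return -1;   /* simulate a failed sensor */
    }

    int main(void)
    {
        int temp = read_temp_sensor();   /* error code never checked  */
        int heater_power = 100 - temp;   /* control law intended for
                                            temp in 0..100 only       */
        printf("heater power: %d%%\n", heater_power);   /* 101% */
        return 0;
    }

The intended design covers temperatures of 0 to 100; the total design, discovered only by analysis, also contains this unintended full-power response.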

     

G. The third key to software safety is the design process.

The various tools, techniques, and methods for software design can also contribute to designing a safe system. The use of design standards, design guidelines, and historical data helps eliminate known and previously experienced problems.

Common high-level design goals include:

  1. Fault Avoidance - Design to avoid the occurrence of software hazards.
  2. Fault Warning - Design to detect conditions which could be hazardous and provide an operator warning so that the operator can take appropriate corrective action.
  3. Fault Correction - Design for fault detection but also provide automatic means for self-correction.
  4. Fault Tolerance - Design for fault detection but also provide alternate paths which are automatically selected.
  5. Fail Operational - Design such that when a single failure or error occurs the system fails operational (and safe). It should be noted that safety may have an extra burden trying to ensure that the system is also safe in this situation.
  6. Fail Safe - Design such that when two independent failures or errors occur the system fails safe (but not necessarily operational). A combined sketch of fault warning, fault tolerance, and fail-safe behavior follows this list.
  7. Isolation Safety - Design such that when a control signal is being monitored an isolation circuit is provided, and the isolation circuit will not induce an error in the control signal even when a single failure occurs.
  8. Software Partitions - Design such that safety-critical software is partitioned (isolated) from non-critical software.
  9. Software Safety Kernel - Design around a small, central kernel of software that enforces the safety-critical functions.
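
The following sketch (hypothetical names and limits, not from the original paper) combines goals 2, 4, and 6: two redundant input channels are compared, a disagreement produces an operator warning and selection of the alternate path, and a second fault drives the system to a safe state rather than continuing to operate.

    #include <stdio.h>
    #include <stdlib.h>

    #define DISAGREE_LIMIT 5          /* assumed channel tolerance */

    static void safe_state(void)
    {
        puts("FAIL SAFE: outputs driven to a predefined safe state");
        exit(0);
    }

    /* Compare two redundant channels; warn on the first fault,
       tolerate it, and fail safe on the second. */
    static int select_channel(int ch_a, int ch_b, int *faults)
    {
        if (abs(ch_a - ch_b) <= DISAGREE_LIMIT)
            return ch_a;                       /* channels agree          */

        (*faults)++;
        puts("WARNING: channel disagreement"); /* fault warning (goal 2)  */
        if (*faults >= 2)
            safe_state();                      /* second fault: fail safe
                                                  (goal 6)                */
        return ch_b;                           /* first fault: alternate
                                                  path (goal 4)           */
    }

    int main(void)
    {
        int faults = 0;
        printf("value used: %d\n", select_channel(50, 52, &faults)); /* agree     */
        printf("value used: %d\n", select_channel(50, 90, &faults)); /* 1st fault */
        printf("value used: %d\n", select_channel(10, 90, &faults)); /* 2nd fault:
                                                  exits via safe state    */
        return 0;
    }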


REFERENCES

[1] C. A. Ericson II, Software and System Safety, 5th International System Safety Conference, 1981.

[2] C. A. Ericson II, Software Safety Precepts, 14th International System Safety Conference, 1996.

 

BIOGRAPHY

Clifton A. Ericson II

The Boeing Company

18247 150th Ave SE

Renton, WA 98058 USA

phone 253-657-5245

fax 253-657-2585

email clifton.a.ericson@boeing.com

Mr. Ericson works in system safety on the Boeing 767 AWACS program. He has 33 years of experience in system safety and software design with the Boeing Company. He has been involved in all aspects of fault tree development since 1965, including analysis, computation, multi-phase simulation, plotting, documentation, training, and programming. He has performed Fault Tree Analysis on Minuteman, SRAM, ALCM, Apollo, Morgantown Personal Rapid Transit, B-1, and 737/757/767 systems. He is the developer of the MPTREE, SAF, and FTAB fault tree computer programs. In 1975 he helped start the software safety discipline, and he has written papers on software safety and taught software safety at the University of Washington. Mr. Ericson holds a BSEE from the University of Washington and an MBA from Seattle University.