Copyright Chris Johnson, 1997.
In order to reduce download times, all of the images and illustrations for th is course are included in the Lecture notes.

Social Aspects of Computing

Chris Johnson

This course forms part of a larger introduction to Human Computer Interaction. In addition to Social Aspects of Computing, we also consider Interactive Systems Design.

This course is intended to introduce people to the wider social implications of computing technology. There are two main objectives:

to introduce some key technologies that will have a profound impact upon the way that we live.
to emphasise the complex social consequences that these technologies will have upon the way that we live.

Four topics are used to illustrate these points.

the Internet is used to introduce the social implications of a world population that is either 'information rich' or 'information poor' ('computer literate' or 'computer illiterate'?).
Mobile devices and Computer Supported Cooperative Work are used to introduce the social implications of remote, teleworking and of the team based use of computers to coordinate group activities.
Security is used to introduce the technological and social threats posed by our reliance upon heavily interconnected systems.
Safety is used to introduce the consequences of our increased reliance upon fallible programmers and their products.

This list is not exhaustive, many other topics could have been chosen. For example. we do not discuss the ACM's recent attempts to draft a code of ethics for software engineers. However, the intention is start a debate that can be continued and expanded upon in tutorials.

These notes supplement the briefer bullet points that structure the lecture material (see Course Index).

What is the Internet?

Early in 1962, the Rand corporation, one of America's leading military suppliers, became concerned about how people would communicate after a nuclear holocaust. The solution was to grow into the Internet - a highly connected network of computer systems. Since the inception of the Internet, there has been a rapid growth in world-wide computer networks. In 1971, there were twenty-three host machines. In 1980 there were approximately one hundred computers attached to the Internet. In 1990 there were one hundred thousand. In 1994, the number of systems connected to the Internet exceeded one million. A recent estimate placed the number of Internet users at just over twenty-five million.

Hundreds of sites in many different domains provide access to a vast range of information sources. The growth of these information sources and the development of applications such as Netscape and Mosaic has encouraged the active participation of new groups of users. Most of these participants only possess a minimal knowledge of the communications mechanisms that support computer networks.

The fact that most of the people using the Net only have a minimal idea of the underlying technology creates quite a few problems. For example, I once heard a teacher say to her class that the delays in retrieving information over the web were 'a bug in the system'. She was accessing a site in Australia. That's a bit like saying that gravity is a bug in physics. Unless you understand the ways in which computer networks operate then you can't make the best use of them. For example, most people try to restrict their web access to the early morning in Glasgow. This is because many remote sites become saturated as the mass of American users come on-line.

Why is the Internet important?

There is a huge amount of hype over the Internet. This is largely due to two things: the growth of the world wide web and the increasing use of electronic mail systems.

The growth in electronic mail was largely restricted to academic communities, i.e. Universities and colleges, until the late 1980s. It then became increasingly common for companies to develop internal mail systems. These were typically based around proprietary systems that were sold as part of a PC networking package. Most large businesses could not see the point of hooking up to the Internet and so addresses were only valid within that local area network. Concerns over Internet security also encouraged businesses to isolate their users accounts from the outside world. However, things are changing. The ability to rapidly transfer information using systems such as Netscape and Mosaic has encouraged more companies to extend their e-mail access. Indeed, Netscape now includes mail handling facilities. Other groups of users have been encouraged to use electronic mail by the growth of an application called LotusNotes - the popularity of this systems was one of the reasons cited for IBM's decision to buy the Lotus company. Similarly, there have been persistent rumours that the growth of the World Wide Web is attracting Microsoft to launch hostile bids for the Netscape Corporation.

The World Wide Web grew from the National Centre for Supercomputer Applications (NCSA), University of Illinois and from CERN, a European Research Centre for nuclear physics, where there were concerted efforts to improve the means of passing files over remote networks. In both cases, they were groups of potential users rather than people who would regard themselves as primarily `interface designers'. The work at the NCSA led to the development of the Mosaic in 1993. Mosaic was a free program that did much to attract the initial user group to the Web. Netscape was then developed as a commercial successor once there were enough users for the web to be successful. Nobody would pay money for a browser until there was enough information on the web.

Mail

Mail is a simple idea. Files are sent across a computer network to a destination machines. The system then alerts the intended recipient who may then read the file.

Addressing and Routing

Each message needs an address. My full address is johnson@dcs.gla.ac.uk. This means johnson's account in the Department of Computer Science, dcs, within Glasgow University, gla, which is an academic institution, ac, in the UK, uk. There are many variations on this. For example, eric@cict.fr is the address of an account in the University of Toulouse, cict, in France, fr. I leave you to work out who Bill.Gates@microsoft.com is. If you are sending messages to me from within the department then you can drop the location. In other words, johnson will work and the dcs.gla.ac.uk is understood from the context. If you were to mail me from a machine in the Psychology department then johnson@dcs should work. The routing system will again figure out the gla.ac.uk bit for itself.

Mail groups

Mail groups enable large numbers of users to be contacted at the same address. For example, if I send a mail message to HCI1-tutors this will be expanded to adam, alanr, clarkesj, cordyh, fross, johnson, mark and so on. Clearly, it would be extremely irritating if I had to type in all of these user names individually. There are also mailing lists maintained by dedicated machines. These are used by Governments and commercial organisations to support special interest groups. For example, the British Computer Society has a Human Computer Interaction special interest group. Almost all of its 2-3000 members are on the same mail list. they regularly see items about jobs in HCI or forthcoming conferences sent to them via this mailing list.

Filtering and Uncertainty

There is an increasing need to filter out the rubbish from the relevant messages. The latest version of the Eudora tool that you use supports this. You can specify that mail from a particular user is instantly thrown away. Alternatively, you might specify that mail with the word joke should be put at the bottom of the list. Finally, it is worth mentioning that email is altering the way that many firms are doing business. The increasing use of MIME attachments enables users to include graphics, sounds and films along with their textual messages. This provides substantial benefits to the conventional fax machine where the user would have to scan in any document to get it back into a digital form. Some things haven't improved; `our mail system is down so I cant do anything...'.

News.

Before web browsers revolutionised the way that information was exchanged over the Internet, the primary means of communicating ideas to groups of people was by News. This system broadcasts a single file to all of the registered newsservers on the Network. Users on any site could then `subscribe' to a newsgroup and read messages as they arrived at their local site. This is different to mail where users have to specify an account as the recipient of a message. Here the sender may not know all of the people who are registered to read the news.

Updates

The actual mechanism by which news percolates through the Internet is very complex. If you post a news article from Glasgow then every other site in the world must eventually get to know that they need a copy of your file. This is done through a form of vector. For example, assume that there were three news groups . If Strathclyde currently had seen three articles about Human Factors, four about Software Engineering and one about Ada, their vector would look like <3,4,1>. Meanwhile, Supposing you now post an article about Ada, the Glasgow vector would now be <3,4,2>. If the two sites compared their records, Strathclyde would now know to request your article in order to be up-to-date. Now imagine this with hundreds of newsgroups on a global scale....

Threads

Just as mail systems have grown to provide filtering mechanisms, most newsreaders now support threads. These are chains of similar messages. Supposing we wanted to start a discussion about Computing Fundamentals. As more and more people from all over the world, added their contributions to the discussion they would all have the same subject. Newsreaders can detect this and would offer readers the chance to read all of the contributions on this subject at the same time or to delete them all. Again, this offers a mechanism for filtering the mass of articles that might be posted to a group.

Netiquette

The Internet is self-regulating. In other words, every member of the Internet community is responsible for their behaviour. The infrastructure, the academic and governmental networks, are supported by public money. There is a good chance that our current freedom will be lost if pornography or violent material are introduced onto these networks. The term 'netiquette' refers to the unwritten rules that govern good behaviour on the Internet. For example, pyramid selling schemes are highly disapproved of. If anyone is in doubt about what is or what is not acceptable then send me some email.

The Web Architecture

What is the World Wide Web?

The World Wide Web is an Internet client-server hypertext distributed information retrieval system which originated from the CERN High-Energy Physics laboratories in Geneva, Switzerland. Application or client programs, called browsers, translate user request for information into the communications primitives that are necessary to transfer relevant data from remote servers.

An extensive user community has developed on the Web since its public introduction in 1991. In the early 1990s, the developers at CERN spread word of the Web's capabilities to scientific audiences worldwide. By September 1993, the share of Web traffic traversing the NSFNET Internet backbone reached 75 gigabytes per month or one percent. By July 1994 it was one terabyte per month.

On the WWW everything (documents, menus, indices) is represented to the user as a hypertext object in HTML format. Hypertext links refer to other documents by their URLs. These can refer to local or remote resources accessible via FTP, Gopher, Telnet or news, as well as those available via the http protocol used to transfer hypertext documents.

The client program (known as a browser), e.g. NCSA Mosaic, Netscape Navigator, runs on the user's computer and provides two basic navigation operations: to follow a link or to send a query to a server. A variety of client and server software is freely available.

Most clients and servers also support "forms" which allow the user to enter arbitrary text as well as selecting options from customisable menus and on/off switches.

Following the widespread availability of web browsers and servers, many companies from about 1995 realised they could use the same software and protocols on their own private internal TCP/IP networks giving rise to the term "intranet".

The Netscape Interface

The importance of the World Wide Web cannot be underemphasised. Everywhere you look, you can see URLs (this stands for Uniform Resource Locator). The URL of my home page is http://www.dcs.gla.ac.uk/~johnson. The http stands for hypertext transfer protocol; this is the low-level communications mechanism that is used to transport the file. The www.dcs.gla.ac.uk refers to the address of our web server and is similar to our email address. The ~johnson is the name of a file on that web server.

Supply Push

It's worth pausing to consider why the web has become so popular. The decision to freely supply both web browsers and servers meant that users of earlier Internet tools, such as Gopher and ftp, could try the new systems. An important point here is that Mosaic could still be used to access files that had previously been retrieved using the previous systems. This `backwards compatibility' encouraged people to explore HTML - hypertext markup language, as a means of gaining the greatest benefits from the new browsers.

Demand Pull

Just as the supply of the new technology pushed people to use the web, there was a pull from the material that began to appear on the Internet. New users were attracted to the system because they could use it to access advertising material, course notes, shopping services, games and so on. Many of these users had never thought about accessing the Internet before. All of this helps to set up a cycle. PC manufacturers began to include modems and connection rentals in the price of new machines. by supplying these services, they helped to provide more people with the ability to place information on the web. This, in turn, created demand from those who did not have access to the web to join the expansion.

Interface Design

The expansion of the web is entirely due to the successful interface design techniques that were incorporated into the browsers. Unless people could use these systems to retrieve the information that they wanted then the whole system would have remained a relatively small scale application, developed by nuclear physicists...

Java

If you compile a program on a Mac then it can't be run on a PC. If you compile a program on a PC then it wouldn't run on a UNIX machine. This is because the binary file for an application that runs on one platform cannot run on another platform, because the binary file is machine- specific.

The Java Platform breaks these rules by generating a device independent byte code when you compile a program. Every other machine then uses another program to interpret that code into a format that a specific machine can understand. Each underlying platform must, therefore, have its own implementation of Java. Because of this, the Java Platform can provide a standard, uniform programming interface to applets and applications on any hardware. Sun, therefore, claim that ``The Java Platform is therefore ideal for the Internet, where one program should be capable of running on any computer in the world. The Java Platform is designed to provide this "Write Once, Run Anywhere" capability'' (see http://www.java.sun.com).

Applets

The Java Platform enables developers to create two different kinds of programs: applets and applications. Applets are programs that require a browser to run. The applet tag is embedded in a Web page and names the program to be run. When that page is accessed by a user, either over the Internet or corporate intranet, the applet automatically downloads from the server and runs on the client machine. Because applets are downloaded, they tend to be designed small or modular, to avoid large download times.

Applications

Applications are programs that do not require a browser to run-they have no built-in downloading mechanism. When an application is called, it runs. In this way, applications are just like programs in other languages. They can perform traditional desktop tasks, such as that done with a word processor, spreadsheet or graphics application. Like an applet, an application requires the Java Platform for it to run; however the Platform can be available as a separate program, can be embedded directly within the underlying operating system, or presumably can be embedded in the application itself.

Exercise: Paper Chase on the Web.

The URL for the British Human computer Interaction Group is: http://kmi.open.ac.uk/~simonb/bcs-hci/hci-grp.html. Try access the SEL-HPC HCI Document Archive link and from there look for any information that you can find about the web. One approach would be to simply issue a query on the word Web in the form at the bottom of the page. You may even find an article by myself, if you're very unlucky.

Next, access the following URL http://www.jaguarvehicles.com/uk. Notice how the site offers three different sorts of web pages for different levels of browser. There are many different reasons for this - the main one is that some machines, such as the Boyd Orr Macs, don't have enough memory to run Java enabled browsers, There are also communications overheads. If you are paying for a dial-up phone access over a slow modem, you can probably live without full colour graphics and animations.

Finally, take a look at http://wwwvoice.com/bud/bud.html. This is a joke site but take time to look at the mistakes that the designers introduced into the web site. How many things might you criticise the design for?

Computers as Tools

Winograd and Flores.

In 1987, Winograd and Flores published a book entitled Understanding Computers And Cognition (Addison-Wesley, Reading, United States of America), They introduce the argument that computer software is similar to other tools. In particular, they suggested that we should only notice a tool when things go wrong. For example, people do not have to concentrate on using a hammer when putting in a nail. We only notice tools when things BREAK DOWN. For example, we can become accutely aware of a hammer when we hit our thumb. In the same way, users should not be conscious that they are using computer programs. They should be focussing on the objects that interest them: letters; accounts; information. They should not be focussing on the mechanisms that are used to produce them: menus; forms; command lines. It's only when people get into trouble that they have to worry about finding the right menu or form. Such nstances of 'break down' indicate flaws in the system design.

This section looks at ways in which computers are being developed to provide new types of tools. In particular, the development of mobile devices through cellular and satellite telephone systems creates new mechanisms for the storage and retrieval of remote information. The introduction of ubiquitous computers, that is computers which integrate 'seamlessly' into our daily lives, creates opportunities to 'add value' to a wide range of objects. Groupware systems provide tools for coordinating the activities of teams of people; it breaks down the communications barriers that isolate single users operating single machines.

Mobile Computation

Mobile computation can also take place over larger distances using cellular and satellite telephone links. These systems use modems in the same way that many users use a modem to connect up over standard telephone networks. The only problems concern the reliability of the system. Errors can be introduced because analogue, that is non-digital, telephone systems were designed to carry the human voice rather than the binary, digital signals of computer networks. These problems are being reduced by the introduction of digital networks. There are, however, many areas of the world that are not covered by these facilities. Those areas that are covered, use several different systems.

Groupware

Groupware refers to computer programs that are intended to help several different people work together on a common product. There are multi-user conferencing systems, group text-editors even `virtual' universities. Another term that is commonly used for these applications is CSCW systems (Computer Supported Cooperative Work). There are a number of problems that make these applications particularly difficult to design and build. For example, what would happen if two people were editing the same document and one person deleted a paragraph while another was still working on it? Mind blowing... There is a system for the Mac called Timbuctoo that enables two users to each have a mouse and a pointer on the same screen. This creates huge potential for confusion.

Another class of systems supports teleworking. These enable groups of users to remotely log-in to their place of work. They need not be physically present in their office. Teleworking can combine elements of mobile computation if the user is moving around the country as they work. It can also include elements of groupware if they have to cooperate with their colleagues over the network.

Mobile Computation

The Limitations of Traditional Networks.

Traditional networks suffer from a number of limitations:

they are expensive to maintain. A considerable amount of re-cabaling is required every time a new node is added to a network. This work can be reduced by connectors that can simply be snapped into place on an existing cable. Alternative, connector boxes can be used that resemble telephone junction boxes. These offer a fixed number of plugs that users can exploit to hook up their devices. In either case, the physical characteristics of the wiring impose considerable limits on the flexibility of the network.
they can be insecure One way of attacking a computer system is simply to hook another computer onto the cabling and listen to the traffic that is passing from one point to another. this is called eaves-dropping. Radio based communication may be less vulnerable because any eaves-dropper would have to locate the frequency of the transmission that they were interested in. This frequency might then be changed every so often to prevent eavesdroppers from continuing to monitor a channel even if they could find it;
they may not support all user tasks There are some systems that simply cannot use conventional computer networks. Applications ranging from Gas Board vans to military command and control systems cannot simply trail cables everywhere they go. Instead, there must be some way for `on-board' computers to communicate with their central servers;
not good for consumer electronics If companies want to `hook-up' a range of household devices, for example a PC to a heating controller or a video recorder or a Hi-Fi, then few people are willing to tear-up their house to install a local area network between the rooms in their house.

All of these problems have led manufacturers to explore `wire-less' technology for digital communication.

Mobile Computing Infrastructure

Strengths and Weaknesses of Cellular Technology.

The Conference of European Telecommunication Authorities is currently working to 'harmonise' European networks for mobile communications. Similar initiatives have led to a digital mobile communications standard throughout North America. In Japan, there are plans for at least two different digital radio communications networks for mobile computing devices. These initiatives have encouraged hardware and software developers to invest in a vast array of hand-held and lap-top devices. Recent developments in the communications infrastructure, enables the users of these systems to access local and remote resources without being forced to connect to a physical telephone line. These systems are, in turn, posing new challenges for human computer interaction.

Cellular Architectures.

Radio technology offers perhaps the most obvious means of connecting mobile devices. This approach exploits the cellular systems that currently support mobile 'phones. In this system, the area to be covered is divided into a number of cells. Each cell has its own transceiver (transmitter-receiver). If the user moves from one cell to another then their 'calls' are passed between transceivers. The idealised architecture shown on the previous page hides many of the problems that frustrate the development of mobile computer systems. There is a trade-off between the volume of information that a radio signal can carry and the distance that the signal will travel. High frequency signals carry more information but are susceptible to interference and dispersion. Low frequency signals carry less information but will travel over longer distances. Radio-based communication also suffers from: signal fade due to adverse atmospheric conditions; unintentional electromagnetic interference; interference from other devices using the same channel and variable signal strength due to movement of the device. Until such problems are addressed, users will continue to suffer the delays, broken connections and interruptions that frustrate mobile, human-computer interaction.

What Will These Systems Look Like?

There are two approaches. Firstly, future generations of mobile devices may attempt to support a transparent form of human-computer interaction. By this we mean that the underlying telecommunications will be invisible to the user. They will not know whether their system has had to dial up a remote transceiver or not. this will be difficult to achieve because there will be inevitable delays while information is passed from a remote site.

Secondly, future systems might exploit an opaque approach. In contrast, this would force the user to `log into' a remote transceiver before dispatching a message. This would enable the user to choose whether or not to incur a lengthy delay when transferring data between sites.

Mobile Computing Infrastructure

The Strengths and Weaknesses of Satellite Networks.

In order to understand the nature of the problems that frustrate interaction with mobile computer systems, it is important to have some idea of the underlying technology. Satellites offer a number of benefits for mobile interaction. Unlike radio systems, they do not suffer from the problems of multipath transmission. This occurs when signals 'bounce' off objects in the environment. This is a significant problem for interactive systems because mobile devices must then filter out any additional signals to recover the users' information. Unfortunately, satellites must filter and correct for atmospheric interference and for noise in space. There are further limitations.

Geostationary satellites

Geostationary satellites must maintain an orbit of approximately 36,000km in order to hold their position relative to the earth's surface. This incurs a half-second delay on transmissions which, in turn, affects the usability of mobile devices. For example, if an item of information is lost between the transmitter and the receiver then several seconds may go by before the missing item can be detected and corrected.

Low Earth Satellites

Low-earth orbiting satellites avoid this delay but the user's device must then track a satellite's movement across the sky. Both forms of satellite communication currently suffer from a relatively low-bandwidth (8-20 Kbps). This limits the range of tasks that users can perform over these links. A number of international initiatives are currently devising solutions to these problems. For instance, by building a global network of low-orbit satellites.

Groupware

What are they?

The term groupware covers a vast range of computer applications:

conferencing systems; These systems range from simple messaging applications where textual messages may appear below the user's name, to full-blown video conferencing systems. In either case, the intention is to reduce the problems that can arise during telephone conversations. In particular, these systems avoid the irritations that can arise when three or more people try to communicate over the same phone line. In the case of video conferencing systems, there are important visual cues about who wants to talk next etc. At present, the communications infrastructure on the Internet can support relatively low quality video conferencing where delays may interrupt both images and sound. A range of attempts are being made to address these problems. For example, ATM (Asynchronous Transfer Mode) networks provide means of firing multimedia data at high speeds across local and wide are networks. The department has one of the world's leading research groups in this area.
multi-user text editors; These systems enable groups of users to simultaneously edit the same document. This is important, for example, if two different groups have to work on different sections of a joint publication. The alternative would be to send and re-send different drafts between the various sites. Each site would then be unsure about whether they had the most recent copy or whether it was `in the post'. If both groups can access the most recent version that the other group is working on then these problems may be avoided.
CASE tools; These are Computer-Aided Software Engineering systems. They help groups of programmers to develop code. For example, they might provide information about the data types that must be used in two different areas of a program. They may also help teams to work out where their colleagues are currently concentrating within a system.
command and control systems; These applications include computer interfaces to systems on-board aircraft or within power generation systems. Again, the various operators must cooperate to preserve the safety of an application. Confusion would result if one user tried to shut-down a component while another tried ti start it up.
and many many more...;

Groupware

What are the problems?

A range of problems mark out CSCW systems as being different from single user applications:

Synchronous and Asynchronous Systems The first problem is that is may be difficult for users to know exactly who else is using the system. Synchronous means `at the same time'. Asynchronous means `at different times' or independent. You get both types of CSCW systems. For example, a system that supports collaboration between groups in Scotland and Australia need not be synchronous. Time differences mean that there would only be small periods of time when they would both be working on the system, If the application is asynchronous then many of the problems of contention (see below) do not arise.
Contention This occurs when two or more users want to gain access to a resource that cannot be shared. For example, it may not be possible for people to work on exactly the same piece of text at the same time. Alternatively, users might be prevented from simultaneously interacting with a process component. This would stop people attempting to start and stop a piece of equipment.
Interference This arises when one user frustrates another by getting in their way. For example, one person might want to move a piece of text while another attempts to edit it. Similarly, one user might want the others in a group to vote on a decision while another user might want to continue the discussion.
Different views It is unlikely that all of the users of a system will want to look at the same pieces of data at the same time. So for example, one person might be looking at the start of a document while another might be working on the closing paragraphs. In such circumstances, there can be considerable confusion if one user has to tell another about the information on their screen because they may not be able to see it at the same time. Similarly, one user might be looking at one component in a chemical plant while another might be looking elsewhere. It would then be difficult for the first user to keep track of what the second is doing.

Groupware

What are the solutions?

A number of attempts have been made to identify solutions to the problems on previous slides. Many of these solutions have been `borrowed' from other areas of computer science. For example, the problems that arise when two users attempt to edit the same piece of text are very similar to those that arise when two users simultaneously attempt to access the same printer. In this case, one user may be `locked out' until the other is finished:

Locking mechanisms.
These techniques bar access to a shared resource for some period of time. The problem here is to guarantee fairness. This means that all users will be allocated the resource that they require. A related issue is one of `liveness'. This means that any particular user will eventually be granted that resource. In any event, a critical issue is the granularity of the lock. For example, a simple solution is to lock an entire document so that only one user can write to a file at any one time. Alternatively, locks may be placed on paragraphs or even sentences so that only one user may access that particular section of the resource. The smaller the lock, the more flexible the system. However, the smaller the lock, the greater the complexity of managing the system.
Priority protocols.
If you use locks then you must decide what happens if two or more users attempt to acquire the locked resource at the same time. One solution is to associate priorities with particular users. For example, members of staff may be granted access to a printer before all student jobs. this approach reduces the fairness in the system because some users may be excluded from accessing the system if members of staff continually print out their jobs before the students ever acquire the device.
Voting.
A more equitable solution is to ask everyone who should have access to a shared resource. in other words, the team takes a vote before anyone gets a chance on the document, paragraph or sentence. Obviously, this imposes a massive overhead because dozens of requests for votes might be generated on a particular system.
Split and independent views.
Finally, the problems of monitoring other users' activities might be reduced by splitting the screen in various ways. For example, one window might provide a high level view of a processing plant. another might be more focussed upon your own activity. Similarly, what are know as `video portholes' enable users to view cameras in a number of different locations at the same time. By double clicking on the image from a particular camera, the users view may be expanded to see that room in more detail.

Mobile Computing Infrastructure

Exercise: CSCW Design.

Try to design a menu based interface for a multi-user text editor. You might want to consider the following tasks.

Design a viewing mechanism whereby you could see the exact sections that all of the other users were currently working on. One way of doing this would be to represent the document and on that representation show a number of different icons to represent the position of the other users' mice.

Implement a locking system where each user could select a section of text and exclude other users from accessing it. How would a user release the lock that they held on a piece of text? What errors might occur while using the system?

Implement a voting system in which each user could vote against someone being able to access a paragraph, sentence or word. How would the votes of the other users be gathered and how would te results be communicated?

How might the system have to be changed if, instead of a textual document, the group were working on a computer program? Could procedures and functions be represented as different pages of a document? Where would lock be imposed? On procedures and functions? What would happen if one user attempted to edit the definitions for a type that was used in various places in a program?

Security

Why bother?

Security is becoming an increasing concern for computer users and for systems administrators. In all areas of computer use, the threats posed by malicious and criminal activities are increasing. A number of reasons can be identified for this rising threat. These range from the increasing interconnection of the world's computers to the increasing technological sophistication of the general population.

The increasing interconnection of the world's computer networks is a an issue because more and more companies are connecting to the Internet. In the past, they preserved their security by denying all external access to their systems. The increasing use of the web to advertise and sell products has meant, however, that more commercial systems are hooking up to the Internet. The increasing communications opportunities provided by electronic mail has also encouraged greater inter-connection. All of this increases the stakes for malicious and criminal users. Electronic funds transfers and commercially sensitive email messages are tempting targets. For example, an eavesdropper might be concerned to find out any information they can that might affect the share price of a company.

The increasing value of the information being stored and transferred across the world's computer networks is also increasing the importance of security. It is now the case that many companies would simply cease trading if they lost their computer records. For instance, imagine what would happen to a bank that lost all of its account details. In the past, companies would keep manual or paper-based records in addition to their digital systems. Today this is not always the case. In such circumstances, companies and government organisations will employ disaster management companies to provide back-up machines and shadowing systems to create duplicates of any essential data.

The technological sophistication of the general population is increasing. This means that more and more people have the knowledge and ability to `beat the system'. This means that commercial and governmental organisations must continually stay one step ahead of the people who who `hack' or `crack' into their systems. (Note: some programmers call themselves `hackers' because they hack the code into shape. The media often refer to people who attack computer systems as `hackers'. Many programmers find this irritating and would prefer the term `crackers' for people who attack systems. Hmmmmmmmmmm)

Different attitudes to digital data

One of the things that makes security difficult to achieve is that people regard digital data as different from physical information. In some way, because you can't touch and feel it, they perhaps think it is less `important'. This is a very, very dangerous assumption.

If you take or steal someones diary, they immediately become concerned about its loss. They may forget appointments, tutorials, lectures, parties, other peoples phone numbers, their own phone number and so on. If the diary contains information about their credit cards, bank accounts or medical conditions then there may be further concerns. So, generally, most people keep their diaries in a safe place.

In contrast, many people will leave their passwords lying around on scraps of paper. In systems that require passwords to access them, they may walk off and leave themselves still logged onto the system. For many novice users in an educational environment this can have very irritating consequences. Your `friends' may start sending lots of abusive email messages to lecturers so that they look as if they come from the user who left themselves logged on, In other situations, I've seen people create bogus web pages for users who leave themselves logged on. `Select this link to view my social life', and then it takes you to a page about knitting or kangaroos or whatever. Alternatively, other users might use the opportunity to look at your solution to a recent assessment. In a commercial setting, a malicious useer might delete critical files, look at your data and so on.

The real differences between digital and `physical' sources of information, is that it is incredibly easy to copy and forge digital data. It is also easy to search for any information that the user might leave unprotected. These are strong reasons why you must be more careful with digital security than with more conventional systems.

Information is valuable

A number of factors contribute to increasing the value of information beyond its expected level:

once a secret is lost you cant get it back.
Firstly, once someone has had access to sensitive or critical information, it is no longer possible to pretend that this information is not in the public domain. For instance, you cannot assume that the person who broke into your system will not have passed information onto others who may also have an interest in your data. The ease with which people can duplicate electronic information makes this a particular problem.
intruders may not leave any footprints.
If your system does not have an efficient accounting system then you may be completely unaware that someone can access your data. This is especially worrying if they could alter your data in their favour. For example, a bank's records might be gradually changed to show an increasing balance in the intruder's account.
digital data is easily destroyed.
If a malicious person has access to your system then they could easily issue a request that will delete all of the files in a directory. The same command might then go on to delete all files in a sub-directory, a sub-sub-directory, sub-sub-sub-directory and so on. Such a recursive deletion might take many hours if you had to physically rip or burn up physical data.
information often has a strategic importance.
Even if intruders do not make massive changes to your information, a few subtle changes to particular values might have a profound impact upon a company's behaviour. For example, changing the price quoted for an airline ticket so that your competitors are always a few pounds cheaper than you could have a massive impact upon the amount of business that you do.

Who can you trust?

It is generally assumed that most security violations within large organisations come from within that organisation, either through malicious actions or through carelessness. At one extreme this takes the form of industrial and military espionage. For example, a journalist posed as British Telecom employee in order to find out the private telephone numbers of the Prime Minister, the Queen and other members of the royal family. Such information has obvious implications for the prevention of terrorism. At the other extreme, security may be breeched as the result of someone leaving disks and print outs in a public place. For example, in 1992 a naval officer left a portable PC in the boot of his car. His car was stolen and on the PC were details of the West's response to any naval conflict with Russia.

As the greatest security threats come from within an organisation, it follows that many companies have clear rules of disclosure. These specify what can and what cannot be revealed to outside organisations. These rules extend to the sort of access that may be granted to the company's computer systems. A particular concern here is what repair facilities may be provided for information that contains sensitive data. One of the most effective means of breeching security is to act as a repair man/woman and copy the disks of any machine that you are working on.

Finally, security is based around a transitive closure of the people that you trust. This basically means that if you pass information onto someone you trust, then you'd better be sure that you trust all of the people that they trust and so on. If they pass your information onto someone else then you have to trust all of the people that this new person trusts as well...

Inferences from information?

People often only look at the impact of losing the information that they store on a computer to a malicious or criminal act. However, it is equally important to consider the inferences that can be made from such data:

absence of information
. This is critical because people can often make judgements based on the absence of information. If your system contains no records about a particular subject then an `intruder' may conclude that you are completely ignorant about a critical aspect of their operations. In military terms, this equates to finding out that your enemy knows nothing about your plans to attack them.
network monitoring
. It is also important to understand that certain inferences can be made even if the `intruder' cannot accurately see the content of messages passed over a computer network. For example, if two companies work in the same industry and the amount of network traffic increases between their head offices then this may indicate a potential merger or joint venture. Similarly, a burst of traffic might be anticipated immediately before a military operation.
transfer of funds
. Again, if money enters or leaves an account, an `intruder' might make certain inferences about the transaction that led to the payment. An important clue here would be the account details from where the funds originated.

Technological Threats

There are an increasing range of technological tools that malicious and criminal users can deploy against our systems.

Trojan Horses

This form of attack is named after the way that the Greeks attacked Troy by hiding inside their `gift horse'. In computer terms, you hide a malicious piece of code inside a program that appears to offer some other facilities. Typically, a file named really_exciting_game will provide a very boring game but will also contain a program that attempts to access your password file. Once the program has obtained a list of user names and passwords, it may write them to a file that is visible to the attacker. From then on, the gates are open and your system is insecure. An alternative approach, would be for the program to continue to run after you think you have quit it. the intruder might then be able to use the still running program to gain access to your files and resources.

Time Bombs

As mentioned in previous sections, the greatest source of attack is from within your organisation. Time bombs are, typically, left as a means of retaliating when an employee is dismissed. For example, a program might be scheduled to run once every month. The code would check payroll records to see if an employees name is on it. If this is the case then nothing else happens. If, however, the name is no longer there then the program takes some malicious action. For example, it might delete the rest of the payroll or move money into another account. Such programs indicate major breeches in security because they require access to personnel data and they require the ability to take some malicious action. The long term consequence may, however, be less severe than a Trojan horses because the system may be left secure in spite of the damage caused.

Worms

Worms are self-replicating programs. These represent a major threat because they will gradually consume more and more of your resources. Your system will slowly grind to a halt and all useful work will be squeezed out. This useful work will include attempts to halt the growth of the worm. It takes considerable technological sophistication to write a worm. You must first gain a foothold in the target machine. From there you have to create a copy that can be compiled and executed on that host. This copy must then move on to generate further copy and so on.

The most famous example of a worm was created by Robert Morris. Over 6000 computers were eventually infected by the program. The consequences reached such a scale that the US government suspected a deliberate attempt to attack their systems by either a terrorist organisation or a foreign government. The program used several different techniques to gain access to remote sites. The first involved passing a parameter to a mail system that was longer than the mail tool expected. A loophole in the design of the mailtool meant that the remainder of the parameter was executed as a separate command. This contained the instructions necessary to replicate the worm. Other forms of attack attempted to log on as another user by entering a list of common passwords, such as password, my_password, celtic, rangers, thistle etc. If all of these attempts failed then the worm used a standard dictionary to attack people's passwords.

Eavesdropping

Attackers do not need to know the content of a message to make inferences about the data that you are communication. Simply monitoring the amount of traffic between two sites can provide important clues about the messages that pass between them. More communication may be prelude to a merger between two companies, a move to attack a common enemy and so on.

More sophisticated eavesdropping may measure the amount of electromagentic radiation that is emitted from a display. this in turn can be used to reconstruct an image of the information that is presented on the display. The protection that can be provided against such forms of attack involves the introduction of shields into the walls, floors and ceilings of office appartments. Again, the costs of such measures must be balanced against the potential costs of someone accessing your data.

There are numerous examples of less advanced forms of eavesdropping. Perhaps the best known is the alleged attempt by British Airways to obtain information from Virgin Atlantic's computer systems. This was the subject of prolonged litigation where BA were accused of having accessed Virgin's records that were stored on part of a computer that was leased from a BA company.

In retrospect, it seems surprising that Virgin would have considered such a vulnerable set-up. In this case, they were able to make good their losses through the courts. In other situations, they might not have been so fortunate. It would, perhaps, have been better not to have created the opportunity for such as `attack'?

Information Retrieval Systems

In the recent past, many people were lulled into a false sense of security because the mass of data held on any system meant that it would literally take too long for any `intruder' to find anything of value. Unfortunately, a number of recent developments in the field of information retrieval make this a dangerous assumption:

feedback mechanisms
. These techniques give searchers some information about the level of confidence the system has that a particular document will match a query. This can reduce the time taken to search for information because documents can be ignored if they have a predicted relevance below some threshold value. For instance, if a person was interested in any documents about the space shuttle then a document about astronauts might be returned with a 50% relevance factor while a document about aircraft might be returned with only a 30% relevance factor. These weightings are calculated by the system on the basis of previous queries with the system. If the user always looks at the astronaut files when searching for information about the space shuttle then the relevance for those documents will increase.
efficient indexing
. This is the process by which a system can allocate key terms to the documents in a system so that it does not have to search the entire set of data in response to each query. For example, if a document of 1,000 words contained the word `space' more than 50 times but the word `orange' only once then the document may be indexed under space but not orange.
natural language queries
. It is no longer necessary for users to learn complex query languages to use these systems. Although, it might be assumed that anyone who had the expertise to attack a secure system would also have the expertise to learn how to use an information retrieval system.

All of these factors help an attacker to scan a dataset and then extract any relevant documents at their leisure.

Protection Mechanisms

The increasing range of attacks that can be made against computer systems must be countered by a relatively small number of protection mechanisms.

Encryption

Users can protect their data against attack by encrypting it. This involves turning any message into a coded form that the intruder cannot read. There are various different approaches to encryption:

secret key encryption. In this approach, you have an algorithm which hides the message. in order to decode any file, you need a key to extract the original message. For example, if you take the position of any character in the alphabet and add two positions to it you can get a coded message. A -> C, B -> D, C -> E, D -> F and so on. In order to decode this message you need to know the system that was used and you also need to know the key, that is that you must more the characters two places to the left to get the original message. If you moved the characters four places or five places the whole thing would fail.
public key encryption. This is slightly more difficult to explain. Basically, you have two keys. One is used to encrypt the message. anyone can have this as a means of sending you a secret. You also devise an algorithm which means that nobody else can decode the message without another second key that is kept secret. This will be covered in more detail in third and fourth year courses.

Digital Signatures

Let's assume that you have a public key system. Instead of keeping the key that you use to decode a message secret, keep the key that you use to encode the message a secret. this means that you can give everyone the key that you need to decode a message BUT only one person will know how to encode a message so that the decryotion will work. Because only one person can correctly encode the data, this system is like adding a signature to a message. recipients know its from you because only you know how to encode the data.

This technology is absolutely critical. Even if the details seem a bit hard to follow, it is very much worth thinking about because digital signatures will be at the centre of electronic payments over the world's computer systems. Unless you can be sure that someone has personally sent an order for a product then it would be easy for someone to issue a hoax request for a good or service that they had no intention of paying for. Folk history has it that one of the undergraduates in York issued an order for a rocket launcher from the Alabama State Armoury, all in the name of the head of Department,

Please, please, please, don't try this here...

Access control Lists

Ultimately, the security of most systems depends upon associating lists of privileges with either users or system resources. For example, when you log-on you typically give a password. This then sets all of your privileges with the system. If you are a member of the support staff then you will be able to do more things than a member of academic staff. If you are a member of academic staff then you will be able to do more things than a student and so on. This technique is known as a capability based approach.

There are a number of alternative security mechanisms, In particular, it is possible to develop systems around access control lists. These associate lists of users with objects in the system. Only those users on the list can access that resource. For instance, a system might ensure that only the user of a file can write to it but any number of ther users can read it.

If we put these approaches together, we have a system in which users have a list of privileges. These are like keys that the user can present to gain access to a resource. Some systems hide their files so that only those users who know the exact location of a file can access it. The location of the files, or its path, becomes the key. Each resource will also have a list of users who can access it. The user presenting the key must not only have the correct status to access the resource, they must also appear on the resources list. Unsurprisingly this is know as the lock and key method.

Security policies

The problem with most security mechanisms is that they are only as good as the staff who use them. For example, if you give someone your password then they can pass your password on and on and on. The only way to secure the system would be to keep a visual check on people using an account or monitor users by some form of biological scanning. In order to minimise the threat posed by such breeches, most organisations build their systems around a security policy.

A security policy is a set of assumptions that guide the application of security mechanisms. For example, if you wanted to protect certain areas of your system then you could put locks and keys on those areas. Keys would only be granted to users who had the relevant permissions. By regularly changing these keys, the organisation might attempt to reduce the chances of anyone continuing to gain unauthorised entry.

One extension of this policy is know as fire-walling. In this approach, you erect defences around particular areas of your system. Password access and other checks are used to ensure that even if someone does gain access to one area of your system then they cannot access other areas. Many Universities implement this policy by requiring users to have different passwords on their various accounts. Technically, it would be possible for all systems to consult a central password file. This would mean that if someone gained illicit access to one machine they would gain access to all of the user's other accounts on other machines.

Other security policies associate privileges with groups. Typically, in Universities these privileges may differ between academic staff, support staff and students. In commercial organisations, they may differ between departments and user groups. In any event, it is critical to ensure that users cannot acquire privileges beyond those associated with the rest of their group. The right to grant and deny permissions must, therefore, be carefully controlled. If you have permission to grant permissions then you control the system, This is why this permission is associated with the `super user' in Unix systems.

Exercise: Security policies

Make a list of all of the security techniques that are used in the Computer systems that you have access to. How secure do you think they really are?

In many organisations, there is a compromise to be made between security and freedom of access. The more passwords you need, the more protection mechanisms you install, the harder it is to gain access to the resources that you require. Why is it that many organisations opt for maximum access until something bad happens?

Do you think that this University ought to take more care over the security of its resources? Are there any ways that you can think of that people might use to gain access to your laser password or any other protection mechanisms that you might have used in the University?

If you were at all interested in the material on cryptography, then there is a mass of information about existing and future techniques on: http://www.cs.hut.fi/ssh/crypto/ Again, why is there a compromise to be made between access and security in cryptology? Are there any dangers associated with cryptology for society as a whole? For example, what would happen if extreme organisations could use the world's computer networks to coordinate their activities in a way that was secure from outside interference?

Safety

Introduction

The term `luddism' refers to a strong negative reaction against the introduction of technology. It accurately describes many peoples' response to the introduction of informatio n technology. This antipathy towards computer systems need not be an unthinking response. For example, any user who has experienced the intermittant crashes that affect P Cs and workstations is, perhaps, justified in questioning the introduction of in formation technology into car control systems, medical applications and chemical processes. This section addresses these concerns by identifying the threats and protection mechanisms that have been developed to preserve the safety of computer applicati ons.

The Myth of the Cyborg

The myth of the cyborg has had an important impact upon popular attitudes toward s the safety of technology. This myth has been popularised by Hollywood, in films such as Terminator and Bla de Runner. It is, however, part of a longer stream of literature. These earlier sources include writers such as H.G. Wells and Mery Shelley. They were less concerned about the nature of the technology itself than in the ( potentially subservient) relationship between people and the machines that they create. In modern times, the introduction of complex, computer controlled systems has in creased the relevance of this relationship. However, the threat may be less immediate, less spectacular and less obvious tha n that portrayed by Terminator. We are not threatened today by a malign supercomputer seeking to destroy humanit y. However, our everyday `safety' is almost entirely dependant upon information tec hnology. Modern society is almost entirely supported by information technology. If the world's computer systems failed then so would our food and power distribu tion networks. There is a very real sense in which we are all cyborgs. We are all dependant on machines to survive.

Risk

The relative risks posed by information technology can be assessed by statistica l means. Government authorities, such as the National Statistical Office, and regulatory organisations, such as the Health and Safety Executive, maintain careful records of the number of deaths that can be attributed to a range of technological fail ures. The problem here is that it is almost impossible to derive accurate estimates fo r computer-related fatalities from these statistics. This is because in one sense computers cannot easily kill anyone. It is the systems which they help to control that are the immediate 'cause' of a ny accident. For example, Kletz estimates that the risk of fatality as a result of a nuclear release from an atomic power station is 1 in 10^-7 per person year of exposu re (see T. Kletz, The Assessment of Benefits and Risks inRealtion to Human Needs . In R.F. Griffiths (ed) Dealing with Risk, MUP, 1984). The threat comes from the power station itself, not directly from the computers that control it.

What do you think of this argument? On the one hand it is appealing; computers never killed anyone. On the other hand, it is seriously flawed for without some source of error then the risk need not have been incurred. The important point is that, as engineers, we must go beyond the raw risk statis tics and examine the contribution that the computer systems make to the overall safety of the system as a whole. If we want to reduce the risk then we have to find those parts of the system tha t pose the greatest danger of failure.

There are further problems with the previous argument. Kletz presents the risk as 1 in 10^-7. What does this mean? One interpretation would have us believe that a risk of 1 x 10^-7 means that one person will die for every 10,000,000 that are exposed to the hazard (s ee N. Storey, Safety-Critical Computer Systems, Addison Wesley, 1996). In order to validate such a statistic we would have to have clear grounds for as sessing the potential results of exposing 10,000,000 people to the specified ris k. There are unlikely to be many volunteers. In consequence, it is best to be extremely skeptical about the reliability and a ccuracy of such low probability estimates for the risks of technological failure s.

What is Risk?

Risk = frequency x cost. It is important to consider both frequency and cost because failures form a spec trum from the high frequency and low cost failures that form an everyday irritat ion to the low frequency and high cost catastrophes that lead to fatalities.

Frequency

The primary problem in the calculation of risk for computer devices is the deter mination of failure frequency. With most physical systems we can test a device for some length of time and reco rd the total number of failures that occur within that time period. For example, a brand of car tyres might be observed to once in a month of contin uouis testing. This would give a frequency of 12 failures per year and so on. Unfortunately, this approach isn't accepatable for computer programs. There are thousands if not millions of paths that can be taken by even a very si mple program. Consider the following:

	Y_POS_MAX: NATURAL := 10;
	X_POS_MAX: NATURAL := 10;

	for Y in 1..Y_POS_MAX loop
		for X in 1 to X_POS_MAX loop
			SCAN(X, Y);
		end loop;
	end loop;

Here there are 10^{2} different combinations for the parameters of SCAN. In order to test for failure frequencies, we would have to examine each of these calls. This is relatively straightforward but what would happen if the user entered maximum values for X and Y. Now we might have to establish different failure frequencies for each possible set of inputs that the user provides. We are no longer dealing with a simple physical device such as a car tyre...

Cost or Utility

The calculation of the risk of computer failure is further complicated by the pr oblems associated with cost assessment. It has already been argued that cost cannot be determine solely by the immediate consequences of an incorrect instruction in the program itself. It is, rather, the knock-on effects that any bug might have in the subsequent pe rformance of the system as whole. For instance, the cost of a mis-calculation in an arcraft braking system can onl y be estimated in tmers of any consequent crash. This, in turn, raises questions about what unit might be used to represent the c osts of failure. Monetary costs only provide a partial solution; how much does each life cost?

Hardware Failure

The problems of software failure have led many organisations to focus upon hardw are solutions to system safety. It can be easier to reason about hardware behaviour and there are fewer layers b etween the application programmers code and the underlying implementation. These issues will be covered later in the Computing Science syllabus. However, it is important to emphasise that hardware failures do occur...

Transient failures

Hardware failures are difficult to detect because they can be transient. they may occur a periodically and then rectify themselves. For example, alpha particles have been known to introduce inaccuracies into calc ulations when they strike semiconductor memory chips (Storey, 1996). Such problems are difficult to identify because they may leave few traces and no lasting damage to the system itself. They are, however, typical of a latent error. If they go uncorrected then they may occur in a context which could lead to disa ster.

Intermittent failures

Intermittent failures appear, disappear and then reappear at a later time. For example, a fault solder on two wires or a dirty connection between two plugs may occasionally disrupt electrical contacts. They can be just as difficult as transient faults to detect because any testing and diagnosis must coincide with a moment of failure.

Persistent failures

Persistant failures are, typically, the easiest to diagnose because they continu e for a prolonged period of time. this does not mean, however, that they are easy to diagnose. For example, consider a pathological case in which there is an intermittent or t ransient failure in the systems that are used to detect persistent failures...

Hardware Fault Detection/Resolution

There are a wide range of techniques that can be used to mitigate the effects of hardware fsilure, These focus upon the twin problems of first detecting the failure and then resol ving it.

Triple Multiple Redundancy

Triple multiple redundancy involves taking a 'vote' between several different pi eces of hardware. If all of the components agree then the decision of the vote is unanimous. However, if the vote is not unanimous then a fault has occured. The system will then, typically, follow the concensus.

This form of voting raises a number of issues. The first is cost. Each critical element will have to be replicated by N-1 other components, where N is a prime number in order to avoid any split decisions. This increases cost but will also have consequent effects upon both the weight o f any device and the amount of heat that is generated by the circuit. Heat is significant because unless sufficient cooling is provided then this will , in turn, introduce faults into the system over time.

The second issue in voting systems is that of duplicated failure paths. If exactly the sam hardware is used in all of the replicated components then the y may all share a common design flaw. This means that they will all agree on the same 'wrong' decision. In consequence. most practical applications will use different hardware to perfo rm the same function. comparisons are then made between the outputs from these multiple, redundant sys tems.

Signal comparison

In order for techniques, such as triple multiple redundancy to work, it must be possible to compare the output of different components during the voting process . This is difficult if a heterogenous architecture is used; ie the redundant compo nents are not all the same. For example, if different manufacturers or designers supplied the components the n the differences in the devices may mean that the results arrive in for the vot ing process at different times.

Information redundancy

It is possible to check the integrity of a signal by encoding additional pieces of information into a signal. For example, imagine we wanted to send the following three numbers over a networ k: 10, 22, 9. The recipient would have no means of checking whether they were correct or wheth er they had been changed through a hardware/software fault. We could reduce this uncertainty by adding add the numbers together and sending the sum as a fourth number: 10, 22, 9, 41. If the previous three numbers did not add up to 41 then either the numbers were corrupted, or the checksum was corrupted or there was a fault in the program use d to check the calculation.

Watchdog timers

Watchdogs simply decrement a counter after a preset interval of time. This continues until the counter reaches zero. If this occurs then the watchdog issue instructions to reset the monitored appli cation.

The monitored application, in contrast, will continue to increment the counter so that it is above zero. Only if the application crashes, will the counter start to fall towards zero.

Bus Monitoring

This is a more sophisticated means of monitoring for hardware/software failure. Buses are used to transfer data between key components of the computer system. For example, a bus typically runs between the CPU (Central processing Unit) of t he computer and its RAM (Random Access Memory). The CPU actually performs the calculations and the RAM acts as a temporary stora ge area for the results. Bus monitoring checks that any information being accessed in the computer's RAm is in an allowable area (permissible address). If not then the CPU is informed of an error.

Software Failure

Previous sections on risk have argued that software faults are posing new challe nges to our engineering practices. The sheer scope and complexity of many software applications can prevent people from spotting all sources of failure. However, there are some critical stages in software development wherre `bugs' ma y be introduced.

Requirements Flaws

Faults may be introduced during requirements capture. This is the stage in softeware development where engineers must identify the con straints that a program must satisfy. For example, it might be specified that a program should compute a result within six seconds. If such a requirement was omitted or forgotten during the initial stages of deve lopment then the resulting software is unlikely to work correctly. Also, it is important to note that errors that are made during the initial stage s of development become increasingly expensive/difficult to correct as time goes by. Any errors found after software has 'gone live' may be too costly to fix.

Design Flaws

Requirements elicitation establishes the constraints that software must satisfy. Design goes part of the way to describing how those constraints are satisfied. For example, programmers might decide to implement the cartesian coordinates of an aircraft using a pair of natural numbers. The first of these might represent latitude, the second might represent longitud e. Notice how these design decisions do not include details of the code that will b e used to implement the system; this forms part of the implementation/ Again, errors might creep in at this stage. For example, our design would allow aircraft to be positioned at a latitude of 1 , 000,000 which cannot be plotted on any map of the earth.

Implementation Flaws

Implementation flaws are the well-known bugs that everyone talks about. For example, the results of the following calculation would be undefined:

	RESULT, DIVIDEND : NATURAL:= 1;

	DIVISOR : NATURAL:= 0;
	RESULT:= DIVIDEND/DIVISOR;

What does it mean to divide any number by 0? It might seem a trivial task to detect such problems but remeber that complex so ftware may be written over several years by several different teams of programme rs.

Testing Flaws

Testing is intended to identify the flaws that weaken softare. There are a range of reasons why testing often fails. These include the problems of simulating the eventual operating environment of a software program. Cases that may appear to be extreme and hardly worth testing during develoment c an become commonplace in the final installation.

Why are Requirements Difficult?

There are two different types of requirements. Functional requirements relate narrowly to the functions that the system must pe rform. They include whether the program will include software to compute particular res ults or offer particular services. Non-functional requirements refer to more qualitative issues. They include issues such as the provision of a `usable' user interface.

It is, arguably, easier to gather functional rather than non-functional requirem ents. Customers and client sare often focussed upon the functions that the system must provide. This is similar to many people's experience of buying a new HI-FI or video. They arrive looking for Dolby-C with multi-channel seven day pre-programming. Unfortunately, the ease with which these functions can be used is often determin e by the non-functional requirements that are easily lost amongst the frenzy to acquire new features.

Requirements are difficult to gather because they often conflict. The person who is paying for a piece of software is seldom the person who will u se it the most. This create conflict because users might prefer a high quality graphical user in terface that line management are reluctant to pay for,

Requirements are difficult because people are often vague about what they want. they only realise what they don't want when you put your first system in front o f them.

Requirements are hard because they change over time as working practices evolve and as new requirements are discovered from producing early versions that fail.

Why is Testing Difficult?

Testing is difficult because it is difficult to know what to test with many comp uter programs. These systems are so complex that no single programmer can possibly understand a ll aspects of the system. It, therefore, only makes sense to test specific components. However, faults may be introduced because components may not work well together. At some point, therefore, commercial programmers will be forced to test the inte gration of their software into other people's applications that they only vaguel y understand.

As mentioned, it is not possible to test all possible paths of execution in comp uter programs that manipulate potentially infinite number representations. For example, how would one set about to test whether a program worked for all na tural numbers? Test it with 1 then 2 then 3 then 4 and so on up to infinity (or the maximum rep resentable number on the system)? Given that it is not possible to test everything there are essentially two souti ons that have similar aims.

Solution 1: formal verification.
The UK is a world leader in what have b ecome known as `formal methods'. These are mathematical techniques that can be used to prove that a program is co rrect through a process of logical deduction. Rather than testing that a program works for all natural numbers, it is possible to prove that it works for a base case, such as 1, and then prove that it works for any natural number plus one. This process of argumentation is costly, slow and painstaking but has a proven v alue in aerospace and nuclear applications.

Solution 2: structured testing.
This appraoch does not attempt to test all possible execution paths. It does, however, focus testing upon critical areas of a program. These areas are either identified as being important during the requirements sta ge or they are identified through the programmer's skill and expertise. For example, it is always good to test what happends when variables reach their upper and lower limits. In any event, it is CRITICAL that programmers document the rationale behin d any test that they perform. Other programmers must be able to read and understand the objectives for any tes t.

Human Failure

To err is human, but to make a really serious mistake requires a computer. It is wrong to think that computers introduce entirely new forms of error. All types of human error that we see with computer systems have parallels in our daily lives. However, the rapid execution of user's commands and the difficult of restoring t he state of complex application processes can increase the consequences of these more familliar error types.

Incorrect plans

Incorrect plans describe situations in which operators adopts unsafe working pra ctices. These can arise either through lack of training, poor management or deliberate n egligence. An example might be the decision to withdraw the control rods further than was n ormally allowed during the Chernobyl power test.

Slips

Slips are observable errors. They include the left-right confusion that led the crew to close down their one working engine during the Kegworth crash.

Lapses

Intrusions

Intrusions occur when operators execute a sequence of steps from one plan in the middle of another, potentially inappropriate, sequence of instructions. An example might include breaking when suffering a puncture, rather than allowin g the car to gradually slow down. The breaking procedure would be an appropriate response to other highway problem s such as debris on the carriageway,

This is not an exhaustive list. Cognitive psychologists and human factors experts are actively extending this ta xonomy to provide more complete models of human error types.

User vs Designer vs Manager?

What is the true cause of human error? In the aftermath of many major accidents, it is typical to hear reports of `oper ator error' as the primary cause of the failure. This term has little meaning unless it is supported by a careful analysis of the accident. For example, if the operator is forced to manage as best they can with a bug rid den unreliable system then is an accident their fault or thatof the person who i mplemented the program? In turn, if the bugs are the product of poorly defined requirements or cost cutt ing during testing then are these failures the fault of the programmer or the de signer?

Often, what appears to be operator error is the result of management failures. Even if the systems that they operate are well designed and implemented, then ac cidents can be caused because operators are poorly trained to use them. This raises practical problems because often operators are ill-equipped to respo nd to the low frequency, high-cost errors that were emtnioned in the section on risk. How then can companies predict these rare events that users should be trained to cope with?

Further sources of error come from poor working environments. Again, a system may work well in a development environment but the noise, heat, vibration, altitude of a users daily life may make the system 'unfit' for purpose.

Human Factors Solutions

There are no simple means of improving the operational safety of computer system s. Short term improvements in operator training will not address the fundamental pr oblems created by laps and slips. Reason, therefore, argues that errors are LATENT within each one of us and, ther efore, we should never hope to engineer out human error (see J. Reason, Human Er ror, CUP, 1990). This pessimistic analysis seems to be confirmed by experience. Even organisations, such as NASA, with an exemplary training and recruitment sys tem have suffered from the effects of human error.

There are, however, some obvious steps that can be taken to reduce boith the fre quency and the cost of human error. In terms of cost, it is possible to engineer decision support systems that provi de users with guidance and help during the performance of critical operations. These systems may even implement `cool off' periods where users' commands will n ot be effective until they have reviewed the criteria for a decision. These systems engineering solutions actively impose interlocks to control and li mit the scope of human intervention. the consequences are obvious when such locks are placed upon inappropriate areas of a system.

It is also possible to improve working practices. Most organisations see this is part of an on-going training program. In safety-critical applicatiuons there may be continuous and on the job competan ce monitoring as well as formal examinations.

Finally, it is possible to improve the design of human computer interfaces.

Exercise: Human error vs System Safety?

The aim of this assessed exercise is to give you practice at using the world's computer resources to research and present a given problem. You should write an essay of no more than 2,000 words to answer the following question: Human error is a greater cause of major accidents than computer or systems failure. Discuss. An overview of human error and major accidents is provided at Glasgow Accident Analysis Group.