On the Use of Pseudo-3D Images in Human Computer Interaction

Chris Johnson,

Glasgow Interactive Systems Group (GIST),
Department of Computing Science,
University of Glasgow,
Email: johnson@dcs.gla.ac.uk


For the last twenty years, human-computer interfaces have been dominated by two-dimensional interaction techniques. Things are changing. Techniques that were previously restricted to specialised CAD/CAM tools and immersive VR systems are now being extended to the mass market. The photo-realistic facilities offered by QuicktimeVR and the model based renderings of VRML (Virtual Reality Mark-up Language) provide sophisticated tools for interface design. As a result, three dimensional visualisation techniques are being widely exploited in the financial services industry, airports and even off-shore oil production. Unfortunately, research in human-computer interaction lags well behind commercial practice. There are few guidelines that can be applied to support the development of these 3D interfaces. In consequence, users often report intense frustration as they navigate around virtual information spaces. This paper, therefore, describes a number of evaluations that have been conducted to examine the usability problems that affect these interfaces. It is concluded that the standard measures of task performance and subjective satisfaction cannot easily be applied to assess the utility of 3D systems. Finally, Gibson's work on direct perception is used to explain why people find it difficult to identify the underlying usability problems that affect this new generation of human-computer interfaces.

Keywords: direct perception, three dimensional imaging, VRML, QuicktimeVR.


HCI is a lazy subject. Too often it stumbles upon new topics well after industry has moved on. For example, much of HCI was still focusing on text editors and window systems when commercial developers were producing multimedia authoring tools. This paper argues that HCI research is paying too little attention to recent developments for 3D interaction. Some relevant work has been done. For example, Tweedie (1995) has developed techniques that support the design and evaluation of visualisations in general. Benford, Bullock, Cook, Harvey, Ingram and Lee (1993) have provided an initial taxonomy of virtual information spaces. Van Teylingen, Ribarsky and van der Mast (1995) have developed tools that support the 3D presentation of complex data sets. Much of this work has, however, focused upon the model-based presentation of information using immersive techniques. That is to say, they exploit virtual reality helmets and 3D input devices. Unfortunately, interface designers cannot easily apply this previous work to support desktop virtual reality, using techniques such as VRML. Nor has there been much work into the photo-realistic 3D presentation techniques supported by systems such as QuicktimeVR.

In contrast, this paper explicitly focuses upon the new generations of desktop, photo-realistic 3D presentation techniques. Over the last two years we have built a large number of VRML and QuicktimeVR applications ranging from Museum information systems to training tools for regional Fire Brigades. We have also conducted a wide range of evaluations to assess the utility and usability of these applications. Our results have shown that conventional usability measurements in terms of task performance and subjective satisfaction provide little insight into the qualitative differences that users observe between conventional 2D presentations and the emerging 3D technology. Later sections of this paper use Gibson's work on the direct perception of pictures to explain these findings.

1.1 QuicktimeVR and VRML

The last three years have seen the development of a range of tools that enable interface designers to go beyond the flatlands of a conventional desktop. It is possible to distinguish at least two radically different approaches. The first has much in common with traditional 3D modelling techniques. The second offers photorealistic presentations of real-world objects.

The Virtual Reality Mark-up Language (VRML) is a platform independent language for composing 3D models from cones, spheres and cubes. These primitives are combined to create more complex scenes such as those shown in Figure 1. One of the reasons for the rapid rise in the popularity of this medium is that it provides interface designers with a means of delivering 3D interfaces over the web (Johnson, 1997, 1997b). With the advent of VRML 2.0 it is possible to generate and animate scenes that contain links to a wide variety of other information sources including videos, databases and other web pages.

Figure 1: VRML Architectural Model of the Hunterian Museum, Glasgow.
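The way VRML scenes are composed from primitives can be illustrated with a minimal sketch. The Python below simply writes out the text of a small VRML 2.0 world built from a box, a sphere and a cone, with the sphere wrapped in an Anchor node so that selecting it follows a link. The helper names, the colours, the dimensions and the target `exhibit.html` are all hypothetical choices made for illustration; they are not taken from the Hunterian model.

```python
import textwrap

# Illustrative sketch only: helper names, sizes, colours and the link target
# are hypothetical and are not taken from the systems described in this paper.

HEADER = "#VRML V2.0 utf8"  # first line required in every VRML 2.0 file


def shape(geometry, colour):
    """Return a Shape node for one primitive with a plain diffuse colour."""
    r, g, b = colour
    return (
        "Shape {\n"
        f"  appearance Appearance {{ material Material {{ diffuseColor {r} {g} {b} }} }}\n"
        f"  geometry {geometry}\n"
        "}"
    )


def place(node, x, y, z):
    """Wrap a node in a Transform that translates it to (x, y, z)."""
    body = textwrap.indent(node, "    ")
    return f"Transform {{\n  translation {x} {y} {z}\n  children [\n{body}\n  ]\n}}"


def link(node, url):
    """Wrap a node in an Anchor so that selecting it loads another resource."""
    body = textwrap.indent(node, "    ")
    return f'Anchor {{\n  url "{url}"\n  children [\n{body}\n  ]\n}}'


def build_scene():
    """Compose a tiny scene: a plinth (box), an exhibit (sphere), a roof (cone)."""
    plinth = place(shape("Box { size 2 1 2 }", (0.6, 0.6, 0.6)), 0, 0.5, 0)
    exhibit = place(shape("Sphere { radius 0.5 }", (0.8, 0.7, 0.5)), 0, 1.5, 0)
    roof = place(shape("Cone { bottomRadius 1.5 height 1 }", (0.5, 0.2, 0.2)), 0, 3, 0)
    clickable = link(exhibit, "exhibit.html")
    return "\n\n".join([HEADER, plinth, clickable, roof]) + "\n"


if __name__ == "__main__":
    print(build_scene())
```

Saving the printed text with a `.wrl` extension should give a file that a VRML 2.0 viewer can load. Even this toy scene needs explicit coordinates for every object, which hints at why building realistic worlds by hand takes considerable time and skill.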

In contrast to VRML, QuicktimeVR offers interface designers the means of rapidly generating three dimensional resources without the costs of model building. Rather than painstakingly transforming and translating primitive objects, this approach works by shooting a large number of photographs. A viewer is then used to 'stitch' the images together so that users can pan around or zoom into a 3D scene. The photographs are taken using a motorised tripod and a digital camera so that large buildings and complex objects can be recorded in a relatively short period of time. This offers strong commercial advantages over the VRML approach, which requires considerable time and skill in order to construct relatively simple worlds. Figure 2 shows an excerpt from a QuicktimeVR tour of the Mackintosh House in Glasgow. As with VRML, the visualisation facilities supported by QuicktimeVR have recently been extended to support more complex forms of interaction through the introduction of embedded links to other resources.

Figure 2: QuicktimeVR model of the Mackintosh House, Glasgow.

Having introduced the underlying technology that supports desktop VR, the remainder of this paper goes on to examine the usability problems that characterise interaction with these applications. In particular, it is argued that 'conventional' evaluation techniques fail to capture some of the critical features that characterise interaction with this new generation of user interfaces.


It is difficult to identify criteria that can be applied during the evaluation of desktop, virtual reality systems. Some requirements such as 'task fit' can be extended from the general literature. Other criteria are less straightforward and relate to the navigation problems of using 'conventional' input devices to traverse three dimensional space. This section, therefore, proposes three requirements that must be satisfied by desktop, virtual reality systems. These criteria are heuristic and non-exhaustive. They are rules of thumb and other additional criteria can also be identified. It is important to emphasise, however, that unless we begin to examine such requirements then designers will continue to lack constructive criteria for the development of future interfaces.

2.1 Task Fit

The images in Figure 2 help people to visualise the layout and contents of the Mackintosh House. The three dimensional navigation facilities offered by QuicktimeVR support this visualisation process. Users can direct their browsing in a manner that reflects the physical layout of the building itself. They are not forced to follow the linear structures of printed text. Nor are they forced to follow the often arbitrary structures of hypertext links in conventional HTML pages. It is important to emphasise, however, that the visualisation benefits of desktop VR may not be significant for all tasks. For example, the images in Figure 2 would be of little benefit to a user trying to find the date when Mackintosh's Rose Boudoir was first completed. On the other hand, the QuicktimeVR images would be highly important for users seeking an introduction to Mackintosh's work. The following excerpt provides a further illustration of this point by describing part of the room shown in Figure 2.

"The furniture was a skilful mix of dark-stained items, mainly of the late 1890s, and white-painted pieces of the early 1900s. Certain items are those exhibited in Mackintosh's lifetime to great critical acclaim on the Continent. For example the stencilled chairs and oval table by the long south window in the drawing room formed part of the Mackintoshes' celebrated room setting 'The Rose Boudoir' exhibited in Turin in 1902." (See http://www.gla.ac.uk/Museum/MacHouse)

This excerpt illustrates the benefits of text. Natural language provides a flexible, ambiguous and rich medium within which to convey information to the user. For example, the use of the phrase 'certain items' enables the writer to refer to a set of objects without explicitly identifying those objects. Such ambiguity would be difficult, if not impossible, to represent within a 3D visualisation. From this it follows that some media are better suited to particular communications tasks than others. A first criterion for desktop VR is, therefore, that there must be a clear fit between the medium being used and the information being conveyed.

2.2 Open Access and Manipulation

A critical difference between new generations of 3D interfaces and more conventional interaction techniques is that many users have little experience with using two dimensional input devices to navigate three dimensional information spaces. Figure 3 illustrates this point by providing a screenshot of the Virtus VRML player. Here the user must select perspective constraints, including standard, wide angle and telephoto options. They must also select the speed of traversal and the speed with which they will travel through a scene. These are useful features for more advanced users but they are not common aspects of 'conventional' user interfaces. They can, however, have a profound impact upon a novice user's ability to successfully navigate three dimensional worlds of information. It therefore seems appropriate to require that 3D interfaces enable users to quickly traverse and manipulate the scenes and objects that the new technology provides.

Figure 3: The Virtus VRML Viewer

2.3 Subjective Satisfaction

The final criterion relates to the subjective experience of using desktop VR. This is a significant consideration. Commercial motivations for adopting VRML or QuicktimeVR often include their impact upon user motivation. It is less easy, however, to identify ways in which designers might actively sustain the initial appeal of 3D interfaces. Such an understanding is important if motivation is to continue beyond a superficial level. Arguably the best analysis of motivational factors in human computer interfaces comes from Lisa Neal's (1990) pioneering study into the implications of game playing. She concluded that "Games and other forms of software provide incremental learning situations, in which learning certain rules or elements is necessary for successful use. Learning a game is accomplished through the use of prior knowledge and analogies and through exploration". There are strong parallels between the use of analogy in computer games and the virtual or photorealistic worlds of desktop VR. Similarly, both games and 3D interfaces rely upon users learning by exploration. However, it remains an open question whether or not it will be possible to sustain long term use of both games and 3D interfaces. A final criterion is that, at the very least, users should exhibit strong subjective satisfaction ratings in support of the application of desktop VR systems.

This section has identified three criteria against which designers might assess the utility of three dimensional human computer interfaces: there must be a clear contribution from the 3D model or visualisation to the user's task or information need; the interface's browsing facilities must enable users to quickly traverse and manipulate the scenes and objects that the new technology provides; users should exhibit strong subjective satisfaction ratings in support of the application of desktop VR systems. It is important to emphasise that this is not an exhaustive list. Many other criteria can also be applied. It is clear, however, that we must establish some starting point for research into the usability of these interfaces. The following section, therefore, describes initial attempts to apply these criteria during the evaluation of three dimensional interfaces.


This section argues that a number of methodological and conceptual problems must be addressed before our criteria can be applied to assess the utility and usability of 3D interfaces. In particular, 'conventional' measures of task performance and subjective satisfaction cannot easily explain users' inability to accurately describe their subjective impression of using three dimensional interaction techniques.

3.1 The Problems of Task Fit

The previous section argued that desktop VR systems must provide a clear contribution to users' tasks. At first sight, this is an obvious criterion. It is, however, less easy to demonstrate that 3D interfaces actually do make a clear contribution to particular tasks. For example, the VRML gallery shown in Figure 1 was designed to house a collection of multimedia exhibits for the Hunterian Museum in Glasgow. Physical, or rather virtual, locations were to provide information about the relationships between objects. Roman tools would be held in one area of the gallery, information about weapons would be held in another and so on. As the user moved through the gallery, these collections would also change to reflect different time periods. This strategy reflects the careful planning that goes into the layout of 'real-world' museums and was conducted in consultation with the curators and users of the Hunterian. Within each section, the VRML gallery would contain QuicktimeVR movies as well as video clips and more 'conventional' presentations. Figure 4 presents an excerpt from one of these videos. The development of this 3D interface raises a number of questions in terms of the previous criteria. Firstly, it is extremely difficult to adequately capture the broad range of tasks that the users expected to accomplish. Visitors to a prototype set of web pages ranged from schoolchildren to curators, from historians and archaeologists to casual browsers, from computing scientists to graphic designers. Each group had their own 'agenda'; each had their own set of distinct tasks. In consequence, each had a different set of criteria with which to measure the success or failure of those tasks. This task diversity typifies many interactive 3D systems which are open for public access.

Figure 4: Excerpts from the Hunterian's Video on Roman Armour

Further problems prevent designers from assessing the task contribution of desktop VR. It can be difficult, if not impossible, to evaluate the benefits that VRML and QuicktimeVR provide even for a single group of users. For instance, the schoolchildren who accessed our site were set some predetermined questions by their teacher. A contribution to this task could be measured in terms of the mean class score. However, this would not capture any long term learning effects that are a principal motivation for the application of Information Technology in the classroom. Similarly, such a 'shallow' evaluation would fail to capture any motivational effects. We have already argued that 'fun' is an important, and even necessary, justification for the application of desktop VR.

Task and user diversity, as well as the importance of motivational factors, make it important that designers take a very broad perspective upon the contribution made by desktop VR systems. This diversity also makes it critical that designers challenge previous assumptions about the support which particular media provide for particular tasks. For example, the VRML gallery was evaluated against a series of text-only web pages. These provided links to digitised video clips and QuicktimeVR movies. It was hypothesised that the text based interface would support users with directed information retrieval tasks. Users did not have to navigate around a three dimensional VRML model. Figure 5 presents the results of an evaluation involving two classes of primary level schoolchildren. Their attitudes to retrieval delays were assessed; this measure was chosen because preliminary studies indicated that this was a critical factor in the successful adoption of the web based systems in local schools. As can be seen, the children had broadly the same reaction to the interface whether or not they were 'directed' by comprehension questions set by their teacher (Johnson, 1997a).

Figure 5: Attitudes to retrieval delays for task-directed users and casual browsers. (a) Browsing User (b) Task User

The findings of our studies with schools came as a considerable surprise. We were unable to establish the hypothesis that task directed users would prefer a text based interface to the overheads of navigating through a VRML model (Johnson, 1997a). However, the more general point here is that current measures of 'task fit' cannot easily be applied to assess the suitability of a particular medium for a particular task, far less to make comparisons between different media over a range of tasks. This has serious implications for design. Unless HCI research begins to provide more guidance about the suitability of particular media, designers will be forced to rely upon ad hoc decisions and simple guesswork, and users will be faced with gratuitous QuicktimeVR and VRML applications that lack any consideration for their everyday tasks and information needs.

3.2 Why Navigational Criteria may be Misleading

The second of our criteria was that desktop VR systems must enable users to quickly traverse and manipulate the scenes and objects that the new technology provides. Fortunately, this criterion seems slightly easier to evaluate than task fit between different media. It addresses surface level interaction rather than task performance in the context of more complex user activities. For example, ease of navigation might be assessed by timings for users moving between different areas of a VRML model or different scenes in a QuicktimeVR clip. Similarly, designers might monitor the differences between actual user performance and an optimal set of actions when manipulating three dimensional objects. However, these quantitative measures provide little indication of how users actually perceive their navigation within virtual environments. We have already argued that user directed exploration is a key strength of desktop VR systems. Such exploration directly militates against any attempts to obtain optimal timings for user navigation.
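The two quantitative measures mentioned above, traversal timings and the gap between actual and optimal action sequences, might be computed along the following lines. This is a minimal sketch under stated assumptions: the `Trial` record, the efficiency ratio (optimal steps divided by actual steps) and the example figures are all hypothetical and do not come from the evaluations reported in this paper.

```python
from dataclasses import dataclass
from statistics import mean

# Illustrative sketch only: the record layout, the efficiency ratio and the
# example numbers are hypothetical, not data from the studies in this paper.


@dataclass
class Trial:
    """One user's attempt to move between two areas of a 3D model."""
    seconds: float        # time taken to reach the target area
    actual_steps: int     # navigation actions the user actually performed
    optimal_steps: int    # fewest actions that would have reached the target


def path_efficiency(trial):
    """Return optimal/actual action count; 1.0 means the user took an optimal path."""
    if trial.actual_steps <= 0 or trial.optimal_steps <= 0:
        raise ValueError("step counts must be positive")
    return trial.optimal_steps / trial.actual_steps


def summarise(trials):
    """Aggregate timing and efficiency over a set of trials."""
    return {
        "mean_seconds": mean(t.seconds for t in trials),
        "mean_efficiency": mean(path_efficiency(t) for t in trials),
    }


if __name__ == "__main__":
    # Two made-up trials: one wandering user and one who took a direct route.
    trials = [
        Trial(seconds=40.0, actual_steps=10, optimal_steps=5),
        Trial(seconds=20.0, actual_steps=8, optimal_steps=8),
    ]
    print(summarise(trials))
```

A low efficiency score in such a summary cannot distinguish a lost user from one who is deliberately exploring, which is precisely the limitation of these measures.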

Given the problems mentioned above, it seems more productive to explore a more qualitative approach. Designers can compare different user attitudes to the navigation facilities provided by a number of desktop VR systems. For example, Figure 6 illustrates the interface to an early predecessor of the VRML gallery shown in Figure 1. This interface relies upon image maps to download a number of still images as the user navigates through photographs of the Hunterian's main exhibition area. The image on the left of the screen provides an overview of the user's position in front of the exhibit case that is shown in the right hand image. Users navigate by selecting the arrow icons at the bottom of the page. Clearly, as designers, we wanted to establish whether the navigational facilities provided by VRML were a significant improvement upon those offered by our previous implementation. In VRML, users navigate by selecting a mode of navigation, such as walking or flying, and then press the mouse to move 'into' the scene. The intention in this style of interaction is to avoid the additional indirection of navigating through explicit arrow keys. As can be seen in Figure 3, however, some VRML viewers also offer this navigation facility in addition to mouse based interaction.

Figure 6: Two Dimensional Navigation Using Pictures and Links

Figure 7 presents the results from a comparison between the navigational facilities provided by the interface in Figure 6 and the VRML gallery shown in Figure 1. In contrast to the previous evaluations that focused upon school children, these tests involved a broad cross section of users from the different categories mentioned in the previous section. This was important because we hypothesised that skilled computer users would be better prepared to face the challenges that are posed by 'conventional' input devices when navigating in three dimensions. Initial results have confirmed this hypothesis but, like so many issues in desktop VR, we urgently need further work over a broader spectrum of users and a wider range of interfaces.

Figure 7: Further Comparisons between Conventional HTML and VRML Access Techniques

Perhaps the most striking feature of Figure 7 is the similarity between the two sets of results. Users reported few difficulties in navigating both the VRML model and the image-mapped interface. Again, this came as a considerable surprise. Many aspects of the VRML browsers are counter-intuitive. Our users had little or no experience of navigating through complex three dimensional spaces using keyboards and mice. They only received a minimal training session before interacting with the Gallery. Perhaps more surprising still is that 'think aloud' evaluations revealed that many of our users felt considerable frustration about the navigation facilities that both systems offered even though they stated that it was 'straightforward' to move around the Museum. As with task fit, our findings suggest that our previous experience with two dimensional interfaces leaves us spectacularly ill-equipped to understand what is going on during interaction with three dimensional systems. Our intuitions often prove false and users often contradict our initial understanding of complex interactions.

3.3 The Problems of Evaluating Subjective Satisfaction

Previous sections have argued that subjective satisfaction and 'fun' are perhaps more important for the academic and commercial users of desktop VR than task performance. Many of the commercial and industrial contracts that have driven this work were originally offered because users abandoned the standard text and pictures of more 'conventional' CAL tools. This criterion is similar to that of 'task fit'. True measures of satisfaction, or performance, can only really be assessed through longitudinal studies. The earliest of our VRML and QuicktimeVR systems has only been 'live' for eight months. It continues to be used by a wide range of visitors (see http://www.gla.ac.uk/Museum/MacHouse) but it is still too early to see whether teachers in schools, colleges and Universities will return to the site in future years. Having raised these caveats, it is still important that designers have some means of assessing the subjective satisfaction that desktop VR provides for its users. For example, Figure 8 provides shots from a QuicktimeVR movie of a Greek head from the Hunterian Museum. The second and third images show what happens if the user uses their mouse to slowly rotate the head forward. It is possible to hypothesise that such a resource will help users to better visualise the sculptor's work than would have been possible using a static two dimensional image. However, our previous studies have shown that such assumptions may be unwarranted in the context of 3D interfaces.

Figure 8: A QuicktimeVR Object Being Rotated Forward to View Upper Surface

A series of web pages were developed to provide users with access to models such as that shown in Figure 8. The results shown in Figure 9 were obtained by showing the resulting pages to people in the Museum itself. At first, we were worried that the Hawthorne effect might have biased our results. Visitors might have been too enthusiastic because we were clearly demonstrating 'new technology'. However, it is interesting to note that even under these conditions there were users who were prepared to state that the QuicktimeVR exhibits added nothing to the web pages. The graph on the right presents the reasons given by those who said that the QuicktimeVR exhibitions did add something to the Museum's web pages. Yet again, these initial findings are challenging our assumptions. The motivational impetus of keeping attention for longer is less important than providing a greater 'feel for the object'. Of course, these findings are not statistically significant. Nor could they be when we deliberately chose to monitor 'real' users interacting on the Museum floor. They are, however, indicative of the pressing need to perform more research in this area. Our experience of building desktop VR systems is unlike any other in the field of HCI; previous biases and assumptions about user performance and behaviour are being continually challenged.

Figure 9: Subjective Evaluations of QuicktimeVR Objects


The most interesting of our observations with desktop VR is that users often find it difficult, or are unable, to verbalise their feelings about the new technology. For example, Figure 9 shows a small but noticeable number of users failing to express an opinion about the value of QuicktimeVR. Similar comments could be made about VRML interfaces. For example, the following excerpt is taken from the end of a think aloud session after the user had interacted with four different VRML worlds ranging from an architectural model to the Hindenburg air ship:
Evaluator: Do you prefer pictures or the models?
User: The models are great.
Evaluator: Why?
User: Um, it just feels different.

There is a noticeable difficulty in actually explaining what does or does not contribute to a successful three dimensional interface. This has profound implications for designers. If users cannot explain exactly what they do or don't like about desktop VR then it will be extremely difficult to exploit iterative design techniques. User feedback is effectively blocked by the problems that people face when trying to verbalise their experiences in virtual worlds. This observation should not be surprising. For example, a number of authors have argued that there are visual properties that cannot adequately be verbalised:

"(My) theory accounts for the differences between verbal and visual thinking. Visual thinking is freer and less stereotyped than verbal thinking; there is no vocabulary of picturing as there is of saying. As every artist knows, there are thoughts that can be visualised without being verbalised." (Gibson, 1971).

This citation is interesting because it helps to establish a link between our observations during the development of 3D interfaces and Gibson's theories of direct perception. These links can be extended. For example, Gibson argued that textual representations provide a very indirect means of describing the real world. Pictures provided more direct representations. It can be argued by extension that desktop VR systems increase the directness even further. We have argued from Neal's work that analogy is a vital element in the success of QuicktimeVR and VRML. This in turn provides a further explanation of why users find it so difficult to describe their interaction with many of our systems:

"Not only do we perceive in terms of visual information, we can also think in those terms. Making and looking at pictures helps us to fix these terms. We can also think in terms of verbal information, as is obvious, and words enable us to fix, classify and consolidate our ideas. But the difference is that visual thinking is freer and less stereotyped than verbal thinking: there is no vocabulary of picturing as there is of saying." (Gibson, 1971).

It is important to emphasise that we are not attempting to use Gibson's work on direct perception as a framework for understanding every aspect of interaction in three dimensions. In particular, his theories have little to say about the consequences of interacting with a scene rather than simply observing it. However, it is certainly true that software engineers and interface developers urgently require more guidance about how to effectively design interfaces using desktop VR technology.

The lack of work in this area is surprising. There is a huge body of Psychological literature on the perception of movement, on pattern recognition in complex scenes, even on interaction with objects in three dimensional space (Gordon, 1990). The problem is that hardly any of these findings have been applied to desktop VR. This omission must be rectified if we are to improve the usability of QuicktimeVR and VRML interfaces. Research in the field of HCI has fallen well behind commercial practice. For instance, a recent Internet search revealed 51,000 VRML models on the web. It is regrettable that few of them had any HCI input.


This paper has argued that existing HCI guidelines provide designers with little or no support for the development of desktop VR systems. As a result, many VRML and QuicktimeVR applications have a strong initial appeal but provide few long term benefits to their users (Johnson, 1997a). This problem is partly due to the gratuitous application of 3D technology. Users often have little to gain from performing their tasks in three dimensional space. There is, however, a large class of tasks that would benefit from desktop VR. For example, members of the Glasgow Accident Analysis Group have recently been involved in developing VRML and QuicktimeVR models of major fire hazards. Fire crews can exploit these models to navigate between exit points and hydrants. This is of great benefit as they make their way to the scene of a fire. In the absence of HCI research in this area, interface development has largely been driven by a costly process of trial and error.

We have tried to address the problems mentioned above by developing criteria that might be used to assess the utility of 3D interfaces: there must be a clear contribution from the 3D model or visualisation to the user's task; the interface's browsing facilities must enable users to quickly traverse and manipulate the scenes and objects that the new technology provides; users should exhibit strong subjective satisfaction ratings in support of the application of desktop VR systems. Unfortunately, our attempts to apply these criteria during the development of VRML and QuicktimeVR resources have been less than successful. We have found great difficulties in clearly defining the intended user population for many of these applications. Indeed, our clients often cite the accessibility of desktop VR as a key motivation for their use of the technology. It must support many different people doing many different tasks. We also found great difficulty in obtaining clear results about the navigation problems that users experience when using 'conventional' keyboards and mice to navigate three dimensional space. Users reported few problems in traversing VRML models. This came as a great surprise; observational studies have shown that people often have considerable difficulty in orienting themselves using existing environments. Finally, our attempts to assess the users' subjective experience with desktop VR raised a number of fundamental questions about the perceptual and cognitive processes that characterise interaction in three dimensions. People find it extremely difficult to describe the features that contribute to successful and unsuccessful interfaces to desktop VR applications. Fortunately, there is a considerable body of results from the Psychological literature that might be applied here (Gordon, 1990). Unfortunately, none of it considers the particular characteristics of 3D interaction with desktop VR. Until this omission is rectified, HCI will continue to lag behind the commercial application of this technology.

As a closing remark, the think aloud sessions mentioned above not only illustrate the general problems that people face when trying to explain the difficulties they feel when interacting with desktop VR. They also illustrate the deep sense of frustration that can arise when things go wrong in these systems. This frustration undermines the sense of enjoyment that is a strong motivation for the introduction of 3D interfaces:

Evaluator: How are you trying to find the exhibit?
User: I'm looking for the exhibit from the picture and trying to move towards the area that seems most relevant...this area bears no relation to where I wanted to go....

It is a sad fact that the user could have been talking about almost any VRML or QuicktimeVR application.


Thanks go to members of the Glasgow Accident Analysis Group and the Glasgow Interactive Systems Group. Thanks are also due to my students who drove the implementation and evaluation work that is cited in this paper. In particular, I would like to thank James Birrell; Karen Howie; Anthony McGill; James Macphie; Bryan Mathers; Pete Snowden and Mike Waters. I am very grateful to the staff of the Hunterian Museum and Art Gallery, Glasgow for providing technical advice during the development of the VRML and QuicktimeVR resources that are mentioned in this paper. Finally, thanks are due to Steve Draper and Paddy O'Donell of the Psychology Department, University of Glasgow for their advice about Gibson's theories of perception.