Evaluation exercise - Dublin Fusion 2 Web-based system

The following is a brief report on how the exercise was carried out by Group 1A and the issues raised.

System features

The system evaluated was a front-end which used three different search engines and allowed for query expansion based on relevance feedback on the combined set of results.
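
The report does not say how the results from the three search engines were combined. As a minimal sketch, assuming a simple round-robin interleave with duplicate removal (the function name merge_results and the sample URLs are hypothetical, not taken from the system), a combined result set could be built as follows:

```python
from itertools import zip_longest


def merge_results(*ranked_lists):
    """Interleave ranked URL lists round-robin, keeping the first occurrence of each URL.

    Assumption: this is one plausible merging strategy, not the one the
    evaluated system is known to use.
    """
    merged, seen = [], set()
    # zip_longest pads the shorter lists with None so every rank is visited.
    for same_rank in zip_longest(*ranked_lists):
        for url in same_rank:
            if url is not None and url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


# Hypothetical ranked results from three engines.
engine_a = ["u1", "u2", "u3"]
engine_b = ["u2", "u4"]
engine_c = ["u5", "u1", "u6"]

combined = merge_results(engine_a, engine_b, engine_c)
print(combined)
```

Relevance feedback would then be collected on the combined list rather than on each engine's list separately.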

Evaluation goal

The evaluation goal was to assess user response to this new facility and improve its presentation if necessary. Although different ranking algorithms had been implemented for presenting extracted terms from the retrieved set, the evaluation was not directly concerned with the functionality or retrieval effectiveness.

The aim was to conduct an exploratory qualitative exercise focused on the searching behaviour of the user.

Experimental design

The design incorporated the following elements:

User characteristics

Subjects were experienced Web searchers with IR knowledge.

Search task

Subjects were asked to search in pairs on a topic of interest to them rather than being assigned a search topic.

Data collection instruments

Data was obtained through a combination of instruments: direct observation, a talk-aloud protocol between the subjects, and a post-search questionnaire.

Data recorded

The data collected focused on observable behaviour, user attitudes and their understanding of the system. As far as possible, actions, moves and decisions were noted, e.g. items and terms selected, rankings and iterations. It was envisaged that in an actual evaluation, transaction logs could provide quantitative data on searching behaviour, which would allow the experimenter to concentrate on gathering qualitative data through direct observation.
The searchers' own comments during and after the search session also provided qualitative explanations for actual behaviour.
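
No transaction logging was part of this exercise. As an illustration only, the sketch below assumes a hypothetical log record (LogEvent, with invented action names and sample events) and shows how counts of actions and query iterations could be derived from such a log:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEvent:
    """One logged user action (hypothetical record layout)."""
    session: str    # identifier of the searching pair
    iteration: int  # query iteration the action belongs to
    action: str     # e.g. "submit_query", "judge_relevant", "select_term"
    target: str     # query text, document id, or term


def summarise(events):
    """Return, per session, a count of each action and the number of iterations."""
    summary = {}
    for event in events:
        entry = summary.setdefault(
            event.session, {"actions": Counter(), "iterations": 0}
        )
        entry["actions"][event.action] += 1
        entry["iterations"] = max(entry["iterations"], event.iteration)
    return summary


# Invented sample events, not data from the exercise.
events = [
    LogEvent("pair-1", 1, "submit_query", "query text"),
    LogEvent("pair-1", 1, "judge_relevant", "doc-3"),
    LogEvent("pair-1", 1, "judge_relevant", "doc-7"),
    LogEvent("pair-1", 1, "select_term", "term-a"),
    LogEvent("pair-1", 2, "submit_query", "query text term-a"),
]

summary = summarise(events)
print(summary["pair-1"]["iterations"], dict(summary["pair-1"]["actions"]))
```

Counts of this kind would cover the items, terms and iterations that had to be noted by hand in this exercise.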

The questionnaire included questions on:

Findings

Interface features

A number of buttons and labels were not self-explanatory. In particular, options were made available at inappropriate times and led to illegal actions. More attention needed to be paid to the user dialogue and search interaction to minimize confusion. Once actions were selected, the lack of feedback caused some frustration. The response time in a networked environment cannot easily be controlled but needs to be taken into account.

Searching task

The searching task was divided into two seemingly separate activities: query construction and document viewing. This was largely due to the discrete windows associated with each activity. However, the lack of integration between the two tasks and the different environments was problematic. Once searchers were viewing documents in the networked environment, the tendency was to try to follow up links and not return to the local window to reformulate the query. To some extent the system architecture appeared to accentuate the tension between querying and browsing in the searching process.

Understanding the functionality

Although it could be assumed that all the subjects had a good understanding of the principles of query expansion and relevance feedback and could be considered experts, their understanding of the system's functionality did not necessarily assist them in practice. In some cases it was evident that the searcher had difficulty in relating relevance judgements to the list of candidate terms displayed for query expansion. In particular, some were surprised that original query terms did not appear.
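
The report does not describe the term ranking algorithms or explain why the original query terms were absent from the candidate list. The following is a minimal sketch under two assumptions: candidate terms are ranked by the number of relevant documents that contain them, and terms already in the query are filtered out. The names tokenise, candidate_terms and STOPWORDS, and the sample texts, are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption, not the system's).
STOPWORDS = {"a", "and", "for", "of", "the"}


def tokenise(text):
    """Lower-case the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(relevant_docs, query, top_n=3):
    """Rank expansion terms by how many relevant documents contain them.

    Terms already in the query are excluded, which is one way the
    original query terms could end up missing from the displayed list.
    """
    excluded = set(tokenise(query)) | STOPWORDS
    doc_freq = Counter()
    for text in relevant_docs:
        # Use a set so each document counts a term at most once.
        doc_freq.update(set(tokenise(text)) - excluded)
    # Highest document frequency first; ties broken alphabetically.
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]


# Invented texts standing in for documents judged relevant.
relevant_docs = [
    "fusion of search engine results",
    "search engine fusion and relevance feedback",
    "relevance feedback for query expansion",
]
query = "search engine fusion"

terms = candidate_terms(relevant_docs, query)
print(terms)
```

Under these assumptions a searcher who judged documents relevant largely because they matched the query would see none of those matching terms in the list, which may be part of the difficulty in relating judgements to candidate terms.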

Conclusions

Even an almost ad hoc evaluation such as this produced some useful results, particularly for improving basic interface features. However, addressing the issues concerning the integration of the querying and browsing tasks is more complex. Moreover, the selection of appropriate terms for interactive query expansion is conceptually very demanding.

Group Members

Micheline Beaulieu, Catherine Berrut, Norbert Goevert, Josiane Mothe Sedes, Brigitte Simonnot, Malika Smail, Alan Smeaton.