This is a proposal for a working group to lay the foundations for new frameworks for evaluating emerging interactive and multimedia information retrieval (IR) applications. The starting point for this group is the traditional evaluation methodology developed over the last forty years, largely within the IR community. Many members of the group have been responsible for the creation of this methodology. We have concluded that it is not adequate for sustaining the development of large-scale applications in highly interactive, multimedia, user-centred environments. The current research and development situation differs from the past in two essential ways:
* IR techniques are beginning to be used in complex goal and task oriented systems whose main objectives are not just the retrieval of information.
* New, original research in IR is being blocked or hampered by the lack of a broader framework for evaluation.

Our response to this situation has been to bring together a number of researchers who may be able to create the new evaluation methodology needed to support and sustain the emerging research and applications in this area. Our objectives for the group are (elaborated in this order below):
* Bring the user back into the evaluation process.
* Understand the changing nature of IR tasks and their evaluation.
* 'Evaluate' traditional evaluation methodologies.
* Consider how evaluation can be prescriptive of IR design.
* Move towards a balanced approach (system versus user).
* Understand how interaction affects evaluation.
* Support the move from static to dynamic evaluation.
* Understand how new media affects evaluation.
* Make evaluation methods more practical for smaller groups.
* Spawn new projects to develop new evaluation frameworks.
Our approach to creating new frameworks will be to continue to support and encourage evaluation research at the different sites. We will encourage the creation of projects that involve real users and actual tasks in the evaluation process. At the same time, laboratory-style evaluation will continue, especially to specify the design of test data for use in large-scale experiments such as TREC. At all times the balance between user, task, and system orientation will be taken into account. Through some of our sites we will have access to real users who can be involved in evaluation cycles.
It is apparent (at least to us) that the state of knowledge on evaluating interactive multimedia IR systems is very weak indeed. Our approach will be to report to each other, and subsequently to the wider community, on our new evaluation frameworks, mostly through the publication of papers and reports and occasionally through the demonstration of working evaluation procedures. Moreover, we will attempt to interact formally with selected businesses (e.g. publishing, banking, communications) to investigate and help with their evaluation problems.
The results of our work will be exploited through increased interaction between our group and the HCI, CBR, DBMS, and AI communities; members of those disciplines will be invited to participate in our workshops. These disciplines are all beginning to use IR techniques but are hampered by the lack of an appropriate evaluation methodology. Moreover, we will increase our interaction with industry involved in multimedia IR applications; many of us already have strong links with industry in this area. Finally, the working group will be used to propose new RTD projects, which will increase the chances of exploitation.