Next: 2.2.4 Role of the Up: 2.2 Description of the Previous: 2.2.2 Baseline and rationale

2.2.3 Programme of work

2.2.3.1 Research Lines

The development of a theory of Multimedia Information Retrieval will be conducted along four main research lines:

All these lines contribute to the sought theory and strongly interact with each other, so that a task of the project will in general address more than one of them. However, they reflect the basic classes of problems or domains of investigation that will be dealt with by the FERMI consortium, thus they are introduced in order to describe from a convenient perspective the research activity of the project.

In order to demonstrate the integration of these lines and the achievements of this approach, an experimental evaluation will be performed.

2.2.3.2 The Logic Research Line

The goal of this research line is to develop a logic for representing and reasoning on the structure and the content of multimedia documents in a way that captures the notion of relevance of documents to users' requests. Contribution to this logic is expected from different logics, each addressing a single aspect of the problem. In the following, we will briefly review these logics, pointing out which aspect each addresses and how it can fit into the global picture.

Terminological Logic

The reasoning on the structure of multimedia documents is a classical form of reasoning of the kind modelled by first-order logic. By representing the documents and their components as individual constants, and the structural relationships between these as binary predicate symbols, the decisive step has been made in the direction of employing a first-order predicate calculus to retrieve documents according to their structure.

It should not be hard to see that the expressive power of first-order languages is sufficient to model all the relevant aspects of the structure of multimedia documents, notably the multifaceted and multileveled nature of this structure [MRT91]. In addition, the formulation of standards of representations (such as ODA) in a first-order syntax is not only possible, but probably the most principled way of integrating these standards into a more powerful and general representational framework. More importantly, by including a first-order reasoning ability in the MIR logic, the way is open to an important acquisition to the logic inferential power: that of domain knowledge. Domain knowledge is maintained and used in traditional information retrieval systems under a variety of different forms, including information structures (such as dictionaries and thesauri) and user profiles. All these forms can be represented via the machinery of first-order syntax, and thereby involved in the inferential process in the proper way.

Naturally, the question is that first-order logic has notoriously bad computational properties, which prevent its use in any system concerned with efficiency. Recent work in the field of Knowledge Representation has brought to the general attention a class of logics, named Terminological Logics (TL), which are subsets of first-order predicate calculus and show two interesting properties: first, a modelling power adequate to the representation of object structures; second, a decidable, in many cases tractable, decision problem [Neb90].

A preliminary study on the suitability of TL for the modelling of information retrieval [MSST93] shows that these logics are indeed adequate for this role. Their expressive power is sufficient to model very sophisticate document structures and a significant variety of domain knowledge. More importantly, their Tarski-style denotational semantics makes them extendable in the probabilistic dimension, which is fundamental for information retrieval. Further research is needed on the use of TL in IR, concerning:

Other research activities on TL will concern the probabilistic and computational lines, and are described below.

Modal Logic

This approach to logic-based retrieval models which stresses the representation and reasoning on the content of a document, is based on the notion of possible world. As in using other formal logics, one tries to formalize the intuitive meaning of deduction () by the matching process between a document and a query.

In modal logic, the logical value of a proposition is relative to a world in which each atomic logical item is associated to a truth value. Thus changing to another possible world affects the logical value of the proposition, and therefore may change the logical value of the implication. For IR models, and especially those based on logic, modality gives a sense to the notion of possible implication : an implication that is not logically verified in a world may become true in an other world. Tansitions from one world to an other and the uncertainty function that may be attached to such transitions, can be used to express the effort, in terms of knowledge about the document content, about the user query, or about domain knowledge, that has to be used for asserting the match. This world transformation gives thus a concrete expression of the kind of knowledge that could be used to evaluate a complex match on complex documents. This aspect is important considering the generality of the model. It is also important in the context of MIR where we will most probably deal with high-level representation of semantic information for complex documents and queries.Using modal logic in IR leads to choose logical expressions and sets of possible worlds as representations for documents, queries and knowledge used in the matching process. Different modal systems may be investigated depending on which of the document or the query is viewed as a world. It should be clear also that if domain knowledge is of prime use for supporting the evaluation of the matching, other kinds of knowledge may be involved as well, such as knowledge about the structural aspects of information, or users' knowledge.

The formalization of the retrieval process and the underlying uncertainty principle induce to consider a fuzzy valuation of atomic logical proposition, and a fuzzy valuation of the relation between the possible worlds. Using fuzzy modal logic, one can explicit various sources of uncertainty such as uncertainty related to domain knowledge or uncertainty related to document indexes that have all their own incidence in the final evaluation of the uncertain implication. The way this final evaluation is computed of course depends also on the way the uncertainty is modeled. Furthermore, we think that the values of all these uncertainties have to be related to a probabilistic point of view. In fact, fuzzy modal logic may be seen as a framework for designing a retrieval model able to cope with notions like structured and meaningful documents of the kind appearing in a multimedia context. By formally breaking the initial logical implication into several ones (related to possible worlds), it enables to encompass in a unique framework the various aspects of the semantics of this implication.

Relevance Logic

An essential feature of the logic for MIR is that the logic's implication relation capture the notion of relevance of documents to queries. This is ultimately what makes a logic a suitable tool for modelling retrieval, and defines a research line that is orthogonal to the two lines introduced so far: the inclusion of a notion of relevance in the inferential apparatus of the MIR logic.

Relevance is in fact the most critical parameter in the evaluation of IR systems; bulding systems that perform the retrieval of relevant information only may thus be considered the most important issue in IR. Unfortunately, the primacy of relevance in the whole IR discipline is also the primary cause that has hindered, up to now, the development of a theory of IR. In fact, relevance is not a formally and clearly defined notion; what relevance is, in other words, is defined by the user from time to time and from experiment to experiment, and is then heavily dependent on judgements where highly subjective and scarcely reproducible factors are brought to bear. The very possibility of a theory of IR is then dependent on the possibility of giving a formal definition of what relevance is, a definition capable of abstracting from the subjective and contingent factors inherent in the operational view of relevance described above.

To the surprise of many orthodox logicians, the idea of an object being relevant to an object has been shown to be amenable to formal treatment by a number of logicians who have defined a class of logical calculi called relevance logics [Dun86][AB75]. As with modal logics, there are many relevance logics; some of them are ordered with respect to expressive power, while some of them are incommensurable with respect to this dimension; more importantly, different relevance logics formalize a different notion of relevance.

We will consider a specific relevance logic that is well suited to formalize a state of affairs in which both document and query have a boolean representation, and in which the relevance of one to the other is the parameter of interest. This is the logic of tautological entailments [Dun76], the fragment of the relevance logics and (called the logic of entailment and the logic of relevant implication, respectively) that deals with first degree entailments only, i.e. pairs of propositional (classical) formulae separated by one ``" symbol. The relevance-based reasoning of will be integrated with other representations of documents (whether addressing the structure or the content of documents) to obtain relevance-based inference mechanisms. This is made possible by the fact that has a four-valued semantics, independently developed by Belnap [Bel77][Bel75] and Dunn [Dun76], which makes it amenable to the necessary extensions for MIR.

Other relevance logics which are compatible with different meaning representation chosen for documents and queries may also be considered.

Other research activities on relevance logic will concern the probabilistic and computational lines, and are described below.

2.2.3.3 The Probabilistic Research Line

The goal of this research line is to develop an appropriate theory of uncertainty which is consistent with the logics we develop in the Logic research line. Such a theory will be based on probability theory and for convenience will be called a Probability Logic ( a special case could be a Modal Logic), although in deriving the final formalisation we will need to loook at the Dempster-Shafer theory of evidence.

The are essentially two ways of proceeding,

  1. derive a logic from the probability theory
  2. extend a logic to accept a probability measure

From Probability to Logic

The underlying motivation for the first approach is that in IR we now have a fairly good idea how we model retrieval probabilistically, from this we should be able to move to deriving a logic that supports probabilistic inference in IR. It is well know that Bayesian Conditionalisation will induce a weak logic: the C2 logic [Sta68]. This is a good example of how a logic may be derived from conditionalisation of probability functions. Conditionalisation (or conditioning for short) is a form a belief revision, where beliefs are assumed to be represented by probability functions. There are other forms of conditioning such as Jeffrey conditioning or Imaging. It is not known what the nature of the induced logic would be under these last two forms of conditioning. We aim in this research line to derive the different logics induced by different forms of conditioning.

From the above it should be clear that fundamental to the research is the process of conditioning. This process will be investigated in some detail and of course will asssume the earlier work on Bayesian Conditioning. A general way of looking at this process is that a probability function is revised in response to evidence; such evidence can arise in two distinct ways:

The probability logic should take account of both forms of evidence and combine them to produce a revised probability of relevance. The system generated evidence would come about through measuring the extent to which the entailment relation in the logic held between a multimedia object and a query. The user generated evidence is general non-propositional and would come about through interaction with a user. Therefore we have that:

In words the probability of relevance of a document to a user is a function of the uncertainty associated with a document entailing a query and the uncertainty associated with the users opinion of the evidence. In [vR92] there is a preliminary formalisation of this.

From Logic to Probability

So far, probabilistic information retrieval models were restricted to fairly simple inference structures (e.g., regarding the inference from the set of terms occurring in a document to the current query). On the other hand, the TL to be developed will allow for rather complex inference processes, where the inference from a document to a query will lead along several intermediate concepts.

In order to combine probabilistic inference with TL, first a simplified version of a TL will be regarded. This simplified TL will be powerful enough to cover most of the data structures found in today's experimental information retrieval systems. This gives already a unique representation formalism, although the expressional capabilities are limited. For this simplified TL, the combination with probabilistic inference algorithms will be investigated. For this purpose, mainly Bayesian inference networks [Pea88] will be considered. For the content of text documents, this combination has been investigated already in [TC91]. So the usage of a simplified TL extends this approach to the other views on documents.

Based on the results for the simplified TL, the combination of the ``real'' TL with probabilistic inference will be investigated. In addition to Bayesian networks, the more powerful DUCK probabilistic logic approach described in [GKT91] will be considered here, too. In this setting, complexity issues will become important. Inference in Bayesian networks is already NP-complete, and the DUCK inference procedure is even more complex [KKG93]. On the other hand, the specific information retrieval application may restrict the structure of the inference process, thus simplifications and speedups of the general probabilistic inference algorithms may be possible.

Probability Logic

The two approaches above are likely to converge. We will investigate the difference between TL and a logic derived from probabilistic conditioning. The result is likely to be a logic which will combine suitably with probability theory and will be called Probability Logic (PL). This final PL will be the subject of prototyping and experimentation.

2.2.3.4 The Computational Research Line

This research line directly concerns the relationship between the theory developed within the two previously examined lines and the efficiency of a MIR system based on this theory. The basic question addressed by the computational research line is whether a given theory of MIR possesses good computational properties, hence it can be used as the formal basis of a MIR system. This is a first evaluation of our theory, based on formal tools such as computability and complexity theory, which is to be understood as a prerequisite for any further consideration of the theory.

Computational analysis

The computational properties of any proposed logic for MIR will be investigated, with the aim of classifying the decision problem of the logic into a computational class, which is something that ranges from the undecidable to complexity classes [Joh90].

The computational properties of most TL used for knowledge representation are known [DLNN91]. However, the modelling of MIR may suggest different sorts of TL, including new costructs that are suitable to this specific subject matter. The complexity of these new logics will then be analysed.

As far as relevenace logics is concerned, the computational properties of have been partially investigated. As it turns out, deciding entailment in the general case is likely to be intractable (namely the problem is co-NP-complete [PS87]), whereas there is a polynomial time algorithm that decides entailment between two CNF formulae [Lev84]. While this provides further evidence to the suitability of as a logic for Information Retrieval, the investigation on the complexity of tautological entailment must clearly be extended to other classes of formulae. Turning to quantified relevance logics, first-order tautological entailment is, unfortunately, powerful enough so that first-order logical implication can be reduced to it, thus sanctioning its undecidability [PS87]. There are, however, neighbor notions of entailment that are decidable, and that have been used in knowledge representation to model tractable forms of reasoning on beliefs [Lak86]. The suitability of these notions to Information Retrieval must be investigated, pursuing further the study of their computational properties.

Probabilistic algorithms

In the past decade, many decision problems apparently simple and computationally harmless have been discovered to belong to a class of suspected intractable problems such as NP. These results have narrowed in a considerable way the class of tractable problems, that is problems that can be efficiently solved. Recently, new probabilistic complexity classes have been proposed to extend the notion of tractable problem, and to seek whether there could be problems intractable in the deterministic sense which admit an efficient probabilistic solution [Gil77][Sch84]. By probabilistic solution we mean that the solution is given within a certain time bound with very high probability and we allow the computation time to exceed this bound with negligible probability. In other words, the running time becomes a random variable. Alternatively, probability comes into play as a source of probabilistic, though negligible, error in the computed result. The most important probabilistic class is called BPP and its importance stems from its conjectured incomparability with NP.

Probabilistic complexity classes may turn out to be very important for information retrieval, because it might be the case that they include the decision problem for logics that are suspected to be intractable but are suitable to MIR. These logics could then be adopted for modelling MIR, because the uncertainty of their probabilistic inference algorithms would be a negligible addition to the uncertainty typical in the information retrieval process.

There are already a good number of candidates to this kind of probabilistic investigation among TL. Notice that the identification of a TL decision problem in a probabilistic complexity class would be an important result, whose relevance would go far beyond the information retrieval field.

Parallel algorithms

Tractable decision problems may be solved by parallel algorithms, which would significantly improve the performance of the system. The retrieval of information on multimedia documents seems a well suited task to parallel treatment, as typically documents are considered one at a time for matching against a query. Also in the process of knowledge acquisition it seems that parallelism can be exploited, as the indexing of a document largely proceeds independently from that of the other documents.

2.2.3.5 The Multimedia Research Line

Main research lines

The basic problem addressed here is to provide a model for expressing the semantic content of multimedia data. We will limitate these investigations to three aspects which we consider of prime importance given the main topic of this proposal (namely the retrieval of multimedia data):

We plan to design a model that could encompass these various aspects of the semantic content of multimedia data, and that could be fully integrated in a retrieval model to ensure complete use of this explicit knowledge in the retrieval process. This means in turn that the particular logic that will be developed in the project will have to encompass all these basic aspects of multimedia data.

The specificities of each media

We think that identifying and modeling the particularities of each media about this notion of semantic content are problems that need to be investigated from the specific point of view of information retrieval. There are obviously specific modalities for each media: images, for example, are bidimensionnal (or tridimensional) signals. This allows the existence of image-specific features related to this dimensional modality. Shape, volume, mutual position of objects, are such features that linear information such as text cannot offer. These features bring their own semantics or, said in other words, have their specific connotations. Moreover, non-textual media have numerous modalities and their combination also has its own semantics. Because the corresponding number of combinations is great non-textual data is usually more ambiguous (ie. has more possible semantic interpretations). Information retrieval requirements related to this problem refer to the way users express their information needs or, said in other words, to the most useful features that are used for accessing a given media. A special attention will be given to the modeling of this information.

The semantics of structures

Talking about multimedia data often implicitely refers to ``association'' of various media, and this in turn corresponds to structures that combines information of different types of media. At its most elementary level, this notion corresponds to the concept of link, instance of a particular association between two entities. Within a given database, there is no example of such a link that has not a given semantics: it is defined at least by an attribute name. At a higher level this corresponds to the notion of complex object, or highly structured object, which inherits its semantics from the predefined type of which this object is an instance. If we consider now the semantic content of two linked objects, one may on one hand infer from that data a semantic of this link (for example we may discover that an image is an illustration of a particular topic of a text). Conversely, knowing the semantics of a text and the semantics of a link, one can infer some data about the semantic content of the information related to this text. On the other hand it seems clear that direct indexing of non-textual data will provide only partial information about their semantic content. A consequence of this situation is that we should take full advantage of infered information from related documents whenever they are available. These are the multiple aspects that we want to investigate about the structural aspects of multimedia information.

The notion of uncertain meaning

Factual knowledge that may be extracted from any media by indexing processes is very often uncertain. This means that the assignation of a given meaning to a given feature is often a difficult problem. This is well known even for textual information where words, or even expressions, may have several meanings that we are not always able to disambiguate in a given sentence or text. The problem becomes more challenging when considering other media where the signal itself may have not been recognised with absolute certainty, and when any feature may have lots of possible interpretations. This is also a well known problem for example in image pattern recognition. Hence multimedia data modeling will have to primarily incorporate this notion of uncertain feature, and its impact on uncertain meaning of a given feature. This problem also applies to uncertain semantics of links, and has also an incidence on the whole evaluation of the uncertain implication between documents and queries. We plan to investigate this important aspect which is closely related to the very nature of multimedia data.

Approaches

From our experience in the ELEN project we think that the Conceptual Graphs of Sowa [Sow84] are a very promizing approach for solving the above problems. They offer an extremely powerful and flexible formalism for representing and processing complex factual knowledge involving concepts and semantic relations which are needed for high-level representation of semantic content of documents [SW86]. Finally, their formal properties and basic operators make them intrinsically well adapted to logic-based retrieval models [Che92]. The fuzzy modal logic which we have developed in the RIME and ELEN projects will also give us a very good starting point to the modeling of uncertain meaning and to evaluation of uncertain implications.

2.2.3.6 Approaches to IR Experimentation

In a first step, the evaluation criteria used in subsequent experimentation have to be developed theoretically. Here the preference relation approach will be used as basis. For the layout and structure views on documents, it has to be assessed in what way preference relations are affected by elements of these views. In order to develop measures for interrelated documents, preference relations on two different levels will be considered, namely on the user interface level and the retrieval component level. By regarding the interdependence of the preference relations on these two levels, certain preference relations of the retrieval component level will be identified which directly affect preference relations of the user interface level. The former are candidates for defining meaningful effectiveness measures.

Based on these evaluation criteriae, the experiments have to be designed in the next step. Here the type of documents and queries to be used have to be chosen. It is intended to construct the query formulations manually, starting from user need statements. This procedure seems to reflect the typical application more adequately, where the query formulation is constructed interactively in a sequence of interaction steps. Only a limited effort will be spent on the modelling of the documents in MIR logic, thus producing a shallow representation. However, this seems to be sufficient for a first evaluation. It is clear that results will improve with a more detailed modelling. As a baseline representing standard (experimental) retrieval methods, the SMART system [Sal72] will be used.

As experimental system, the prototypes developed by the different groups will be loosely integrated, thus forming a prototypical MIR system.

The data needed for the experiments comprises documents, queries and relevance judgements. Mainly two sources of data will be used, namely the TREC and the RIME collection. The TREC data is an assorted collection of newspapers (including the Wall Street Journal), newswires, journals, technical abstracts and email newsgroups. The documents are formatted using an SGML-like syntax. The queries are in the form of a highly-formatted user need statement. Currently, the TREC data set comprises 3 GB of text and 150 queries with relevance judgements. The RIME corpus has been set up by the LGI-IMAG laboratory. It currently contains medical information that encompasses 600 medical reports and 1500 related images (2D MNR scanner images) provided by the radiology service of the University Hospital of Grenoble. This corpus has been built for experimenting multimedia information retrieval models developed in the RIME project. In this framework, the LGI-IMAG laboratory has developed an object-oriented database environment (based on O2 system) which manages multimedia information (text and images).

After performing the experiments with this data, the results will be analysed by comparing the performance of the MIR logic with that of standard retrieval methods. Besides comparing the values of the effectiveness measures, also a qualitative analysis will indicate weaknesses and strengths of the approach chosen.

2.2.3.7 Workplan

The research work of the project will be structured in four work parts as follows.

WP1
Design of a logic for multimedia information retrieval

This work part will address the problem of designing a probabilistic logic for modelling the structure and the content of documents and for performing relevance-oriented reasoning on these.

WP2
Development of a theory of uncertainty

This work part will develop an appropriate theory of uncertainty, based on probability theory, which is consistent with the logics we develop in WP1.

WP3
Modelling the content of multimedia data

This work task is aimed to the design of a model of the semantic content of multimedia information. This model will encompass both unstructured and structured information that may consist of, or combine, images, texts and graphics.

WP4
Experimentation

This work part deals with the development of an appropriate evaluation methodology and the corresponding evaluation of the prototypical MIR system.

2.2.3.8 PERT

2.2.3.9 Major Deliverables

D1
A non-probabilistic terminological logic for multimedia information retrieval (at month 18).

D2
A model of the semantic content of multimedia data combining structural and factual knowledge for images, graphics, and text (at month 18).

D3
A theory of uncertainty for multimedia information retrieval and its induced logic (at month 24).

D4
The design of experiments in multimedia information retrieval (at month 24).

D5
A probabilistic logic for multimedia information retrieval (at month 30).

D6
A prototype of a multimedia information retrieval system given by loosely-coupled prototypes of its basic components (at month 30).

D7
An experimental evaluation of the proposed theory of multimedia information retrieval (at month 36).

Interdependencies

T12 will use the result of T35 in order to include the model of multimedia data developed in T35 into the representation of the content of multimedia documents developed by T12.

T13 depends upon T21, where a theory of user-dependent relevance is developed. Analogously, T14 depend on the result of T22, which will provide the probabilistic basis of the MIR logic.

The development of prototypes, to be done in T16 and T23, will of course follow the corresponding theoretical work and is placed in one time period in order to allow the synchronization of the experimental phase on textual (T44) and image (T36) data.



Next: 2.2.4 Role of the Up: 2.2 Description of the Previous: 2.2.2 Baseline and rationale