In this paper we have presented a logical model for information
retrieval based on a probabilistic terminological logic. In this
model, IR is seen as the task of 1) computing, for a given information
need (represented by the concept) and for each document
(represented by an individual constant), the real number
(represented by the constant)
such that
is valid in
(i.e. in the theory representing the document base and the
lexical, ``thesaural'' knowledge), and 2) ranking documents in terms
of their associated
.
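As a minimal sketch of step 2 only, and assuming that the real numbers of step 1 have already been computed by some other means, ranking amounts to sorting document identifiers by decreasing value. The function name `rank_documents` and the dictionary of scores are hypothetical illustrations, not part of the model itself.

```python
from typing import Dict, List


def rank_documents(scores: Dict[str, float]) -> List[str]:
    """Return document identifiers ordered by decreasing associated number.

    `scores` is a hypothetical mapping from a document identifier to the
    real number computed for it in step 1; how that number is derived
    from the logic is not modelled here.
    """
    # Sort the identifiers by their score, highest first.
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)


if __name__ == "__main__":
    # Illustrative values only, not results from the paper.
    example = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
    print(rank_documents(example))
```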
Besides enjoying the numerous properties that accrue from the adoption of a TL (properties that are more fully described in [8]), this model takes advantage of the considerable expressive power provided by our probabilistic extension to the terminological framework. This extension allows the distinct expression of two radically and conceptually different kinds of probabilistic information that feature in the IR task, i.e. statistical information, and information about the degrees of belief that the IR system being modelled has in other information.
Although statistical information and information about degrees of
belief are conceptually different, it is clear that there is a
relationship between the two. Our work so far has aimed at providing
a framework in which both could be expressed and reasoned upon in a
principled, semantically clear way. A further step in this direction
should be the investigation of mechanisms for allowing information
about degrees of belief to be directly derivable from
statistical information. For instance, if the system has no belief at
all (i.e. to no degree) whether a given assertion is true, but
at the same time knows that 80% of all individuals of the domain are
's, it might plausibly decide to believe with a 0.8 degree of
confidence that
, a particular individual in the domain, is a
.
This approach to the derivation of degrees of belief, well known in
actuarial reasoning, is called direct inference (see
e.g. [7]). Other approaches exist, however, that yield
different results and are based on principles as diverse as the maximum entropy principle (see e.g. [3]), the centre of mass principle, or the maximal independence
principle (see e.g. [2]). Unfortunately, in all of
these approaches, degrees of belief are completely determined
by statistical information, to the extent that two formulae such as
and
would jointly imply
that
; instead, it is clear that we would like to be able to
entertain such beliefs without this implying that
.
Investigating mechanisms that allow statistical information to
determine degrees of belief only when the latter are not
already determined is the next research task that this work opens
up.
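The intended behaviour can be illustrated with a minimal sketch, assuming a drastically simplified setting in which a degree of belief and a class proportion are each just an optional number. The function `degree_of_belief` and its parameters are hypothetical and are not a mechanism proposed in this paper; the sketch only shows statistical information acting as a fallback, in the style of direct inference, when no degree of belief is already held.

```python
from typing import Optional


def degree_of_belief(
    explicit_belief: Optional[float],
    class_proportion: Optional[float],
) -> Optional[float]:
    """Return the degree of belief that an individual belongs to a class.

    explicit_belief: a degree of belief the system already holds, or None
        if it holds none (hypothetical representation).
    class_proportion: the known proportion of domain individuals that
        belong to the class, or None if unknown.
    """
    # An already determined degree of belief is never overridden.
    if explicit_belief is not None:
        return explicit_belief
    # Otherwise fall back on the statistical information (direct inference).
    return class_proportion


if __name__ == "__main__":
    print(degree_of_belief(None, 0.8))  # no prior belief: statistics are used
    print(degree_of_belief(0.3, 0.8))   # prior belief is kept
```

With this fallback, the 80% example yields a degree of belief of 0.8 only when the system has no belief of its own about the individual, while an independently held belief coexists with the statistical information.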