A Probabilistic Terminological Logic for Modelling Information Retrieval*
Fabrizio Sebastiani†
Istituto di Elaborazione dell'Informazione
Consiglio Nazionale delle Ricerche
Via S. Maria, 46 - 56126 Pisa (Italy)
E-mail: fabrizio@iei.pi.cnr.it
Abstract

Some researchers have recently argued that the task of Information Retrieval (IR) may successfully be described by means of mathematical logic; accordingly, the relevance of a given document to a given information need should be assessed by checking the validity of the logical formula d → n, where d is the representation of the document, n is the representation of the information need and "→" is the conditional connective of the logic in question. In a recent paper, we have proposed Terminological Logics (TLs) as suitable logics for modelling IR within the paradigm described above. This proposal, however, does not account for the fact that the relevance of a document to an information need can only be assessed up to a limited degree of certainty. In this work, we try to overcome this limitation by introducing a model of IR based on a Probabilistic TL, i.e. a logic allowing the expression of real-valued terms representing probability values and possibly involving expressions of a TL. Two different types of probabilistic information, i.e. statistical information and information about degrees of belief, can be accounted for in this logic. The paper presents a formal syntax and a denotational (possible-worlds) semantics of this logic, and discusses, by means of a number of examples, its adequacy as a formal tool for describing IR.
1 Introduction

In recent years, researchers in Information Retrieval (IR) have devoted an increasing amount of work to the search for models of IR, i.e. for theoretical descriptions of the IR process that could serve both as specifications for building running systems and as theoretical tools for abstractly investigating the relative efficiency of systems built along their guidelines. The attention of researchers seems lately to have concentrated on the so-called Logical Model, first introduced by van Rijsbergen [11]. According to the Logical Model, IR may be seen as the task of retrieving, in response to an information need on the part of the user, all the documents that belong to a given document base and that make the formula d → n valid (according to the notion of validity of the chosen logic L), where d and n are the representations of the document and of the information need in the language of L, and "→" is the "conditional" connective of L.
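As an illustration only, retrieval as validity checking of d → n can be sketched as follows. The Logical Model leaves the choice of logic open, so classical propositional logic is assumed here purely as a stand-in; it is not the logic proposed in this paper. All names (`is_valid`, `retrieve`, the toy document base and its atoms) are hypothetical.

```python
from itertools import product

# Assumed toy encoding: an atom is a string; a compound formula is a tuple
# ("not", f), ("and", f, g), ("or", f, g) or ("implies", f, g).

def atoms(f):
    """Collect the atomic propositions occurring in formula f."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))

def holds(f, valuation):
    """Evaluate formula f under a truth assignment (dict atom -> bool)."""
    if isinstance(f, str):
        return valuation[f]
    op = f[0]
    if op == "not":
        return not holds(f[1], valuation)
    if op == "and":
        return holds(f[1], valuation) and holds(f[2], valuation)
    if op == "or":
        return holds(f[1], valuation) or holds(f[2], valuation)
    if op == "implies":
        return (not holds(f[1], valuation)) or holds(f[2], valuation)
    raise ValueError(f"unknown connective: {op}")

def is_valid(f):
    """A formula is valid if it holds under every truth assignment."""
    names = sorted(atoms(f))
    return all(
        holds(f, dict(zip(names, values)))
        for values in product([False, True], repeat=len(names))
    )

def retrieve(document_base, need):
    """Return the ids of all documents d for which d -> n is valid."""
    return [
        doc_id
        for doc_id, d in document_base.items()
        if is_valid(("implies", d, need))
    ]

# Hypothetical document base: each document is represented by a formula.
document_base = {
    "doc1": ("and", "logic", "retrieval"),
    "doc2": ("and", "logic", "databases"),
    "doc3": "retrieval",
}
need = ("or", "retrieval", "indexing")

print(retrieve(document_base, need))  # ['doc1', 'doc3']
```

The truth-table check is exponential in the number of atoms and is meant only to make the "d → n is valid" criterion concrete; any other logic with a decidable notion of validity could be substituted for `is_valid`.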
In fact, the Logical Model does not go as far as specifying which logic has to be chosen for modelling IR. As a consequence, the key problem of this research paradigm is the selection of an adequate logic for this task; a number of proposals have thus recently appeared that, with varying degrees of success, attempt to instantiate the Logical Model by means of an appropriate logic. In a recent paper [9], we have argued that a family of logics suitable (at least, at a first approximation) for modelling relevance of documents to information needs along the guidelines of the Logical Model is that of Terminological Logics (TLs); we have gone further to propose one such logic (which we have dubbed MIRTL) that we deemed particularly suited to IR purposes.

However, TLs do not deal with uncertain and statistical information. Because of this, our logical model does not make provisions for the fact that the system is not normally able to assess the relevance of a document to an information need with certainty. Actually, van Rijsbergen stresses that, given that we cannot reasonably expect the system to determine relevance "objectively", we should think in terms of the probability that the system attributes to d being relevant to n. In the model, IR then becomes the task of computing, and ranking documents in terms of, this probability.

*This work has been carried out in the framework of project FERMI 8134 "Formalization and Experimentation in the Retrieval of Multimedia Information", funded by the European Community under the ESPRIT Basic Research scheme.
†Current address: Department of Computing Science, University of Glasgow, G12 8QQ Glasgow, United Kingdom. E-mail: fabrizio@dcs.glasgow.ac.uk
In this paper, we attempt to model [...]; we do this by specifying a logic that allows the expression of real-valued terms representing probabilities [...] expressions of a TL. The resulting logics we will call Probabilistic Terminological Logics (PTLs); we will provide a formal syntax and a denotational semantics of one of them, i.e. a probabilistic version of MIRTL [...].

In Section 2 we give [...] introduction to TLs [...]. In Section 3 we argue that, for IR purposes, both [...] types of [...] information are interesting [...]. In Section 4 we go on to specify in full detail syntax and semantics [...].

2 Terminological Logics: [...]

[...] the language of TLs is rich enough to accommodate, in an intuitive way, [...] the need to address the multifaceted nature of documents (structure, graphical characteristics, [...] etc.) [...] the same "d → n view" of IR put forth by van Rijsbergen.

1 TLs may in fact be regarded as fragments of first-order logic. [...]

In the model of IR developed in [9], a document [...] is represented by an individual constant, [...] a class of documents is represented by a concept. [...]

In TLs a term is an expression that denotes either an individual or a unary or binary relation on the domain of discourse. Terms denoting individuals are called individual constants (hereafter indicated by metavariables i, i1, i2, ...), while terms denoting unary relations are called concepts (indicated by metavariables C, C1, C2, ...) and terms denoting binary relations are called roles (indicated by metavariables R, R1, R2, ...). Terms are formed by the recursive application of term-forming operators to [...] predicate symbols [...]. Each TL has its own term-forming operators. For example, [...] (and paper (func appears-in (sing SIGIR93)) [...]
to an informa