A Probabilistic Terminological Logic for Modelling Information Retrieval*

Fabrizio Sebastiani†
Istituto di Elaborazione dell’Informazione
Consiglio Nazionale delle Ricerche
Via S. Maria, 46 - 56126 Pisa (Italy)
E-mail: fabrizio@iei.pi.cnr.it

Abstract

Some researchers have recently argued that the task of Information Retrieval (IR) may successfully be described by means of mathematical logic; accordingly, the relevance of a given document to a given information need should be assessed by checking the validity of the logical formula d → n, where d is the representation of the document, n is the representation of the information need and “→” is the conditional connective of the logic in question. In a recent paper, we have proposed Terminological Logics (TLs) as suitable logics for modelling IR within the paradigm described above. This proposal, however, while making a step towards adequately modelling IR in a logical way, does not account for the fact that the relevance of a document to an information need can only be assessed up to a limited degree of certainty. In this work, we try to overcome this limitation by introducing a model of IR based on a Probabilistic TL, i.e. a logic allowing the expression of real-valued terms representing probability values and possibly involving expressions of a TL. Two different types of probabilistic information, i.e. statistical information and information about degrees of belief, can be accounted for in this logic. The paper presents a formal syntax and a denotational (possible-worlds) semantics for this logic, and discusses, by means of a number of examples, its adequacy as a formal tool for describing IR.

1 Introduction

In recent years, researchers in Information Retrieval (IR) have devoted an increasing amount of work to the search for models of IR, i.e. for theoretical descriptions of the IR process that could serve both as specifications for building running systems, and as theoretical tools for abstractly investigating the relative efficiency of systems built along their guidelines. The attention of researchers seems lately to have concentrated on the so-called Logical Model, first introduced by van Rijsbergen [11]. According to the Logical Model, IR may be seen as the task of retrieving, in response to an information need on the part of the user, all the documents that belong to a given document base and that make the formula d → n valid (according to the notion of validity of the chosen logic L), where d and n are the representations of the document and of the information need in the language of L, and “→” is the “conditional” connective of L.
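As a rough illustration of retrieval as validity checking, the sketch below instantiates the Logical Model with plain propositional logic. This is an assumption made only for the sake of a small runnable example: the Logical Model itself leaves the logic open, and this paper works with Terminological Logics rather than propositional logic. The names `is_valid_implication`, `retrieve`, `ATOMS` and `DOCUMENT_BASE` are hypothetical and do not come from the paper.

```python
from itertools import product

# Hypothetical atoms; a formula is modelled as a function from a
# truth assignment (dict atom -> bool) to a bool.
ATOMS = ["logic", "retrieval", "probability"]

# Hypothetical document base: each document d is represented by a formula.
DOCUMENT_BASE = {
    "doc1": lambda w: w["logic"] and w["retrieval"],
    "doc2": lambda w: w["probability"],
}


def is_valid_implication(d, n, atoms):
    """Return True if d -> n holds under every truth assignment to `atoms`."""
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if d(world) and not n(world):
            # Counterexample: d is true but n is false in this assignment.
            return False
    return True


def retrieve(document_base, n, atoms):
    """Return the names of all documents d such that d -> n is valid."""
    return [
        name
        for name, d in document_base.items()
        if is_valid_implication(d, n, atoms)
    ]


if __name__ == "__main__":
    need = lambda w: w["retrieval"]  # hypothetical information need n
    print(retrieve(DOCUMENT_BASE, need, ATOMS))  # prints ['doc1']
```

In this toy setting a document is either retrieved or not; there is no notion of a document being relevant only to some degree.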

In fact, the Logical Model does not go as far as specifying which logic has to be chosen for modelling IR. As a consequence, the key problem of this research paradigm is the selection of an adequate logic for this task; a number of proposals have thus recently appeared that, with varying degrees of success, attempt to instantiate the Logical Model by means of an appropriate logic. In a recent paper [9], we have argued that a family of logics suitable (at least, at a first approximation) for modelling relevance of documents to information needs along the guidelines of the Logical Model is that of Terminological Logics (TLs); we have gone further to propose one such logic (which we have dubbed MIRTL) that we deemed particularly suited to IR purposes.

*This work has been carried out in the framework of project FERMI 8134 “Formalization and Experimentation in the Retrieval of Multimedia Information”, funded by the European Community under the ESPRIT Basic Research scheme.
†Current address: Department of Computing Science, University of Glasgow, Glasgow G12 8QQ, United Kingdom. E-mail: fabrizio@dcs.glasgow.ac.uk

However, TLs do not deal with uncertain and statistical information. Because of this, our model does not make provisions for the fact that the system is not normally able to assess the relevance of a document to an information need with certainty. Actually, van Rijsbergen stresses that, given that the system cannot reasonably be expected to determine relevance “objectively”, we should think in terms of the probability that the system attributes to d being relevant to n. In the logical model, IR then becomes the task of computing, and

ranking

documents

In this ferent

paper,

types

two

we attempt

types

system

piece

interesting

probabilities to model

the

probability.

i.e. it will

allow

of this

the

interesting relying

that

conditional

logic

The

paper

that,

for

IR

purposes,

information

and

how

both

and

a denotational

2

The

and

Logics

we will

call

(PTLs);

we will Section

introduction

IR

modelling

1. provide syntax,

of our

tifaceted properties

descriptions

tifaceted

nature

the

store

the need

while

van

terms

and

primary

it reasonably

TLs

within

with

the

the

or

for

we do this

We will

expressing

In

for

by specifying

or

roles

detail syntax

terms

representing

logics

Probabilistic

i.e. a probabilistic

and

version

of MIRTL,

language

fact

that

graphical

their

use

in

Logics:

to accommodate,

in

is rich

documents

an intuitive

enough have

characteristics,

language,

needs

complex

etc.)

and with enough

language,

information,

need

“object-oriented”

to

account

a number that

the same

of

users

for

the

mul-

“orthogonal”

might

intuitive

to address

and with

i.e. the kind

clean,

in terms

of TLs

binary

are

relation

symbols

and

of the

the

the same

want

to use

“object-oriented”

above

mentioned

intuitive

of information

natural

“d+

terms.

In TLs

on

domain

the

unary

model

represent

a particular symbols and

are

are formed formed

predicate

(indicated of IR

by metavariables

logics

of TLs

way

that

of thinking

n view”

put

a term

forth

mul-

“object-oriented” IR

systems

of the by van

is an expression

of discourse.

by

metavariables

D1, 2.

in

[9],

a concept.

the

a document

contained t 1, the

individual

binary

constants

Terms

in the

D2,

appears-in

1TL~ may in f=t

be regarded

(sing

document

of first-order

Each

and

logic.

the base

symbols IEI-CNR,

the

application

. . .).

example,

SIGIR93))

as fragments

In

application

is represented

For

predicate SIGIR93

. . .).

usually

TL

by

that

under the

denotes terms

manner

Ml,

has

its

an individual constant

consideration. appears-in,

expression

denoting binary

as complex operators

M2, own

either

to sentential

term-forming

M,

of a

individuals

denoting

of connective

individual

author,

relevance

Rljsbergen.

denoting

same

of

by metavariables

paper (func

R2,

recursive

D,

by

and

the

in Footnote

developed document

RI,

(indicated

is represented paper

R,

by the recursive

by

symbols

are detailed

in MIRTL

of documents

deals-with,

(indicated

sentential

terms

predicate

(and

statistical

formal

a formal

the resulting

Logics

This

the

semantically

expressions

constants,

the

3 we argue

both

in full

of real-valued call

of them,

Terminological

representation

of “lexical”

a unary

of classical

one included

class

than

2 we give

In Section

4 we go on to specify

expression

of a TL.

in Section

in [9].

primitives

in TLs;

allows

representation

in the same

syntactic

complex

predicate

the confines

less controversial

self-contained,

introduced

In Section

enough

structure,

an interesting,

are called

individual

Rljsbergen,

of a conditional

are called individual constants (hereafter indicated by metavariables i, i1, i2, ...), while unary relations are called concepts (indicated by metavariables C, C1, C2, ...) and terms

letters,

the

queries;

to an informa
